Ali Hakimzadeh
posted 28 days ago
As a Junior Researcher with a strong focus on pipeline development in the field of Metabarcoding, I am passionate about leveraging cutting-edge technologies to advance our understanding of biological systems. With a primary emphasis on Nextflow module development, I have successfully implemented various tools and modules into the PipeCraft 2 pipeline, enhancing its capabilities and efficiency. My expertise extends to developing bioinformatic pipelines for long-read sequencing machines, particularly PacBio. Currently, I am working on my thesis, which delves into the intricacies of creating tailored bioinformatic solutions for processing and analyzing data from advanced sequencing platforms. I believe that this research will contribute significantly to the field and open up new avenues for exploration. As a dedicated professional, I am always eager to expand my knowledge and engage with fellow researchers and industry professionals to exchange ideas and collaborate on projects that drive innovation in Metabarcoding and bioinformatics. Feel free to connect with me if you would like to discuss potential collaboration opportunities, share insights, or explore cutting-edge developments in our field. Together, we can push the boundaries of science and make meaningful contributions to the ever-evolving world of genomic research.

Decoding Metabarcoding: A Look at Data Processing Software Suites and Pipelines (Part 1)

Bioinformatics Metabarcoding Pipeline

In the ever-evolving world of bioinformatics, researchers continually seek new tools to process and analyze metabarcoding data. Two primary categories of software have emerged to cater to these needs: suites hosting multiple algorithms and pre-defined pipelines. This series of blog posts delves into the strengths and limitations of each category and their impact on the field.

Software suites and pre-defined pipelines

Metabarcoding software suites such as USEARCH, VSEARCH, DADA2, OBITools, mothur, and QIIME 2 offer a diverse range of algorithms for sequence data analysis. These suites enable users to create custom pipelines tailored to their specific needs by combining commands and settings. VSEARCH stands out as a free, open-source alternative to USEARCH, providing a wide range of functionalities without the need for a paid license. Furthermore, mothur and QIIME 2 integrate unique processing algorithms while also incorporating features from VSEARCH and/or DADA2.

In contrast, pre-defined analytical pipelines streamline the metabarcoding data analysis process, catering to users with limited bioinformatics skills. These pipelines consist of workflow steps that have been validated for specific sequencing data, and while some include newly designed algorithms, others combine open-source tools into a cohesive and user-friendly process. Despite their pre-defined nature, many of these pipelines still allow for customization, enabling users to adapt settings based on the characteristics of their sequencing data sets.

Basic structure of a metabarcoding pipeline

Demultiplexing, an essential step in the analysis of metabarcoding data, involves distributing sequences into individual files corresponding to specific samples. While many sequencing providers offer demultiplexed sequences, in cases where this step is not integrated, researchers can employ software such as cutadapt, sdm, or lima. Additionally, sequencing adapters, indexes, and primers should be removed before proceeding to the subsequent analyses.
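To make the idea of demultiplexing concrete, here is a minimal Python sketch. It assumes each read begins with an exact copy of its sample barcode; the function name, barcodes, and reads are hypothetical toy values. Tools such as cutadapt do far more (mismatch tolerance, both read orientations, paired-end data), so treat this as an illustration of the principle only.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode.

    reads: iterable of (read_id, sequence) tuples
    barcodes: dict mapping barcode sequence -> sample name (toy values)
    """
    by_sample = defaultdict(list)
    for read_id, seq in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                # Exact match only: strip the barcode and store the read.
                by_sample[sample].append((read_id, seq[len(barcode):]))
                break
        else:
            # No barcode matched: keep the read untouched in a separate bin.
            by_sample["unassigned"].append((read_id, seq))
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = [("r1", "ACGTGGGTTT"), ("r2", "TGCACCCAAA"), ("r3", "NNNNGGGTTT")]
result = demultiplex(reads, barcodes)
```

In a real workflow each sample's reads would be written to its own FASTQ file, and the unassigned bin is worth inspecting because a large one often points to a barcode or orientation problem.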

The subsequent stages of a standard DNA metabarcoding pipeline include sequence filtration based on read quality scores, removal of putative chimeric/artifactual sequences, defining features (e.g., ASVs, OTUs), and taxonomic annotation of the features. Different strategies may be employed depending on the characteristics of the sequencing data or the study objectives. Notably, quality threshold calculation methods, such as filtering based on the expected number of errors, are preferred over the average quality score threshold, as they minimize the risk of false positive features.
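The difference between the two quality-filtering strategies is easy to see in a small Python sketch. The expected number of errors of a read is the sum of the per-base error probabilities, where a Phred score Q corresponds to an error probability of 10^(-Q/10). The function names, the thresholds (mean quality of at least 30, at most 1.0 expected errors), and the toy read below are assumptions chosen for illustration, and Phred+33 encoding is assumed.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def expected_errors(scores):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in scores)


def mean_quality(scores):
    """Arithmetic mean of the Phred scores."""
    return sum(scores) / len(scores)


# Toy read: eight bases at Q40 ('I') followed by two bases at Q2 ('#').
quality = "IIIIIIII##"
scores = phred_scores(quality)

mean_q = mean_quality(scores)
ee = expected_errors(scores)

# Illustrative thresholds, not recommendations.
passes_mean_filter = mean_q >= 30
passes_ee_filter = ee <= 1.0
```

Here the mean quality is 32.4, so the read passes the average-score filter, yet its expected number of errors is about 1.26, so it fails the expected-error filter. The two very poor bases are hidden by averaging Phred scores, which sit on a logarithmic scale, but they dominate the sum of error probabilities.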

Chimeric sequences can be removed using de novo methods or, when an appropriate reference database is available, reference-based chimera filtering. Some pipelines, like NextITS and FROGS, have implemented approaches to recover false-positive chimeras and preserve real community members.

The formation of features, such as ASVs and OTUs, occurs through various algorithms across different software. ASVs are denoised sequences that can differ from one another by as little as a single nucleotide, while OTUs are generally formed by clustering sequences at a global similarity threshold (commonly 97%). The choice between ASVs and OTUs depends on the specific context and research goals.
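As a rough illustration of similarity-based OTU formation, the Python sketch below performs greedy, abundance-sorted clustering: sequences are visited from most to least abundant, and each one joins the first existing centroid it matches at or above a threshold, or else founds a new OTU. This is a simplified sketch and not the algorithm of any particular tool. Identity is approximated here as one minus the edit distance divided by the longer sequence length, which differs from the alignment-based definitions used by real software, and the 0.9 threshold and toy sequences are assumptions for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a, b):
    """Simplified global identity: 1 - edit distance / longer length."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(abundances, threshold=0.97):
    """Cluster sequences into OTUs; abundances maps sequence -> read count."""
    otus = {}  # centroid sequence -> list of member sequences
    # Most abundant first; ties broken alphabetically for reproducibility.
    for seq in sorted(abundances, key=lambda s: (-abundances[s], s)):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


# Toy data: seq_b differs from seq_a by a single substitution.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGTACGTACGTACGA"
seq_c = "TTTTTTTTTTGGGGGGGGGG"
otus = greedy_cluster({seq_a: 100, seq_b: 5, seq_c: 20}, threshold=0.9)
```

With these inputs, seq_b (95% identical to seq_a) is absorbed into the OTU centred on seq_a, while seq_c forms its own OTU. A denoising approach would instead ask whether seq_b is plausibly a sequencing error of seq_a given its abundance and the error model, which is why a single-nucleotide variant can survive as a separate ASV.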

After feature formation, low-abundance features are often discarded, as many are considered artifacts. However, post-clustering processes like LULU can help retain rare, potentially real features. Pipelines such as NextITS, LotuS2, and Dadaist2 implement the UNCROSS2 algorithm to address tag-switching errors, which can artificially inflate richness.
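The simplest form of abundance filtering can be sketched in a few lines of Python. The example assumes a feature table stored as a dictionary of per-sample counts and drops any feature whose total count falls below a cutoff. The function name, the table, and the cutoff of 2 (which removes singletons) are hypothetical. This is not what LULU or UNCROSS2 do; those methods use co-occurrence and abundance patterns across samples and are considerably more involved.

```python
def filter_low_abundance(table, min_total=2):
    """Keep features whose summed count across samples is >= min_total.

    table: dict mapping feature ID -> dict of sample name -> read count
    """
    return {
        feature: counts
        for feature, counts in table.items()
        if sum(counts.values()) >= min_total
    }


# Hypothetical feature table, for illustration only.
table = {
    "OTU_1": {"sample_A": 120, "sample_B": 85},
    "OTU_2": {"sample_A": 1, "sample_B": 0},   # singleton
    "OTU_3": {"sample_A": 0, "sample_B": 14},
}
filtered = filter_low_abundance(table, min_total=2)
```

A fixed cutoff like this is blunt: it removes the singleton OTU_2 regardless of whether it is an artifact or a genuinely rare taxon, which is the motivation for the more selective post-clustering approaches mentioned above.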

Taxonomy assignment is a crucial step in metabarcoding pipelines, with alignment-based (e.g., BLAST) and sequence composition-based approaches (e.g., RDP Naïve Bayesian classifier) being the most common methods. The choice of assignment method and reference database significantly impacts classification accuracy, necessitating a trade-off between maximizing true positives and minimizing false positives.
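To give a feel for composition-based assignment and the threshold trade-off, here is a toy Python sketch. It scores a query against each reference by the fraction of the query's k-mers found in that reference and reports "unclassified" when the best score is below a cutoff. This is a deliberately simplified stand-in and not the RDP Naïve Bayesian classifier, which works with word probabilities and bootstrap confidence. The function names, k = 4, the 0.8 cutoff, and the reference sequences are all assumptions for the example.

```python
def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, references, k=4, min_score=0.8):
    """Assign a query to the reference taxon sharing most of its k-mers.

    references: dict mapping taxon name -> reference sequence (toy data)
    Returns (taxon, score); taxon is "unclassified" below min_score.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        # Query shorter than k: nothing to compare.
        return "unclassified", 0.0
    best_taxon, best_score = "unclassified", 0.0
    for taxon, ref_seq in references.items():
        shared = query_kmers & kmers(ref_seq, k)
        score = len(shared) / len(query_kmers)
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_score:
        return "unclassified", best_score
    return best_taxon, best_score


# Hypothetical reference database and queries.
references = {
    "Taxon_A": "ACGTACGTTAGCCGATAGCT",
    "Taxon_B": "TTGACCGGTAACCGGTTAAC",
}
hit = classify("ACGTTAGCCGATAG", references)
miss = classify("GGGGGGGGGG", references)
```

Lowering min_score would classify more queries, including some incorrectly, while raising it leaves more sequences unclassified but makes the assignments that remain more trustworthy. That is the trade-off between true and false positives described above.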

In conclusion, DNA metabarcoding pipelines are complex and multifaceted, requiring careful consideration of various strategies, software, and databases. As a bioinformatics enthusiast, understanding these intricacies will enable you to harness the power of this technology and make well-informed decisions in your research endeavors.


Hello Ali, Thank you for this beautiful work you have shared. In my PhD, I used metabarcoding for gut microbiota sequencing in Daphnia (a water flea) and also RNA-Seq for genome-wide transcription analysis to understand the mechanism of toxicity of persistent pollutants on non-target species in a freshwater environment. I would be very grateful if we could have a chat, especially on the downstream analysis of my microbiome data. Almost there, but I am stuck on functional analysis. Thank you

Ali Hakimzadeh 7 days ago

Hi Muhammad, Thanks for your kind words. Sure, just drop me a message on Discord and I will be glad to assist you in any way I can :)
