Setting the standard for microbiomic studies

Published 20 December 2023 by Oscar Gammon

Advancements in next-generation sequencing (NGS) and the development of convenient consumer microbiome testing have brought with them a boom in microbiomics and metagenomics studies, transforming our understanding of the impact microbiota have on health and our environment. Studies digesting microbial organisation, function and interactivity often excrete problems with reproducibility and the reliability of data, attributed to technical differences in protocols at each step from sample collection to sequencing. This explosion in studies proves futile without consistent workflows and comparable controls, as microbial profiles can be twisted and manipulated throughout their processing.

The constipation of reliable and accurate data stems from the fact that bias can arise at every stage of the workflow. It is so widespread that submitting the same sample to different profiling organisations can yield drastically different results. In this article we will look at some potential sources of bias, the challenges that arise and how to overcome bias in microbiomic and metagenomic studies.

Where can bias arise in microbiomics studies?

Sample collection

Microbial misrepresentation can begin as soon as samples are collected, compromising data quality from the outset. Different collection methods and sample sources present distinct microbial profiles, which can change depending on the time and temperature of preservation.

Frozen storage is the gold standard, but even freeze-thaw cycling can diminish sensitive microbes such as the Gram-negative Bacteroidetes, flushing away whole populations.1 Pathogenic infiltration of samples and nuclease degradation of native nucleic acids also distort profiles. This has led to the introduction of preservatives, but the choice of preservative introduces further bias between different collection and preservation techniques.

DNA extraction

Perhaps the most notorious stage for the introduction of bias is DNA extraction.

Microbial communities are complex and diverse. Failing to account for this leads to non-uniform lysis, skewing the profile in favour of those microbes that are more susceptible to lysis. Workflows employing thermal or chemical lysis techniques introduce bias towards more vulnerable Gram-negative bacteria, belittling their resistant counterparts.2 Mechanical extraction protocols are considered the gold standard for unbiased DNA extraction, but even these demonstrate extraction bias between species of the same genus.3
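To picture how uneven lysis skews a profile, here is a minimal Python sketch. It assumes each taxon has a single multiplicative extraction efficiency, a simplification in the spirit of the bias model discussed in reference 3. The function name, taxon labels and efficiency values are invented for illustration and are not measurements.

```python
def observed_profile(true_abundance, efficiency):
    """Return relative abundances after taxon-specific extraction efficiency.

    Both arguments are dicts keyed by taxon name. The efficiencies are
    hypothetical relative DNA yields, not measured values.
    """
    recovered = {t: true_abundance[t] * efficiency[t] for t in true_abundance}
    total = sum(recovered.values())
    if total <= 0:
        raise ValueError("no DNA recovered; cannot compute a profile")
    return {t: amount / total for t, amount in recovered.items()}


# Hypothetical even community: one easily lysed and one lysis-resistant taxon.
true_abundance = {"gram_negative": 0.5, "gram_positive": 0.5}
# Assumption: the lysis step recovers four times more DNA from the fragile taxon.
efficiency = {"gram_negative": 1.0, "gram_positive": 0.25}

profile = observed_profile(true_abundance, efficiency)
for taxon, share in profile.items():
    print(f"{taxon}: {share:.2f}")
```

With these assumed values, an even 50:50 community is reported as 80:20, even though nothing about the sample itself has changed.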

As well as concealing microbial populations that are present, extraction may incorporate contaminant imposter DNA. Microbes dwelling in equipment and reagents, coined the “kitome”, account for a small amount of biomass, and the bias they introduce is exacerbated in smaller samples.2 Furthermore, contaminant DNA sourced from dead cells and biofilms over-represents microbial populations, resulting in further bias.
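The effect of sample size on contamination is simple arithmetic, sketched below in Python. It assumes a fixed contaminant DNA load per extraction, and all quantities are hypothetical and chosen only to show the trend.

```python
def contaminant_fraction(sample_dna_ng, contaminant_dna_ng):
    """Fraction of total extracted DNA that comes from contaminants."""
    if sample_dna_ng < 0 or contaminant_dna_ng < 0:
        raise ValueError("DNA amounts must be non-negative")
    total = sample_dna_ng + contaminant_dna_ng
    if total == 0:
        raise ValueError("no DNA present")
    return contaminant_dna_ng / total


# Hypothetical constant contaminant load carried by reagents and equipment.
KITOME_NG = 0.01

# The same contaminant load against progressively smaller sample inputs.
fractions = {
    input_ng: contaminant_fraction(input_ng, KITOME_NG)
    for input_ng in (100.0, 1.0, 0.01)
}

for input_ng, fraction in fractions.items():
    print(f"{input_ng} ng sample DNA: {fraction:.2%} contaminant")
```

Under these assumptions the contaminant is negligible in the largest sample but makes up half of the DNA in the smallest, which is why low-biomass samples are the most vulnerable.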

Library preparation and target gene sequencing

Sample contamination can inhibit PCR, making microbes in low abundance even harder to detect. Attempts to mitigate reduced amplification by increasing the number of cycles only limit accuracy by generating chimeras. It is often difficult to quantify the amount of “on-target” template DNA added to PCRs, and many protocols aim for strong amplification regardless of discrepancies in template input, co-amplifying “off-target” DNA for sequencing.

Illumina® sequencing platforms dominate the field but, like all sequencing technologies, present a unique bias profile. Their high-throughput, short-read nature has encouraged the generation of a range of primers that target different variable regions of the marker gene. Primers for the typical 16S rRNA marker predominantly target the V4 region, but none amplify all species equally, demonstrating bias toward and against the Streptococcus and Propionibacterium taxonomic groups respectively.4

Metagenomic shotgun library preparation and sequencing

Similarly, shotgun metagenomic sequencing uses Illumina® technology but, unlike target-gene analysis, lacks a universal method of library preparation. Steps often include but are not limited to DNA fragmentation, optional repair of DNA fragments, ligation of platform-specific adaptors and PCR library amplification. Distinct preparation kits and strategies are subject to their own individual biases. For example, kits utilising sonication-based mechanical fragmentation show bias toward high GC-content regions.5

Marker gene bioinformatics

The preceding infection of bias can be inflated through bioinformatic analysis, with the preliminary removal of poor-quality data favouring low-abundance microbes. Coupled with variation in the copy number of the 16S rRNA marker gene among species and taxa, this only further distorts the portrait of community composition.

Base errors arising from PCR and sequencing preclude direct analysis, so sequences must be assigned into analytical units using either error-correcting denoising algorithms or operational taxonomic unit (OTU) picking, each presenting its own systematic bias. Sub-methodologies of OTU definition may introduce minor bias, but regardless of clustering strategy the use of OTUs has been demonstrated to inflate the richness and diversity of microbial populations.

Different classifiers and strategies used to assign taxonomy introduce their own systematic biases when trying to resolve rare taxonomic groups, and each is dependent on its own reference database, further introducing inter-procedural bias.
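For readers unfamiliar with OTU picking, the idea can be sketched as greedy clustering at an identity threshold. This Python sketch is illustrative only: it assumes equal-length, pre-aligned reads compared position by position, whereas real tools align sequences first. The 97% threshold is a common convention rather than a value taken from this article, and the function names and reads are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clusters(reads, threshold=0.97):
    """Cluster reads greedily; return {centroid sequence: read count}."""
    counts = Counter(reads)
    otus = {}
    # Most abundant unique sequences are considered first as centroids.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n  # close enough: join the existing OTU
                break
        else:
            otus[seq] = n  # no centroid within threshold: start a new OTU
    return otus


# Hypothetical 100-base reads: a reference, a 1-mismatch and a 5-mismatch variant.
ref = "ACGT" * 25
one_error = "C" + ref[1:]
distant = "".join("T" if i in {0, 4, 8, 12, 16} else b for i, b in enumerate(ref))

otus = greedy_otu_clusters([ref] * 5 + [one_error] * 2 + [distant] * 3)
print(len(otus), sorted(otus.values()))
```

Here the single-mismatch read (99% identity) is absorbed into the reference OTU, while the five-mismatch read (95% identity) founds a second OTU. Whether that second OTU reflects a real organism or accumulated error is exactly the ambiguity that makes the choice of analytical unit a source of bias.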

Shotgun metagenomic bioinformatics

Shotgun sequencing of all sample DNA divides analysis into two main approaches: reference-based metagenomics and de novo metagenomic assembly. Reference-based analysis compares sequenced DNA to databases of reference genes, marker genes or translated protein sequences. The choice of comparative tool can cause so much bias that the number of identified microbes in the same sample can differ by as much as three orders of magnitude.6

De novo metagenomic analysis removes the need for reference databases, instead exploiting overlaps between reads to assemble predicted sequences, which can then be grouped into bins of sequences likely to originate from the same genome. Assembly tools often struggle to resolve the relationships between large portions of reads, leading to the under-classification of functional, repetitive regions such as rRNA genes and eliciting significant bias in the detection of microbes of contrasting abundance.

How to overcome bias in microbiomic studies?

The infection of bias is only aggravated by deviations in workflow protocol, undermining the reliability and accuracy of data and producing microbial profiles that couldn’t be further from the truth.

The only reliable means of identifying bias is to run the sample workflow along with a microbial standard control. A microbial standard is a defined mixture of microorganisms that mimics the microbiome populations within samples, except that the abundance of each microorganism is known. Running these standards through the regular workflow as controls allows bias to be interpreted from how the data produced for the standard differ from its known composition. Microbial standards help to identify bias but don’t aid prevention, and so there is a critical need for standardisation across the workflow from collection to completion.
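The comparison between a standard's measured and known composition can be sketched in a few lines of Python. This is an illustrative calculation rather than any vendor's method: the function name, taxon labels, read counts and the even expected composition are all hypothetical.

```python
import math


def log2_bias(observed_counts, expected_shares):
    """Per-taxon log2 ratio of observed to expected relative abundance.

    Positive values mean the workflow over-represents the taxon,
    negative values mean it under-represents it.
    """
    obs_total = sum(observed_counts.values())
    exp_total = sum(expected_shares.values())
    if obs_total <= 0 or exp_total <= 0:
        raise ValueError("totals must be positive")
    bias = {}
    for taxon, expected in expected_shares.items():
        if expected <= 0:
            raise ValueError(f"expected abundance for {taxon} must be positive")
        observed_share = observed_counts.get(taxon, 0) / obs_total
        expected_share = expected / exp_total
        if observed_share == 0:
            bias[taxon] = float("-inf")  # taxon in the standard was not detected
        else:
            bias[taxon] = math.log2(observed_share / expected_share)
    return bias


# Hypothetical standard with four taxa at equal known abundance.
expected = {"taxon_a": 0.25, "taxon_b": 0.25, "taxon_c": 0.25, "taxon_d": 0.25}
# Hypothetical read counts obtained after running the standard through a workflow.
observed_reads = {"taxon_a": 5000, "taxon_b": 2500, "taxon_c": 1250, "taxon_d": 1250}

bias = log2_bias(observed_reads, expected)
for taxon, value in bias.items():
    print(f"{taxon}: {value:+.2f}")
```

In this made-up run, taxon_a is over-represented two-fold (log2 ratio of +1), taxon_b is recovered as expected, and taxon_c and taxon_d are each under-represented two-fold. Tracking these values across protocol changes shows which step introduces the distortion.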

Zymo Research have developed a range of microbiomics tools to enable stage-specific workflow standardisation and removal of bias from sample collection through to data interpretation. The ZymoBIOMICS microbial community standards include isolated DNA standards to tackle bias in library preparation and sequencing, as well as whole-cell standards to assess the complete workflow, including stress testing your DNA extraction methods. For unbiased sample collection, they offer DNA/RNA Shield preservation reagent to stabilise nucleic acids from any biological sample as soon as samples are collected. Also available is the ZymoBIOMICS extraction kit range, for the unbiased single or co-isolation of DNA and RNA from microbiome samples.

Learn more about how we can help standardise your microbiomics workflow with our microbiomics and metagenomics range.

Illumina is a registered trademark of Illumina, Inc

References and additional reading

  1. Bahl, M.I., Bergström, A. and Licht, T.R., 2012. Freezing fecal samples prior to DNA extraction affects the Firmicutes to Bacteroidetes ratio determined by downstream quantitative PCR analysis. FEMS microbiology letters, 329(2), pp.193-197.
  2. Nearing, J.T., Comeau, A.M. and Langille, M.G., 2021. Identifying biases and their potential solutions in human microbiome studies. Microbiome, 9(1), pp.1-22.
  3. McLaren, M.R., Willis, A.D. and Callahan, B.J., 2019. Consistent and correctable bias in metagenomic sequencing experiments. Elife, 8, p.e46923.
  4. Comeau, A.M., Douglas, G.M. and Langille, M.G., 2017. Microbiome Helper: a custom and streamlined workflow for microbiome research. mSystems, 2(1), e00127-16.
  5. Poptsova, M.S., Il'Icheva, I.A., Nechipurenko, D.Y., Panchenko, L.A., Khodikov, M.V., Oparina, N.Y., Polozov, R.V., Nechipurenko, Y.D. and Grokhovsky, S.L., 2014. Non-random DNA fragmentation in next-generation sequencing. Scientific reports, 4(1), p.4532.
  6. McIntyre, A.B., Ounit, R., Afshinnekoo, E., Prill, R.J., Hénaff, E., Alexander, N., Minot, S.S., Danko, D., Foox, J., Ahsanuddin, S. and Tighe, S., 2017. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome biology, 18(1), pp.1-19.