Long-read sequencing: NGS "method of the year"

Published 1 November 2023

Long-read sequencing has made a comeback

Long-read sequencing technologies enable the analysis of large and intricate regions of the genome, and recent technological advances in long-read instruments have reinvigorated interest in the method. This is evident from the fact that a 2023 Nature Methods article hailed it as "method of the year".

Although this technique is well suited to clinical applications such as molecular diagnosis and precision medicine, it is also making headway in planetary health initiatives that address challenges like climate change. For instance, PacBio's HiFi sequencing is proving invaluable in agricultural biotechnology for conservation, crop enhancement, disease control, and food safety.

Below, we provide an overview of the current state of long-read sequencing, offer insights into the future of these powerful technologies, and highlight the crucial role of NGS library preparation.

Benefits of long-read sequencing

Data analysis can be likened to assembling a jigsaw puzzle, with the final picture often unclear from the outset. Short-read sequencing creates millions of tiny puzzle pieces, making it challenging to fit them together. Typically, short-read platforms produce reads that are only a few hundred bases long. In contrast, long-read sequencing yields fewer but much larger pieces, simplifying genome assembly and resolving complex, hard-to-map regions. With this method, raw reads can stretch into hundreds of thousands of bases.
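To put rough numbers on the puzzle analogy, the sketch below estimates how many reads are needed to reach a given sequencing depth, using the standard relation coverage = number of reads x read length / genome size. The genome size (3.1 billion bases), the 30x depth, and the two read lengths (150 and 15,000 bases) are illustrative assumptions, not figures from this article.

```python
def reads_needed(genome_size: int, read_length: int, coverage: int) -> int:
    """Smallest number of reads such that
    reads * read_length / genome_size >= coverage."""
    total_bases = coverage * genome_size
    # Integer ceiling division avoids floating-point rounding.
    return (total_bases + read_length - 1) // read_length


# All values below are assumptions chosen for illustration.
GENOME_SIZE = 3_100_000_000   # roughly human-sized genome
DEPTH = 30                    # target coverage
SHORT_READ_LENGTH = 150       # bases
LONG_READ_LENGTH = 15_000     # bases

short_reads = reads_needed(GENOME_SIZE, SHORT_READ_LENGTH, DEPTH)
long_reads = reads_needed(GENOME_SIZE, LONG_READ_LENGTH, DEPTH)

print(f"Short reads needed: {short_reads:,}")
print(f"Long reads needed:  {long_reads:,}")
```

Under these assumptions the long-read run needs 100 times fewer pieces for the same depth, which is the sense in which the puzzle becomes easier to assemble.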

Long reads offer significant advantages in identifying complex structural variations, such as large insertions/deletions, inversions, repeats, duplications, and translocations. This technology can also phase single nucleotide polymorphisms (SNPs) into haplotypes, analyse complete plasmids, construct scaffolds for de novo assembly, and resolve splicing events within full-length cDNA. Additionally, long-read sequencing can often detect further information, such as DNA methylation.

Potential disadvantages of long-read sequencing

Despite their advantages, long-read technologies come with some downsides when compared to short-read methods. Long-read instruments generally yield lower data throughput, making the cost per base higher (although this is gradually decreasing as technology advances). Moreover, they tend to have a higher raw error rate, but this can often be mitigated through higher coverage or consensus sequencing methods.

Additionally, many commonly used next-generation sequencing (NGS) methods and tools are optimised for short-read data, so long-read data analysis is more specialised and often requires custom bioinformatics support. Furthermore, library preparation for long reads is typically more labour-intensive and harder to scale and automate than short-read methods. One such short-read method is seqWell’s ExpressPlex kit, described as the fastest high-throughput library preparation kit available, with 30 minutes of hands-on time.

Current state of long-read technologies

Oxford Nanopore Technologies (ONT) and PacBio are the dominant players in the long-read sequencing field, each relying on a different underlying technology. ONT instruments sequence long DNA molecules on a flow cell containing an array of protein nanopores, measuring changes in ionic current as DNA molecules pass through the pores under an applied voltage. In contrast, PacBio's SMRT (single-molecule, real-time) technology sequences single long molecules captured in tiny wells called zero-mode waveguides (ZMWs).

The raw signal from each ZMW is relatively noisy at the level of individual nucleotides. Accuracy is improved through circular consensus sequencing, which reads each molecule multiple times and combines the passes into a single HiFi read. This approach significantly reduces the error rate compared with the raw reads.
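The idea of reading the same molecule several times and combining the passes can be illustrated with a toy majority vote. This is only a sketch under strong assumptions: the passes are already aligned, have equal length, and contain substitution errors only. It is not PacBio's consensus algorithm, which uses statistical models and also handles insertions and deletions. The function name and example sequences are made up for illustration.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Return the most common base at each position across aligned passes.

    Assumes all passes have the same length (substitution errors only).
    If two bases tie at a position, the one seen first is kept.
    """
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")

    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Five hypothetical passes over the same molecule; three of them
# carry a single substitution error at different positions.
passes = [
    "ACGTACGT",
    "ACGAACGT",  # error at position 3
    "ACGTACCT",  # error at position 6
    "TCGTACGT",  # error at position 0
    "ACGTACGT",
]
print(majority_consensus(passes))  # ACGTACGT
```

Because the errors in this example fall at different positions in different passes, each is outvoted by the correct base, which is why additional passes improve accuracy.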

Critical role of NGS library prep

While ONT and PacBio have made progress in enhancing their sequencing technologies, limited attention has been given to scaling up long-read library preparation to meet growing demand. Many long-read methods do not involve PCR amplification, which allows sequencing of native DNA, avoids introducing PCR biases, and preserves extra information such as DNA methylation. However, working without amplification requires a large input of DNA into library preparation, often up to 5 micrograms, depending on the method.

Standard long-read library preparation methods typically involve mechanical fragmentation of DNA, using low-throughput, time-consuming approaches that are challenging to automate. As sequencer yields increase, the need for higher-scale multiplexing solutions also grows. Most current methods use ligation-based approaches to add barcoded adapters, which are multistep, expensive, and time-consuming processes.

For example, long-read targeted capture on the PacBio Revio can accommodate numerous samples per SMRT Cell. However, the current standard method for making these libraries involves syringe-based fragmentation, which is time-consuming and can be error-prone because each sample is processed individually. This is followed by ligation-based library preparation, adding further time to the process.

Consequently, processing large-scale targeted capture projects using these methods becomes a daunting and costly task.
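Multiplexing with barcoded adapters also means that pooled reads must be assigned back to their samples after sequencing. The sketch below shows the basic idea, assuming each read begins with an exact copy of its sample barcode. The barcode sequences, sample names, and function name are hypothetical, and real demultiplexing tools tolerate sequencing errors and check both ends of a read.

```python
from collections import defaultdict

# Hypothetical 6-base sample barcodes, for illustration only.
BARCODES = {
    "ACGTAC": "sample_1",
    "TGCATG": "sample_2",
}
BARCODE_LENGTH = 6


def demultiplex(reads: list[str]) -> dict[str, list[str]]:
    """Group reads by the sample barcode found at the start of each read.

    Matched reads are stored with the barcode trimmed off; reads with no
    exact barcode match are kept untrimmed under "unassigned".
    """
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        sample = BARCODES.get(read[:BARCODE_LENGTH])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[BARCODE_LENGTH:])
    return dict(bins)


pooled_reads = [
    "ACGTACGGGTTTAAA",  # starts with the sample_1 barcode
    "TGCATGCCCAAATTT",  # starts with the sample_2 barcode
    "NNNNNNGATTACA",    # no matching barcode
]
for sample, sample_reads in demultiplex(pooled_reads).items():
    print(sample, sample_reads)
```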

The future of long-read library prep

To match the throughput and scalability of new long-read technologies, our trusted partner and DNA multiplexing experts, seqWell, are developing library preparation products that leverage transposase-based technology. Transposase chemistry offers unique advantages over conventional workflows involving mechanical shearing and ligation-based chemistry. Transposase-based tools can combine DNA fragmentation with the attachment of sample barcodes and sequencing adapters in a single molecular step. This simplifies the library preparation process, improves overall efficiency, and allows multiple samples to be processed simultaneously.

seqWell are also working on innovative methods that utilise optimised Tn5 transposase tagmentation to generate high-quality multiplexed libraries. These innovations are based on their existing purePlex™ dual index technology. The library preparation chemistry, currently named purePlex HC, offers a streamlined workflow for pooling samples into plexes for hybrid capture immediately following transposase-mediated tagging. This not only saves time and costs but also produces libraries with high molecular complexity.

Original content written by seqWell.
