BioMicroCenter:Sequencing

From OpenWetWare


Revision as of 10:29, 17 May 2011


ILLUMINA MASSIVELY PARALLEL SEQUENCING

The MIT BioMicro Center has four high-throughput Illumina sequencers, including the HiSeq 2000, which are used for a variety of applications, including ChIP-Seq, miRNA sequencing and RNA-seq. Each sequencer can process up to 7 lanes of samples, with multiple samples per lane if multiplexed. Single-read yield is approximately 25 million reads per lane on the GAIIx and over 50 million reads per lane on the HiSeq. Read length is chosen by the user, up to 150nt per end.

Illumina sequencing works by binding randomly fragmented DNA to an optical flowcell. Templates are sequenced by incorporating fluorescently labeled nucleotides in a “Sequencing-By-Synthesis” reaction. A detailed description of this process can be found at Illumina's website.

The Genome Analyzers consist of a cluster generation station, a Paired-End module, and a Genome Analyzer, all of which work in concert to generate and analyze flowcells. An overview of the Illumina Genome Analyzer system can be found at the Illumina website.

For an in-depth overview of the Illumina sequencing chemistry, please refer to the following paper:

  1. Kircher M, Stenzel U, and Kelso J. pmid:19682367.

Determining ideal read length and depth of coverage

The BioMicro Center offers a wide variety of read lengths in both single-end and paired-end formats. The Illumina GAIIx is capable of single-end and paired-end sequencing from +36bp to a maximum of +150bp, while the HiSeq 2000 is capable of single-end and paired-end sequencing from +36bp to +108bp.

As a flowcell is being run, reads undergo internal quality control, which filters out unreliable reads. The GAIIx averages ~25 million acceptable reads (clusters) per lane, while the HiSeq 2000 averages ~50 million. Determining the ideal parameters for a sequencing run requires knowledge of the genome being sequenced.

For example, sequencing a 150Mbp genome at 5X coverage requires 750Mbp of data output (150*5). A standard +36bp single-end lane on the Illumina GAIIx, which produces ~900Mbp of data on average (36*25), is sufficient. For a larger genome, for example one of 300Mbp, the target data output is 1.5Gbp (300*5). A standard +36bp single-end lane is then not sufficient, but a +72bp single-end lane (25*72=1.8Gbp) or a +40bp paired-end lane (40*50=2Gbp, since a paired-end lane yields two reads per cluster) would be fine.
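
The arithmetic above can be sketched as a short Python calculation. This is only an illustration: the function names are hypothetical, the 25 million clusters per lane is the GAIIx average quoted in the text, and the doubling for paired-end lanes assumes that the 40*50 figure comes from two reads per cluster.

```python
def required_output_mbp(genome_size_mbp, coverage):
    """Data needed (in Mbp) to cover a genome at the given depth."""
    return genome_size_mbp * coverage


def lane_output_mbp(read_length_bp, clusters_millions, paired_end=False):
    """Average lane yield in Mbp: read length (bp) x reads (millions).

    Assumption: a paired-end lane yields two reads per cluster.
    """
    reads_millions = clusters_millions * (2 if paired_end else 1)
    return read_length_bp * reads_millions


GAIIX_CLUSTERS_M = 25  # average passing clusters per lane, as quoted in the text

needed = required_output_mbp(300, 5)  # 300 Mbp genome at 5X coverage
se36 = lane_output_mbp(36, GAIIX_CLUSTERS_M)
se72 = lane_output_mbp(72, GAIIX_CLUSTERS_M)
pe40 = lane_output_mbp(40, GAIIX_CLUSTERS_M, paired_end=True)

for name, output in [("36bp single-end", se36),
                     ("72bp single-end", se72),
                     ("40bp paired-end", pe40)]:
    verdict = "sufficient" if output >= needed else "not sufficient"
    print(f"{name}: {output} Mbp vs {needed} Mbp needed -> {verdict}")
```

These are average yields, so a run that lands close to the target leaves little margin.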

Multiplexing is useful for applications requiring a lower data output per sample. Sequencing Saccharomyces, which has a ~12.5Mbp genome, at 5X coverage requires 62.5Mbp of data (12.5*5). Multiplexing 10 samples on one lane of a +36bp single-read flowcell therefore requires 625Mbp of output (62.5*10) to achieve the desired coverage. As stated above, the average output for one +36bp single-end lane on the GAIIx is ~900Mbp, so multiplexing the 10 samples into one lane provides sufficient coverage while saving money. Note that while the multiplexing process adds 6bp barcodes to the Illumina libraries, the barcodes do not affect read length.

Paired-end runs sequence DNA in both the forward and reverse directions using the same amount of DNA as a single-end run, allowing for more precise and accurate alignment. Paired-end, long-read (+108bp) runs are preferred for some applications, such as de novo sequencing.
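
As a rough sketch of the multiplexing calculation above, the following Python snippet estimates how many samples fit in one lane. The function name is hypothetical, and the estimate assumes that lane output is split evenly across barcoded samples, which is an idealization.

```python
import math


def max_samples_per_lane(genome_size_mbp, coverage, lane_output_mbp):
    """Largest number of samples that still reach the target coverage.

    Assumption: the lane's output is divided evenly among the samples.
    """
    per_sample_mbp = genome_size_mbp * coverage
    return math.floor(lane_output_mbp / per_sample_mbp)


# Saccharomyces (~12.5 Mbp) at 5X on a ~900 Mbp lane, figures from the text
per_sample = 12.5 * 5          # 62.5 Mbp per sample
ten_samples = per_sample * 10  # 625 Mbp for 10 samples
print(ten_samples <= 900)      # the 10-sample pool fits within one lane
print(max_samples_per_lane(12.5, 5, 900))
```

In practice, leaving headroom below this upper bound allows for uneven sample representation and run-to-run variation in yield.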

Sample Preparation

For information on Illumina library preparation techniques and services offered by the BMC please visit the Illumina Library Preparation page (please note that this page is currently under development).
Information is also available about multiplexing.

Applications

Illumina currently provides reagents and support for five major sequencing applications:

Other DNA Sequencing applications: The following applications have been published but do not yet have kits from Illumina.

  • Genotyping: Protocols are being developed for detection of SNPs, chromosomal rearrangements and other genotyping applications.

Data Analysis

Each lane of the flowcell should produce between 10 and 25 million DNA fragments as of March 2010. Understanding this data often requires a significant investment in informatics. This is complicated by the fact that many applications require entirely different interpretations of the data. As part of our sequencing service we provide many of the early steps of bioinformatics for different applications. Further data processing can be arranged on a collaborative basis as resources are available. For more information, check out the links below:

Pricing

Priority for Illumina sequencing is currently available for labs associated with the BioMicro Center Core departments. We are able to do Illumina sequencing for other MIT and non-MIT users as space allows on the sequencers. Full pricing information is available at our price list.


Protocols

Protocols for all of the supported technologies can be found by visiting the Protocols page.

QC

Quality control is very important to help optimize the number of reads and the quality of data produced. We run the Bioanalyzer and RT-PCR on all cDNA libraries submitted for Illumina sequencing. For information on QC methods and protocols, please visit the Sequencing Quality Control page.

MIT Core Collaboration

All samples run on the Illumina sequencer are run in batches of 7 flowcell lanes (if many samples are multiplexed into one lane, they count as one for the purpose of completing a batch). To optimize our throughput, we have established a collaboration that allows us to move partial flowcells between the various centers at MIT. Samples from users with fewer than 4 samples may be moved between the BioMicro Center, the Whitehead Institute Center for Genome Technologies and the Koch Institute Biopolymer Center. Samples will be moved only to fill out runs or to expedite processing. The Centers are committed to working together to maintain consistent quality between the different cores, so you should see no difference whether your samples are run in BioMicro or at one of our sister centers. Transfers are only available for members of the MIT community.

View current samples queuing for Illumina.

All questions about Illumina Sequencing can be directed to Kevin Thai at kthai@mit.edu.

Initial page written by Summeet Gupta at the WI-CGT
