A critical directive of the BioMicro Center is to provide a cutting-edge research core for members of the MIT community. Keeping the BioMicro Center at the forefront of technology improves our ability to support MIT faculty in grant applications, manuscript publishing and the recruitment of new faculty members. In part, this goal is achieved through ongoing collaborations with many labs at MIT. A selection of these collaborations, taken from the annual reports of the BioMicro Center, is presented below.
Large-scale discovery and functional analysis of distal enhancer elements - BOYER LAB - Biology and KI
The overall goal of the Boyer lab is to understand how a single cell can ultimately specify the diversity of cell types during mammalian development. An exciting and emerging area of biology in the post-genomics era has been the genome-wide identification of non-coding regulatory elements in what was once known as “junk DNA”. Enhancers are key cis-regulatory elements that can affect gene transcription independent of their orientation or distance and that are required for tissue-specific patterning of gene expression during development, though only a few examples had been known. Global identification of these regions, as well as of their contribution to target gene expression, has been challenging because enhancers can often reside thousands of base pairs away from their target of regulation.
The Boyer lab has recently discovered that specific histone modification patterns can identify enhancers by genome-wide ChIP-Seq in embryonic stem cells (ESCs) as well as in a range of differentiated cell types and, moreover, that these patterns distinguish enhancers as either active or poised (or inactive). Remarkably, genes connected to active enhancers have cell-type-specific functions and, more importantly, poised enhancers can predict the future developmental potential of a cell by marking genes that have the potential to become activated. However, it had been unclear how enhancer states were correlated during lineage commitment. Using cutting-edge high-throughput sequencing methods, the Boyer lab has now defined a large set (~80,000) of both poised and active enhancers throughout the genome based on chromatin modification patterns derived from four key time points during cardiomyocyte differentiation. The differentiation system provides a unique opportunity to study enhancer state transitions during embryonic patterning of cardiomyocytes, which ultimately comprise the majority of the cell types in the developing heart.
The BioMicro Center was instrumental in providing the technical expertise necessary for the generation of the large number of high quality sequencing libraries from chromatin immunoprecipitated material. The BioMicro Center adapted the use of the IP-Star automated ChIP system (currently under evaluation) to facilitate automation of ChIP followed by library generation on the SPRI-TE. Additionally, the Boyer lab was able to barcode each experimental sample so that multiple sequencing libraries could be run in a single lane of an Illumina flow cell. Barcoded libraries were then analyzed by a number of quality control measures developed by the BioMicro Center to ensure the highest quality of sequence. These steps represented substantial improvements over previous protocols and allowed us to perform many experiments in a cost and time-efficient manner.
Together with the BioMicro Center, the Boyer lab analyzed the substantial amount of sequencing data and developed new algorithms to identify and to functionally dissect the role of distal enhancer elements in regulating gene expression patterns during lineage commitment. As a result of this study, they found that enhancer utilization is highly cell type specific and that enhancer state transitions are dynamic and non-random and likely occur during short windows of developmental time. These exciting findings have provided new details about how tissue specific expression patterns are established early in development and how mutations in these elements may contribute to cardiac diseases.
A major challenge in bacterial genetics is the identification of the molecular targets and pathways affected by newly discovered genes. Toward this end, one powerful technique involves the unbiased selection for mutations that are able to suppress the deleterious effects of gain- or loss-of-function mutations in the gene of interest. However, finding the genomic locations of these suppressor mutations by traditional mapping methods can be time and labor intensive. The BioMicro Center has worked with the Laub lab to bypass the need for genetic mapping by sequencing the entire genomes of mutant bacterial strains.
The Laub lab, working in the bacterium Caulobacter crescentus, has recently characterized a novel gene, sidA, which inhibits cell division in response to DNA damage. To identify the protein targets of sidA, the Laub lab performed a suppressor screen for mutations allowing cells to form colonies despite sidA overproduction. By directed sequencing of candidate genes, most of these mutations were mapped. However, one suppressor strain did not contain mutations in any of the known cell division genes and was directly sequenced to find the mutation.
In order to generate libraries from this strain, the BioMicro Center piloted a new protocol using the Nextera tagmentation system. Standard approaches using fragmentation had appeared to be unsuccessful, possibly due to the high GC percentage of Caulobacter. The tagmentation system uses a Tn5 transposase to insert sequence tags into intact genomes, both tagging them and fragmenting them at the same time. This reduces the number of operator steps and avoids the need for sonication. With the suppressor samples, the Nextera system was directly compared to sonicated DNA prepared with the SPRI-TE. The sequencing data showed that the Nextera system was able to produce very even and consistent coverage of the genome and is now being offered as a service through the BioMicro Center.
Screening yeast libraries for genes involved in DNA damage response - SAMSON LAB - Biology-BE-KI-CEHS
A myriad of new chemicals have been introduced into our environment and exposure to these agents can have detrimental effects on biological systems. Many of these chemicals are thought to have mutagenic activity. Analysis of the cellular response to these potential toxins using S. cerevisiae can provide a description of systems level responses to environmental stress and identify new pathways of DNA damage response. High-throughput techniques have become important tools to establish and clarify toxicity-modulating pathways of potential environmental carcinogens.
The BioMicro Center has worked closely with Laia Quiros-Pesudo from the Samson lab on multiple complementary screening methods to identify the cellular systems that respond to DNA damage. The initial screening method involved performing barcode sequencing (Bar-Seq), described by Smith et al. (Genome Res. 2009. 19: 1836-1842), on a haploid yeast knockout library. In Bar-Seq, each knockout strain is identified by two unique barcode sequences (“uptag” and “downtag” barcodes) that can be amplified from its genome and identified using Illumina sequencing, allowing the whole library to be grown together in a single vessel. The importance of each knocked-out gene is then measured by comparing the frequency of the strain in the initial knockout library pool to its frequency after subjecting the pool to an environmental stress; this comparison is summarized as the fitness defect ratio. Multiple conditions can be tested simultaneously by using a second molecular barcode, added to the library, that identifies the experiment.
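The frequency comparison behind the fitness defect ratio can be sketched in a few lines of Python. This is a minimal illustration only, not the Samson lab's or Smith et al.'s pipeline: the function names, the pseudocount, the log2 scaling and the strain names are all assumptions made for the example, and the read counts are invented.

```python
import math

def strain_frequencies(counts, pseudocount=1.0):
    """Convert raw barcode read counts into per-strain frequencies.

    The pseudocount (an assumption of this sketch) avoids division by
    zero for strains that drop out of the pool entirely.
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {strain: (n + pseudocount) / total for strain, n in counts.items()}

def fitness_defect_scores(control_counts, treated_counts, pseudocount=1.0):
    """Return log2(control frequency / treated frequency) per strain.

    Positive scores mean the strain was depleted by the treatment,
    i.e. the deleted gene appears to matter under that stress.
    """
    strains = sorted(set(control_counts) | set(treated_counts))
    control = strain_frequencies(
        {s: control_counts.get(s, 0) for s in strains}, pseudocount)
    treated = strain_frequencies(
        {s: treated_counts.get(s, 0) for s in strains}, pseudocount)
    return {s: math.log2(control[s] / treated[s]) for s in strains}

# Hypothetical barcode counts for two strains, before and after treatment.
control = {"strain_a": 500, "strain_b": 500}
treated = {"strain_a": 50, "strain_b": 950}
scores = fitness_defect_scores(control, treated)
for strain, score in scores.items():
    print(f"{strain}: {score:+.2f}")
```

In this toy pool, strain_a is depleted after treatment and receives a positive score, while strain_b is enriched and receives a negative one. A real analysis would also combine the uptag and downtag counts for each strain and normalize for sequencing depth across experiments.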
Using the Bar-Seq approach, the Samson lab was able to simultaneously analyze the frequencies of ~4,800 strains of S. cerevisiae in up to 19 treatments and doses in a single Illumina sequencing lane. The Bar-Seq method was able to reproduce previous results obtained using a solid agar assay and the alkylating agent MMS (Begley et al., 2004). In addition, new groups of sensitive strains have been identified and analysis of these new pathways is currently underway.
In addition to Bar-Seq approaches, the Samson lab is directly screening GFP-fusion libraries to identify proteins that respond to environmental stress. GFP-fusion libraries monitor changes in both protein expression and localization as a result of chemical exposure, rather than survival. In these experiments, each strain is individually screened across a number of conditions, requiring significant automation to make the experiment feasible.
In order to perform these screens, the Samson lab has relied on the Tecan EVO 150 liquid handler in the BioMicro Center and the Cellomics ArrayScan VTI HCS reader available through the CEHS Genomics and Imaging Core (similar instruments are also available through the Whitehead Institute and will be available through the Koch Institute). The Tecan EVO 150 performed the cellular treatment, fixation and staining of multiple GFP-tagged library plates simultaneously and significantly improved the throughput of the library screen. Initial results have been promising.
Processing Very Long Illumina Reads - CHISHOLM LAB - Biology
Different high-throughput sequencing platforms are currently available, and trade-offs exist between the cost per sequencing read, the number of reads, and the average read length. The Chisholm lab has been interested in optimizing the Illumina platform for the de novo sequencing of microorganisms. To this end, the Chisholm lab has worked with the BioMicro Center to develop a pipeline that significantly increases the read length yielded by the Illumina sequencing technology, generating sequencing reads that can exceed 250 nucleotides in length. Combined with Illumina's low cost and high throughput, the procedure expands the range of applications that can be performed with this platform.
Illumina reads tend to decrease in quality with length due to slight errors in incorporation and extension of the growing sequence. To improve the error rate at long read lengths, the Chisholm lab developed an algorithm, SHERA (SHortread Error-Reducing Aligner), which uses overlapping paired-end reads to create long and accurate composite reads. SHERA allows more than 87% of the paired-end sequencing reads to produce longer composite sequences, with less than 1% of paired reads incorrectly aligned. The quality score of each overlapped base is re-evaluated to take into account the information from the two paired-end reads. The Chisholm lab sequenced a marine metagenomic DNA sample using 454-FLX and the Illumina paired-end overlapping procedure, and found that the taxonomic classification results are highly platform-independent, demonstrating that composite sequencing reads constitute a cost-effective alternative to pyrosequencing.
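The general idea of building a composite read from an overlapping pair can be illustrated with a short Python sketch. This is not SHERA and does not reproduce its alignment or scoring. The function names, the overlap search (longest overlap under a mismatch threshold), the quality-combination rule and every parameter value are assumptions chosen to keep the example small.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, qual1, read2, qual2,
               min_overlap=10, max_mismatch_frac=0.1, max_qual=60):
    """Merge an overlapping read pair into one composite read.

    read2 is given as sequenced (opposite strand); qualities are lists
    of Phred scores. Returns (sequence, qualities), or None when no
    acceptable overlap is found.
    """
    mate = reverse_complement(read2)
    mate_qual = qual2[::-1]

    # Assumed heuristic: take the longest overlap whose mismatch
    # fraction stays under the threshold.
    overlap = None
    for ov in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(read1[-ov:], mate[:ov]))
        if mismatches <= max_mismatch_frac * ov:
            overlap = ov
            break
    if overlap is None:
        return None

    start = len(read1) - overlap
    seq = list(read1[:start])
    qual = list(qual1[:start])
    for i in range(overlap):
        b1, q1 = read1[start + i], qual1[start + i]
        b2, q2 = mate[i], mate_qual[i]
        if b1 == b2:
            # Agreement: both reads support the call, so raise the score.
            seq.append(b1)
            qual.append(min(q1 + q2, max_qual))
        elif q1 >= q2:
            # Disagreement: keep the better-supported base, lower the score.
            seq.append(b1)
            qual.append(q1 - q2)
        else:
            seq.append(b2)
            qual.append(q2 - q1)
    seq.extend(mate[overlap:])
    qual.extend(mate_qual[overlap:])
    return "".join(seq), qual

# Hypothetical 30-base fragment read as two 20-base reads overlapping by 10.
fragment = "ACGTTGCAAGCTTCCGATAGGCTAACGTCA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[10:])
merged = merge_pair(read1, [30] * 20, read2, [30] * 20)
```

The sketch ignores cases a production tool must handle, such as fragments shorter than the read length and repeats that produce ambiguous overlaps. It is only meant to show how the overlapped bases can carry a re-evaluated quality score drawn from both reads.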
The creation of high-quality very long Illumina reads is not only applicable to metagenomics sequencing. The BioMicro Center is currently working to deploy this algorithm for many other applications including amplicon sequencing, transcriptomics, de novo assembly and resequencing for mutation detection. We anticipate a strong growth in very long reads in FY2011. This work has been accepted for publication in PLoS One.
Understanding the role of the human microbiome in health and disease is an emerging field, and has been targeted as a major NIH Roadmap Initiative. Microbial community analysis by 16S rRNA sequencing is a key component of microbiome studies, together with whole genome sequencing and metagenomics. The BioMicro Center has worked with the Alm lab to establish and optimize an experimental approach to generating partial 16S rRNA sequences that is orders of magnitude less expensive than conventional methods, thus enabling unprecedented resolution in microbiome comparisons using the Illumina Genome Analyzers.
The use of Illumina sequencing to assay microbiomes has been limited by read length and by financial considerations. In addition, homopolymeric sequences are very difficult to process using the standard Illumina image analysis software. Improvements to read length introduced by Illumina and careful selection of priming sites have addressed the read-length limitation. The expense of the reads has been addressed by using very highly multiplexed lanes. While each lane of Illumina sequence costs $3,300 for the read lengths needed for this project, multiplexing the samples (up to 96x) dramatically lowers the cost per sample. Barcoding the samples also allowed the Alm lab to bypass the lack of complexity in ribosomal reads, as the highly diverse barcode sequences meet the criteria needed for spot finding. Data from this project has been used in an NIH grant application and in a patent application.
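The effect of multiplexing on cost is simple arithmetic, sketched below in Python. The $3,300 lane cost and the 96x maximum come from the text above; the function name and the intermediate multiplexing levels (12 and 48) are assumptions added for illustration, and the calculation covers only the lane cost, not library preparation.

```python
def cost_per_sample(lane_cost_usd, samples_per_lane):
    """Split the cost of one sequencing lane evenly across its samples."""
    if samples_per_lane < 1:
        raise ValueError("samples_per_lane must be at least 1")
    return lane_cost_usd / samples_per_lane

LANE_COST_USD = 3300  # per-lane cost quoted for this project

# 1 and 96 come from the text; 12 and 48 are illustrative levels only.
for n in (1, 12, 48, 96):
    per_sample = cost_per_sample(LANE_COST_USD, n)
    print(f"{n:>2} samples per lane: ${per_sample:,.2f} per sample")
```

At the full 96x multiplexing, the lane cost works out to roughly $34 per sample, about one hundredth of the cost of dedicating a lane to each sample.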
One of the key limitations of RNA-sequencing is the relatively large amount of RNA required for each sample. While recent protocols have reduced the amount of total RNA input to below 1 µg, the initial protocols required 5-10 µg of material. This is several orders of magnitude higher than we routinely use for microarray analysis.
While most microarray labeling protocols are inappropriate for Illumina sequencing in that they use cRNA as their labeled material, the NuGen kits used by the BioMicro Center since 2009 are unique in that they use amplified cDNA, which has several benefits for microarray analysis. We were particularly interested in the ability of NuGen to handle amounts of RNA in the sub-nanomolar range, which could allow next-generation sequencing of RNA from single cells. In order to test the viability of this approach, we established a collaboration with NuGen and with Dr. Chris Burge of the Biology and Biological Engineering departments.
To establish the robustness of the NuGen system for RNA-seq, two mRNA samples were isolated from control and UPF-1 knockdown cells and were prepared either with the NuGen kit (by NuGen technicians) or with standard RNA-seq methods from Illumina (prepared in the Burge lab). The NuGen samples were prepared across a variety of concentrations, and seven paired-end libraries were run. Differential error rates, coverage, sensitivity and differential expression were all calculated by the Burge Lab.
Our results demonstrated that the NuGen kit, unfortunately, has a number of issues that are concerning in the RNA-seq environment. Analysis of coverage showed very uneven coverage of exons, likely due to the semi-random nonamers that NuGen uses in preparing the cDNA. In addition, the level of noise introduced in differential expression was quite large, at least for an experiment with subtle changes in expression. Our results have discouraged us from focusing on NuGen protocols for RNA-seq. They have even raised questions for us about the quality of the NuGen kit for exon array analysis, though other whole-transcriptome amplifications are probably no better.