A critical directive of the BioMicro Center is to provide a cutting-edge research core for members of the MIT community. Keeping the BioMicro Center at the forefront of technology improves our ability to support MIT faculty in grant applications, in manuscript publishing and in the recruitment of new faculty members. In part, this goal is achieved through ongoing collaborations with many labs at MIT. A selection of these collaborations, taken from the annual reports of the BioMicro Center, is presented below.
The overall goal of the Saeij lab is to understand the molecular basis for individual differences in outcome of infection. We believe that these differences are either due to host differences in resistance/susceptibility to infection or to genetic differences in the pathogen that causes the infection. As a model we use infection with the obligate intracellular pathogen Toxoplasma gondii. Toxoplasma can infect all warm-blooded animals and a third of the world’s population is estimated to be infected. There are many different Toxoplasma strains that differ hugely in virulence in both mice and humans. Mouse and rat strain differences in resistance/susceptibility to Toxoplasma have also been described.
To characterize Toxoplasma genetic diversity we sequenced (Illumina HiSeq), with the help of the BioMicro Center, the whole genome of 10 Toxoplasma strains and the transcriptome of 32 different strains, representing global diversity. This data allowed us to construct the first Toxoplasma haplotype map and to propose a new model explaining current global diversity. The results were published in PNAS (Minot et al, 2012). We also determined the transcriptome of murine macrophages infected with these 32 different strains. This approach allowed us to correlate genotype with phenotype and has led to the identification of Toxoplasma loci and genes that affect fitness, clonality, virulence and modulation of host signaling pathways. Some of these results were published in PLoS Pathogens (Niedelman et al, 2012).
To determine mouse genetic differences in resistance/susceptibility to Toxoplasma we determined the transcriptome and toxoplasmacidal activity of naïve or IFNγ+TNF-stimulated macrophages isolated from 29 recombinant inbred mice, derived from A/J (Toxoplasma resistant) and C57BL/6 (susceptible) mice. We then identified mouse genomic loci affecting mouse gene expression and toxoplasmacidal activity using quantitative trait locus (QTL) analysis. We are now using genetic approaches to confirm mouse candidate genes involved in macrophage differences in Toxoplasma killing ability. The results of this study have helped us secure four years of grant support from the New England Regional Center of Excellence for Biodefense and Emerging Infectious Diseases.
We have received excellent help from the BioMicro Center at every step of these analyses. Their many suggestions on how to analyze the data have been instrumental in getting high quality data.
Emerging evidence points to selective translation of critical proteins as a major facet of the cellular stress response. The Dedon lab has shown that while one contribution to this translational control of cell response involves changes in the relative quantities of dozens of modified ribonucleosides in tRNA, there is also evidence for control of the number of copies of individual tRNA molecules. This dual control makes it difficult to distinguish changes in the level of tRNA modifications caused by altered activity of ribonucleoside-modifying enzymes from changes in the number of tRNA copies. To solve this problem, the Dedon lab has undertaken the development of a high-throughput deep-sequencing method to identify and quantify individual species of tRNA in a bulk population of tRNAs. This approach exploits the paucity of ribonucleoside modifications at the 3’-end of tRNA molecules; such modifications can interfere with RNA sequencing reactions. Following ligation of a custom primer to the 3’-ends of tRNA molecules, reverse transcription is performed to create cDNA that is subsequently subjected to linear amplification from the custom primer. The amplified products are then identified and quantified by Illumina sequencing. This method allows for the simultaneous quantification of individual tRNA species in total tRNA isolated from cells subjected to different stresses, with application to the study of controlled degradation of tRNA during cell responses and to distinguishing changes in tRNA copy numbers from changes in RNA modifications due to enzymatic activity.
The BioMicro Center has been instrumental in providing technical expertise towards optimization of experimental design for the conversion of tRNA to cDNA for sequencing, as well as assisting with bioinformatics analysis of the output data from the Illumina HiSeq 2000. The Dedon lab was able to barcode six different experimental samples so that multiple sequencing libraries could be run in a single lane of an Illumina flow cell. Using the Illumina HiSeq 2000, we were able to identify every tRNA species found in S. cerevisiae and are in the process of optimizing the sequencing method in order to reliably quantify relative changes in the levels of specific tRNA isoacceptors from different populations of tRNAs.
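Running six barcoded samples in one lane implies a step that sorts reads back to their sample of origin. The sketch below illustrates that idea only. It assumes an in-line barcode at the start of each read and exact matching; the barcode sequences, sample names and the `demultiplex` function are hypothetical and are not taken from the Dedon lab protocol.

```python
from collections import defaultdict

# Hypothetical 6-nt barcodes; the sequences actually used are not given in the text.
BARCODES = {
    "ACGTAC": "sample_1",
    "TGCATG": "sample_2",
}

def demultiplex(reads, barcodes=BARCODES):
    """Group reads by the barcode found at their 5' end (exact match only)."""
    length = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:length], "unassigned")
        # Strip the barcode from assigned reads; keep unassigned reads intact.
        bins[sample].append(read[length:] if sample != "unassigned" else read)
    return dict(bins)
```

A production pipeline would usually tolerate one mismatch in the barcode and read it from a separate index read; exact matching is used here to keep the example short.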
The Niles lab has partnered with the Dedon Lab and the BioMicro Center in their efforts to develop a protocol for sequencing bulk tRNA. While the Dedon lab is investigating the effect of different stresses on tRNA expression, the Niles lab is interested in the basic endogenous biology of the malaria parasite. Currently, very little is known in the field about tRNA dynamics across the complicated life cycle of Plasmodium falciparum. While many published studies have tracked RNA expression using microarrays and deep sequencing, these protocols enrich for protein coding RNAs, excluding an important component of host biology. Additionally, while the mitochondrion does not have any genes to produce tRNA, the apicoplast organelle has a full complement of tRNA genes. As the apicoplast organelle is required for successful parasite infection, a better understanding of the expression of tRNA from the reduced but necessary apicoplast plastid will assist in the downstream search for both vaccines and pharmacological treatments.
Using the method described above, all tRNAs found in both the genome and apicoplast were detected from extracts of Plasmodium falciparum. However, there were several species of tRNA that jackpotted, appearing at concentrations several orders of magnitude above the remaining species. We are continuing to work with the BioMicro Center and the Dedon Lab to troubleshoot the protocol to determine a more quantitative picture of tRNA dynamics in the parasite.
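One simple way to make the jackpotting described above visible is to compare each species' read count with the median count across all species. This is a minimal sketch, not the troubleshooting procedure used by the labs. The function name `find_jackpots`, the 100-fold threshold and the example counts are all assumptions chosen for illustration.

```python
from statistics import median

def find_jackpots(counts, fold=100.0):
    """Return tRNA species whose read count is at least `fold` times the median."""
    # Guard against a zero median so the threshold stays meaningful.
    baseline = max(median(counts.values()), 1)
    return sorted(s for s, c in counts.items() if c >= fold * baseline)

# Made-up counts for illustration only.
example = {"tRNA-Ala": 120, "tRNA-Gly": 95, "tRNA-Leu": 150, "tRNA-Glu": 250000}
flagged = find_jackpots(example)
```

Flagged species could then be checked for protocol-specific bias, for example in ligation or amplification, before any quantitative comparison is attempted.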
Elucidating Brn targeting of Sox2 in embryonic stem cells - JAENISCH LAB - Biology and KI
In mammals, a few thousand transcription factors regulate the differential expression of more than 20,000 genes to specify ~200 cell types during development. How this is accomplished has been a major focus of biology for many years. Transcription factors bind sequence-specific regulatory elements, including proximal promoters and distal enhancers, to control gene expression. Emerging evidence indicates that transcription factor binding at distal enhancers plays important roles in the establishment of tissue-specific gene expression programs during development. Combinatorial binding among groups of transcription factors can further increase the diversity and specificity of regulatory modules governed by a particular factor. Identifying regulatory modules, made up of groups of transcription factors that occupy important regulatory regions of genes governing cell fate determination, would shed light on how developmental decisions are made.
Work from the Jaenisch and Young labs elucidated the role of a group of three transcription factors, Oct4, Sox2, and Nanog, in regulating cell identity in embryonic stem cells (ESCs) using ChIP-on-Chip technology in 2005. Recently, Michael Lodato from the Jaenisch lab, in collaboration with the Boyer lab, has studied the genome-wide role of Sox2 in neural progenitor cells (NPCs). Their work showed that Sox2 occupied a distinct set of targets in NPCs relative to its targets in ESCs, and further that Sox2 switched partner factors during this transition: from Oct4 and Nanog in ESCs to Brn2, an Oct4 family member, in NPCs.
Taking advantage of the expertise in the BioMicro Center, they were able to examine the effect of the NPC-specific Sox2 partner factor, Brn2, on Sox2 binding in ESCs, where Sox2 normally partners with Oct4. The BioMicro Center staff used the IP-Star Automated ChIP System to automate a large number of ChIP-Seq experiments, successfully querying the genome-wide occupancy of Sox2, Brn2, Histone H3 Lysine 4 monomethylation, Histone H3 Lysine 27 acetylation, total Histone H3, and p300 in both control and Brn2-induced ESCs in a rapid and controlled manner. Thus, not only could the binding of Sox2 and Brn2 be investigated, but thanks to the availability of the IP-Star system the Jaenisch Lab could investigate changes to the epigenome as well. The BioMicro Center then prepared Illumina libraries from these samples and ran them in a single lane of an Illumina flow cell, saving time and greatly reducing the cost to the Jaenisch Lab. Finally, the BioMicro Center implemented quality control metrics on both the samples before loading on the sequencer and the data generated during the run.
Using this data, the Jaenisch Lab was able to define a set of regions where ectopic Brn2 could recruit endogenous Sox2 (Figure x), and they are currently investigating the effects of this binding on the chromatin status of these regions. The technical expertise of the staff of the BioMicro Center contributed significantly to the successful completion of this project during both the planning and execution of these experiments.
Large-scale discovery and functional analysis of distal enhancer elements - BOYER LAB - Biology and KI
The overall goal of the Boyer lab is to understand how a single cell can ultimately specify the diversity of cell types during mammalian development. An exciting and emerging area of biology in the post-genomics era has been the genome-wide identification of non-coding regulatory elements in what was once known as “junk DNA”. Enhancers are key cis-regulatory elements that can affect gene transcription independent of their orientation or distance and that are required for tissue specific patterning of gene expression during development, though only a few examples had been known. Global identification of these regions, as well as of their contribution to target gene expression, has been challenging because enhancers can often reside thousands of base pairs away from their target of regulation.
The Boyer lab has recently discovered that specific histone modification patterns can identify enhancers by genome-wide ChIP-Seq in embryonic stem cells (ESCs) as well as in a range of differentiated cell types and, moreover, that these patterns distinguish enhancers as either active or poised (or inactive). Remarkably, genes connected to active enhancers have cell type specific functions and, more importantly, poised enhancers could predict the future developmental potential of a cell by marking genes that have the potential to become activated. However, it had been unclear how enhancer states were correlated during lineage commitment. Using cutting edge high-throughput sequencing methods, the Boyer lab has now defined a large set (~80,000) of both poised and active enhancers throughout the genome based on chromatin modification patterns derived from four key time points during cardiomyocyte differentiation. The differentiation system provides a unique opportunity to study enhancer state transitions during embryonic patterning of cardiomyocytes, which ultimately comprise the majority of the cell types in the developing heart.
The BioMicro Center was instrumental in providing the technical expertise necessary for the generation of the large number of high quality sequencing libraries from chromatin immunoprecipitated material. The BioMicro Center adapted the use of the IP-Star automated ChIP system (currently under evaluation) to facilitate automation of ChIP followed by library generation on the SPRI-TE. Additionally, the Boyer lab was able to barcode each experimental sample so that multiple sequencing libraries could be run in a single lane of an Illumina flow cell. Barcoded libraries were then analyzed by a number of quality control measures developed by the BioMicro Center to ensure the highest quality of sequence. These steps represented substantial improvements over previous protocols and allowed us to perform many experiments in a cost and time-efficient manner.
Together with the BioMicro Center, the Boyer lab analyzed the substantial amount of sequencing data and developed new algorithms to identify and to functionally dissect the role of distal enhancer elements in regulating gene expression patterns during lineage commitment. As a result of this study, they found that enhancer utilization is highly cell type specific and that enhancer state transitions are dynamic and non-random and likely occur during short windows of developmental time. These exciting findings have provided new details about how tissue specific expression patterns are established early in development and how mutations in these elements may contribute to cardiac diseases.
A major challenge in bacterial genetics is the identification of the molecular targets and pathways affected by newly discovered genes. Toward this end, one powerful technique involves the unbiased selection for mutations that are able to suppress the deleterious effects of gain- or loss-of-function mutations in the gene of interest. However, finding the genomic locations of these suppressor mutations by traditional mapping methods can be time and labor intensive. The BioMicro Center has worked with the Laub lab to bypass the need for genetic mapping by sequencing the entire genomes of mutant bacterial strains.
The Laub lab, working in the bacterium Caulobacter crescentus, has recently characterized a novel gene, sidA, which inhibits cell division in response to DNA damage. To identify the protein targets of sidA, the Laub lab performed a suppressor screen for mutations allowing cells to form colonies despite sidA overproduction. By directed sequencing of candidate genes, most of these mutations were mapped. However, one suppressor strain did not contain mutations in any of the known cell division genes and was directly sequenced to find the mutation.
In order to generate libraries from this strain, the BioMicro Center piloted a new protocol using the Nextera tagmentation system. Standard approaches using fragmentation had appeared to be unsuccessful, possibly due to the high GC percentage of Caulobacter. The tagmentation system uses a Tn5 transposase to insert sequence tags into intact genomes, both tagging them and fragmenting them at the same time. This reduces the number of operator steps and avoids the need for sonication. With the suppressor samples, the Nextera system was directly compared to sonicated DNA prepared with the SPRI-TE. The sequencing data showed that the Nextera system was able to produce very even and consistent coverage of the genome and is now being offered as a service through the BioMicro Center.
Screening yeast libraries for genes involved in DNA damage response - SAMSON LAB - Biology-BE-KI-CEHS
A myriad of new chemicals have been introduced into our environment and exposure to these agents can have detrimental effects on biological systems. Many of these chemicals are thought to have mutagenic activity. Analysis of the cellular response to these potential toxins using S. cerevisiae can provide a description of systems level responses to environmental stress and identify new pathways of DNA damage response. High-throughput techniques have become important tools to establish and clarify toxicity-modulating pathways of potential environmental carcinogens.
The BioMicro Center has worked closely with Laia Quiros-Pesudo from the Samson lab in multiple complementary screening methods to identify the cellular systems that respond to DNA damage. The initial screening method involved performing barcode-sequencing (Bar-Seq) described by Smith et al. (Genome Res. 2009. 19: 1836-1842) on a haploid yeast knockout library. In Bar-Seq, each knockout strain is identified by two unique barcode sequences (“uptag” and “downtag” barcodes) that can be amplified from its genome and identified using Illumina sequencing, allowing the whole library to be grown together in a single vessel. The importance of each knocked-out gene is then measured by comparing the frequency of the strain in the initial knockout library pool to its frequency after subjecting the pool to an environmental stress; this comparison is summarized as the fitness defect ratio. Multiple conditions can be tested simultaneously by using a second molecular barcode added to the library that identifies the experiment.
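The frequency comparison behind the fitness defect ratio can be sketched in a few lines. The text does not give the exact formula, so the version below is an assumption: it takes the log2 ratio of a strain's frequency in the initial pool to its frequency after stress, with a pseudocount to avoid division by zero. The function name and the example counts are hypothetical.

```python
import math

def fitness_defect(initial_counts, stressed_counts, pseudocount=1.0):
    """Return log2(initial frequency / stressed frequency) for each strain.

    Positive values mean the strain was depleted under stress.
    The pseudocount is an assumption, added to handle strains with zero reads.
    """
    strains = set(initial_counts) | set(stressed_counts)
    n = len(strains)
    initial_total = sum(initial_counts.values()) + pseudocount * n
    stressed_total = sum(stressed_counts.values()) + pseudocount * n
    scores = {}
    for strain in strains:
        f_initial = (initial_counts.get(strain, 0) + pseudocount) / initial_total
        f_stressed = (stressed_counts.get(strain, 0) + pseudocount) / stressed_total
        scores[strain] = math.log2(f_initial / f_stressed)
    return scores

# Made-up barcode counts: strain "b" drops out of the pool after treatment.
scores = fitness_defect({"a": 100, "b": 100}, {"a": 190, "b": 10})
```

Working with frequencies rather than raw counts keeps the comparison independent of how many reads each experiment received, which matters when many conditions share one lane.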
Using the Bar-Seq approach, the Samson lab was able to simultaneously analyze the frequencies of ~4,800 strains of S. cerevisiae in up to 19 treatments and doses in a single Illumina sequencing lane. The Bar-Seq method was able to reproduce previous results using a solid agar assay and the alkylating agent MMS (Begley et al, 2004). In addition, new groups of sensitive strains have been identified and analysis of these new pathways is currently underway.
In addition to Bar-Seq approaches, the Samson lab is directly screening GFP-fusion libraries to identify proteins that respond to environmental stress. Rather than measuring survival, GFP-fusion libraries monitor changes in both protein expression and localization as a result of chemical exposure. In these experiments, each strain is individually screened across a number of conditions, requiring significant automation to make the experiment feasible.
In order to perform these screens, the Samson lab has relied on the Tecan EVO 150 liquid handler in the BioMicro Center and the Cellomics ArrayScan VTI HCS reader available through the CEHS Genomics and Imaging Core (similar instruments are also available through the Whitehead Institute and will be available through the Koch Institute). The Tecan EVO150 performed the cellular treatment, fixation and staining of multiple GFP tagged library plates simultaneously and significantly improved the throughput of the library screen. Initial results have been promising.
Processing Very Long Illumina Reads - CHISHOLM LAB - Biology
Different high-throughput sequencing platforms are currently available, and trade-offs currently exist between the cost per sequencing read, the number of reads, and the average read length. The Chisholm lab has been interested in optimizing the Illumina platform for the de novo sequencing of microorganisms. To this end, the Chisholm lab has worked with the BioMicro Center to develop a pipeline that significantly increases the read length yielded by the Illumina sequencing technology, generating sequencing reads that can exceed 250 nucleotides in length. Combined with Illumina's low cost and high-throughput, the procedure expands the range of applications that can be performed with this platform.
Illumina reads tend to decrease in quality with length due to slight errors in incorporation and extension of the growing sequence. To improve the error rate at long read lengths, the Chisholm lab developed an algorithm, SHERA (SHortread Error-Reducing Aligner), which uses overlapping paired-end reads to create long and accurate composite reads. SHERA allows more than 87% of the paired-end sequencing reads to produce longer composite sequences, with less than 1% of paired reads incorrectly aligned. The quality score of each overlapped base is re-evaluated to take into account the information from the two paired-end reads. The Chisholm lab sequenced a marine metagenomic DNA sample using 454-FLX and the Illumina paired-end overlapping procedure, and found that the taxonomic classification results are highly platform-independent, demonstrating that composite sequencing reads constitute a cost-effective alternative to pyrosequencing.
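The general idea of building a composite read from an overlapping pair can be illustrated as follows. This is not SHERA itself. It is a simplified sketch that assumes the longest acceptable suffix/prefix overlap is the correct one, and it uses an ad hoc quality rule (add scores where the bases agree, subtract where they disagree). The function names, thresholds and the quality cap of 60 are all assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(r1, q1, r2, q2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a forward read with its reverse mate into one composite read.

    r1, r2 are sequences; q1, q2 are lists of Phred scores.
    Returns (sequence, qualities), or None if no acceptable overlap is found.
    """
    r2rc = reverse_complement(r2)
    q2rc = q2[::-1]
    # Try the longest overlap first: suffix of r1 against prefix of r2rc.
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        a, b = r1[-olen:], r2rc[:olen]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * olen:
            break
    else:
        return None
    start = len(r1) - olen
    seq = list(r1[:start])
    qual = list(q1[:start])
    for i in range(olen):
        b1, b2 = r1[start + i], r2rc[i]
        s1, s2 = q1[start + i], q2rc[i]
        if b1 == b2:
            seq.append(b1)
            qual.append(min(s1 + s2, 60))  # agreement raises confidence
        elif s1 >= s2:
            seq.append(b1)
            qual.append(s1 - s2)  # disagreement lowers confidence
        else:
            seq.append(b2)
            qual.append(s2 - s1)
    seq.extend(r2rc[olen:])
    qual.extend(q2rc[olen:])
    return "".join(seq), qual
```

For two 20-nt reads from a 30-nt fragment, the function returns a 30-nt composite whose middle 10 bases carry the combined quality scores. A real aligner would score every candidate overlap rather than accept the first one that passes.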
The creation of high-quality very long Illumina reads is not only applicable to metagenomics sequencing. The BioMicro Center is currently working to deploy this algorithm for many other applications including amplicon sequencing, transcriptomics, de novo assembly and resequencing for mutation detection. We anticipate a strong growth in very long reads in FY2011. This work has been accepted for publication in PLoS One.
Understanding the role of the human microbiome in health and disease is an emerging field, and has been targeted as a major NIH Roadmap Initiative. Microbial community analysis by 16S rRNA sequencing is a key component of microbiome studies, together with whole genome sequencing and metagenomics. The BioMicro Center has worked with the Alm lab to establish and optimize an experimental approach to generating partial 16S rRNA sequences that is orders of magnitude less expensive than conventional methods, thus enabling unprecedented resolution in microbiome comparisons using the Illumina Genome Analyzers.
The use of Illumina sequencing for assaying microbiomes has been limited by read length and by financial considerations. In addition, homopolymeric sequences are very difficult to process using the standard Illumina image analysis software. Improvements to read length introduced by Illumina and careful selection of priming sites have addressed the former issue. The expense of the reads has been addressed by using very highly multiplexed lanes. While each lane of Illumina sequence costs $3,300 for the read lengths needed for this project, multiplexing the samples (up to 96x) dramatically lowers the cost per sample. Barcoding the samples also allowed the Alm lab to bypass the lack of complexity in ribosomal reads, as the highly diverse barcode sequences meet the criteria needed for spot finding. Data from this project has been used in an NIH grant application and in a patent application.
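The effect of multiplexing on cost follows directly from the figures quoted above. The short calculation below uses the $3,300 lane cost and the 96x maximum from the text; the intermediate level of 12 samples per lane is an arbitrary illustration, and per-sample library preparation costs are not included because the text does not state them.

```python
def cost_per_sample(lane_cost, samples_per_lane):
    """Sequencing cost per sample when a lane is shared equally."""
    return lane_cost / samples_per_lane

LANE_COST = 3300  # dollars per lane, as quoted in the text

for n in (1, 12, 96):
    print(f"{n:>2} samples/lane: ${cost_per_sample(LANE_COST, n):,.2f} per sample")
```

At 96 samples per lane the sequencing cost works out to $3,300 / 96, or about $34 per sample.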
One of the key limitations of RNA-sequencing is the relatively large amount of RNA required for each sample. While recent protocols have reduced the amount of total RNA input to below 1 µg, the initial protocols required 5-10 µg of material. This is several orders of magnitude higher than we routinely use for microarray analysis.
While most microarray labeling protocols are inappropriate for Illumina sequencing in that they use cRNA as their labeled material, the NuGen kits used by the BioMicro Center since 2009 are unique in that they use amplified cDNA, which has several benefits for microarray analysis. We were particularly interested in the ability of NuGen to handle amounts of RNA in the sub-nanomolar range, which could allow next-generation sequencing of RNA from single cells. In order to test the viability of this approach, we established a collaboration with NuGen and with Dr. Chris Burge of the Biology and Biological Engineering departments.
To establish the robustness of the NuGen system for RNA-seq, two mRNA samples were isolated from control and UPF-1 knockdown cells and were prepared either with the NuGen kit (by NuGen technicians) or with standard RNA-seq methods from Illumina (prepared in the Burge lab). The NuGen samples were prepared across a variety of concentrations, and seven paired-end libraries were run. Differential error rates, coverage, sensitivity and differential expression were all calculated by the Burge Lab.
Our results demonstrated that the NuGen kit, unfortunately, has a number of issues that are concerning in the RNA-seq environment. Analysis of coverage showed very uneven coverage of exons, likely due to the semi-random-nonamers that NuGen uses in preparing the cDNA. In addition, the level of noise introduced in differential expression was quite large, at least for an experiment with subtle changes in expression. Our results have discouraged us from focusing on NuGen protocols for RNA-seq. They have even raised questions for us about the quality of the NuGen kit for exon array analysis, though other whole transcriptome amplifications are probably no better.