CH391L/S13/MetagenomicBioprospecting

From OpenWetWare

< CH391L/S13(Difference between revisions)
Current revision (23:21, 24 February 2013)
(Undo revision 678977 by Andre C Maranhao (Talk))
 
Line 1: Line 1:
[[Category:CH391L_S12]]
[[Category:CH391L_S12]]
-
==Introduction==
+
==What's Bioprospecting?==
-
Metagenomics and bioprospecting are two 'umbrella' terms that were independently coined in the 1990s. The terms are easily confused because they share characteristics and because metagenomics can itself be used for bioprospecting, as discussed later. The key distinction is that metagenomics is a field of research, whereas bioprospecting is not a separate field; it is a strategy or process encompassing many techniques.
+
Bioprospecting is a catch-all term for the discovery, acquisition, and utilization of novel biomaterials. It has historically been a controversial activity, often involving unregulated commercialization of flora (e.g., medicinal plants) from developing countries for the benefit of commercial interests [[http://www.etcgroup.org/en/node/482 Pros/Cons of Bioprospecting]]. As a term in molecular biology, however, it reflects the growing need to discover new protein and nucleic acid parts for use in biotechnology and basic research. The advent of multiple Next-Generation Sequencing technologies since 2006 now provides in-depth access to the entire genomes ([[#Metagenomics|Metagenomics]]) of species previously inaccessible to basic research. <cite>BiotechLettReview2010</cite>
-
In considering these different but closely related concepts, metagenomics and bioprospecting are essentially "different sides of the same coin". Metagenomics, as fundamental science, and bioprospecting, as applied science, both draw upon the wealth of biological information found in nature. More specifically, metagenomics concerns the sum of all genetic information present in a given environmental sample, whereas bioprospecting is application-driven research aimed at discovering commercially relevant biomaterials.
+
===Examples of Genes Identified via Bioprospecting===
 +
====GFP====
 +
Although not planned, one of the great examples of bioprospecting is the story of Green Fluorescent Protein (GFP), a protein that has had a profound impact on every major field in modern biology. Originally isolated and characterized by Osamu Shimomura in the 1960s and 1970s from jellyfish and sea pansies, it was at first a mere oddity that contributed to the eerie glow of certain bioluminescent marine creatures. However, the subsequent cloning of the gene by Douglas Prasher, its expression in other organisms by Martin Chalfie, and its improvement into enhanced GFP by Roger Tsien made it one of the modern workhorses of biology. This 40-year journey earned Shimomura, Chalfie, and Tsien the 2008 Nobel Prize in Chemistry. [[http://www.brighterideasinc.com/proteins-antibodies/history-of-gfp-and-gfp-antibodies/ History of GFP]]
-
Of these somewhat fraternal concepts, bioprospecting could be considered the older sibling. Bioprospecting derives from the field of chemical ecology wherein the discovery and commercialization of natural products was previously known as 'chemical prospecting.' While similar in principle, chemical prospecting ultimately employed chemical synthesis of newly discovered, commercially relevant compounds. The recent advent of next-generation sequencing, recombinant DNA techniques, and the field of biotechnology in general allowed the development of bioprospecting as a unique concept. Those same technological advances and an interest in natural products would also lead to metagenomics.
+
====Polymerases====
 +
Polymerases such as the Klenow fragment and, more importantly, Taq polymerase have permitted the in vitro synthesis of DNA fragments. Taq was discovered in a thermophilic bacterium (''Thermus aquaticus''), and because it withstands extreme heat (~95 °C) without losing activity, it permitted the use of thermal cycling in the Polymerase Chain Reaction (PCR). Thermal cycling allows repeated denaturation and reannealing of DNA strands, which drives the exponential amplification of target sequences. PCR earned Kary Mullis a share of the 1993 Nobel Prize in Chemistry, awarded jointly with Michael Smith for site-directed mutagenesis. Polymerases also include the important phage RNA polymerases, such as T7, T3, and SP6, which have permitted the in vitro transcription of DNA templates into RNA.
-
==Bioprospecting: Hunting for Utility in Nature==
+
====Reverse Transcriptases====
-
Bioprospecting covers the many activities involved in discovery and utilization of biological material. In the past, bioprospecting primarily focused upon natural products and drug discovery. Still, bioprospecting has led to the discovery of numerous enzyme and protein tools widely used in the pharmaceutical and research communities. Current research efforts along with improvements in sequencing technologies may expand the breadth of activities that constitute bioprospecting.
+
The discovery of reverse transcriptases, essentially polymerases that copy RNA into DNA, has allowed the study of RNA via generation of cDNAs. These proteins are found in RNA viruses such as retroviruses, in mobile genetic elements, and in mammalian telomerase. The work led to the 1975 Nobel Prize in Physiology or Medicine, shared by David Baltimore, Renato Dulbecco, and Howard Temin.
-
===Therapeutics & Drug Discovery===
+
====Restriction Endonucleases====
-
[[Image:Tunicate komodo.jpg | thumb | right | 200 px | Underwater image of a sea squirt (Polycarpa aurata) from Komodo National Park.]]
+
The discovery of restriction endonucleases in the 1970s fueled the molecular biology revolution and the advent of genetic engineering. Found in bacteria and archaea, they degrade foreign DNA by cleaving at specific, often palindromic, recognition sequences. This work led to the 1978 Nobel Prize for Arber, Nathans, and Smith.
-
As expected, there are many examples of bioprospecting for the purpose of drug discovery. As an outgrowth of chemical prospecting, considerable bioprospecting efforts, both past and present, have focused on plant secondary metabolites. One potent chemotherapy drug, paclitaxel (i.e. Taxol), serves as an excellent example of this transition from chemical prospecting to bioprospecting. The isoprenoid compound now known as paclitaxel was discovered in the bark of the Pacific yew tree. Before the adoption of semi-synthetic production in 1988, therapeutic paclitaxel production relied upon low-yield chemical extraction <cite>Boghigian2011</cite>. Using metabolic engineering techniques, researchers created transgenic ''Arabidopsis thaliana'' capable of producing taxadiene, the first committed precursor in paclitaxel biosynthesis <cite>Besumbes2004</cite>. Since then, additional research led to production via plant cell fermentation. More recently, researchers engineered strains of ''E. coli'' and yeast with the capacity to produce taxadiene and other isoprenoid compounds <cite>Boghigian2011</cite><cite>Engels2008</cite>. This was accomplished by introducing isoprenoid biosynthesis pathways. The tale of paclitaxel is principally considered a feat of metabolic engineering. Still, those engineered strains of ''E. coli'' and yeast serve as platform technologies for tractable expression of other newly discovered enzymes.
+
-
Although terrestrial plants remain an important aspect of bioprospecting, increasing attention is being paid to marine biodiversity in the search for new therapies. Study of tunicates has led to the discovery of numerous cytotoxic compounds with potential anticancer properties <cite>Rinehart1999</cite>. Macroalgae, commonly known as seaweed, offer another opportunity for bioprospecting <cite>Pereira2012</cite>.
+
==Metagenomics==
 +
[[image:GOLD.gif | GOLD genome projects| thumb|top|300px]]
 +
Metagenomics uses Next Generation Sequencing technologies (e.g., Whole Genome Shotgun Sequencing (WGS), Roche 454, Illumina, ABI SOLiD) or protein analysis (mass spectrometry) to sample the genomes of mixed microbial communities as completely as possible, generating a largely unbiased view of genomic sequence space. Estimates suggest that more than 99% of all microbes are unculturable in the lab and therefore inaccessible to traditional laboratory analysis. Next Generation Sequencing approaches thus allow for the analysis of microbes that make up only a small percentage of a microbial community. The current explosion in metagenomic projects (340 current projects, 1990 samples [[http://www.genomesonline.org/cgi-bin/GOLD/index.cgi GOLD database]]) permits entirely in silico approaches to identifying new gene families with potential as parts in Synthetic Biology.
-
===Biofuels===
+
====Craig Venter and his Yacht====
-
Regarding second-generation or advanced biofuels, bioprospecting techniques are becoming an increasingly important strategy for biochemical pathway engineering and overall optimization. In a 2010 publication, LS9, Inc. reported the discovery of alkane biosynthesis pathways in a diverse set of cyanobacteria. Those enzymes were subsequently expressed in ''E. coli'' for the production of higher-value biofuel products <cite>Schirmer2010</cite>. That body of work provides an excellent demonstration of various bioprospecting techniques.
+
-
===Research Tools===
+
In the early 2000s, the J. Craig Venter Institute set as one of its goals to sequence the genomic diversity in the oceans. Craig Venter used his personal yacht, the Sorcerer II, to traverse the Earth's oceans, taking samples of oceanic life and sequencing them using Whole Genome Shotgun Sequencing. From this expedition, they uncovered 6 million proteins (double the number in the databases of the time), which included 1,700 clusters of gene families with no known homology. The data also revealed homology for 6,000 previously unknown ORF families (ORFans). They found that a very high proportion of new genes belonged to viruses (likely marine phage), which the databases of the time had underrepresented. <cite>PLOSbio2007</cite> [[image:SorcererII.png | Global Ocean Explorer| thumb|right|200px]]
-
Fluorescent proteins are likely one of the most famous research tools derived from bioprospecting. Examples include dsRed as well as GFP and its many derivatives, which have been utilized throughout biological research. Interestingly, these fluorescent proteins are finding new purpose in medicine as visual guides during surgery. Before tumorectomy, a mouse with internal tumors is injected with a recombinant form of GFP, which is targeted to and accumulates on the cells of blood vessels. During surgical removal of the tumor, the introduced GFP provides a surgeon with a strong visual cue of nearby blood vessels, greatly reducing the risk of blood vessel lacerations.
+
-
DNA and RNA polymerases are the workhorses of modern biotechnology. Almost every aspect of modern biological research depends upon nucleic acid polymerases in one way or another. Recombinant cloning techniques, Sanger sequencing, and qPCR cover a few of the most common uses. These examples also highlight the shared importance of nucleic acid polymerases and Polymerase Chain Reaction (PCR). It was the development of PCR using Taq polymerase that began the drive for bioprospecting of DNA and RNA polymerases. Over the years, several other polymerases of thermophilic origin have been discovered and rapidly commercialized. One area of considerable interest is the discovery or development of high-fidelity, thermostable reverse transcriptases.
+
===Examples of Bioprospecting using Metagenomics (Targeted Metagenomics)===
 +
A useful approach to bioprospecting for new genes involves either functional screening or pure sequence screening, in what is called Targeted Metagenomics. This involves either challenging the microbiota for a particular activity or looking for specific families of genes. Both types of Targeted Metagenomic screens have led to the discovery of new antibiotic resistance genes, cold-adaptive rRNAs, and cellulose-degrading enzymes, to name just a few <cite>EnviroMicroReview2011</cite>.
-
Using bioprospecting techniques, one research group isolated and cultured a novel thermophilic bacterium from a hot spring. That bacterium's DNA polymerase I gene was subsequently cloned and engineered to alter its specificity from DNA to RNA. In this manner, the researchers mutated the DNA-dependent DNA polymerase into an RNA-dependent DNA polymerase (i.e. a reverse transcriptase) <cite>Sano2012</cite>.
+
:'''----Typical Targeted Metagenomic Pipeline----'''
-
 
+
::#Extract (DNA, RNA, or Protein) from Environmental Sample
-
==Metagenomics: Biocoenosis Data Mining==
+
::#Next Gen Sequencing or Mass Spec
-
Consideration of [http://en.wikipedia.org/wiki/Biological_organisation biological organization] greatly assists understanding the meaning of metagenomics. Within that conceptual framework, metagenomics would be a higher level element similar to the population or community tiers of biological organization. In brief, metagenomics refers to the sum of all genetic information present in an environmental sample. The term itself was coined in 1998 <cite>Handelsman1998</cite>. Shortly thereafter, researchers characterized the first bacterial rhodopsin protein, which was isolated from seawater genomic DNA  fragments <cite>Beja2000</cite>.
+
::#Computational analysis for ORFs and homology searches
-
 
+
::#Heterologous Expression and Testing for function
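As a rough illustration of the computational step in the pipeline above (ORF identification), the following Python sketch scans a single assembled contig for open reading frames on both strands. It is a minimal sketch, not the method used in the cited studies: the function names (find_orfs, reverse_complement), the min_codons cutoff, and the toy contig are all hypothetical, and real pipelines rely on dedicated gene callers followed by homology searches.

```python
# Minimal ORF scan over six reading frames (illustrative only).
# All names and thresholds here are hypothetical, not taken from the cited work.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, frame, dna) tuples for ORFs running from ATG to a stop.

    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None  # close the ORF and keep scanning
    return orfs


if __name__ == "__main__":
    toy_contig = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"  # made-up sequence
    for strand, frame, dna in find_orfs(toy_contig, min_codons=3):
        print(strand, frame, dna)
```

Each predicted ORF would then be translated and compared against known protein families before any candidate is chosen for heterologous expression.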
-
===The Benefit and Cost of Pyrosequencing Technology===
+
-
[[image:SorcererII.png | Global Ocean Explorer| thumb|right|200px]]
+
-
Since the turn of the century, metagenomics has blossomed as a field. The decreasing per-basepair cost of pyrosequencing technologies has greatly increased the number of metagenomic research projects. The April 2012 release of the UniProt database comprised an impressive 20.6 million protein sequences. However, only 2.8% of those protein sequences were confirmed to exist by analysis at the protein and/or transcript level. The matter is further complicated because the probability of feature identification is proportional to read length, so there is a significant difference between the information derived from pyrosequencing reads and that derived from Sanger sequences <cite>Temperton2012</cite>.
+
-
 
+
-
===Craig Venter and his Yacht===
+
-
In the early 2000s, the J. Craig Venter Institute set as one of its goals to sequence the genomic diversity in the oceans. Craig Venter used his personal yacht, the Sorcerer II, to traverse the Earth's oceans, taking samples of oceanic life and sequencing them using Whole Genome Shotgun Sequencing. From this expedition, they uncovered 6 million proteins (double the number in the databases of the time), which included 1,700 clusters of gene families with no known homology. The data also revealed homology for 6,000 previously unknown ORF families (ORFans). They found that a very high proportion of new genes belonged to viruses (likely marine phage), which the databases of the time had underrepresented. <cite>Yooseph2007</cite>
+
-
 
+
-
==Targeted Metagenomics: Bioprospecting with Metagenomics==
+
-
[[Image:CH391L_S13_Construction_&_Screening_of_Metagenomic_Library.jpg | thumb | right | 300 px | Schematic for the acquisition and analysis of metagenomic information]]
+
-
Targeted metagenomics is a useful combination of bioprospecting and metagenomics. This technique can be used to identify novel genes by screening ORFs derived from a metagenomic library. It is possible to conduct an initial screen computationally by first parsing identified ORFs for a desired homology. Following cloning, a functional screen is used to identify and recover desired genes from the metagenomic library. Targeted metagenomic screens have led to the discovery of new antibiotic resistance genes, cold-adaptive rRNAs, and cellulose-degrading enzymes <cite>Suenaga2012</cite>.
+
-
 
+
-
::'''Typical Targeted Metagenomic Pipeline'''
+
-
::#Isolate nucleic acids (i.e. DNA, RNA) from an environmental sample
+
-
::#Conduct Next-Gen sequencing
+
-
::#Conduct computational analysis: ORF and sequence homology
+
-
::#Heterologous expression of ORF library followed by functional screening
+
=====Cellulosic Biomass degrading genes found in Cow Rumen=====
=====Cellulosic Biomass degrading genes found in Cow Rumen=====
-
Plant polysaccharides such as cellulose are not broken down by enzymes found in mammals, but ruminants (e.g., cows) carry symbiotic bacteria that perform this job. These microbes cannot be cultured in the lab. However, the enzymes they use to break down cellulose could be used to generate biofuel from easily grown plants like grass. Here, Matthias Hess and colleagues used metagenomic analysis to identify 51 enzymes active in breaking down polysaccharides <cite>Hess2011</cite>. The authors isolated microbes from a nylon bag filled with switchgrass that was placed inside a cow's rumen through a fistula. Various Next-Gen Sequencing technologies were used to generate 268 gigabase pairs of sequence. From the various organisms, they predicted 27,755 putative polysaccharide-degrading enzymes, of which 43% had less than 50% similarity to any known sequence. The authors expressed and tested 90 of these genes, chosen based on similarity to glycosyl hydrolase domains, and identified 51 active enzymes. The enzymes were active against various substrates derived from biofuel crops. Finally, the authors generated 15 draft genomes of new microbial species. A number of earlier studies have also attempted to identify glycosyl hydrolases in species such as termites and panda <cite>Chistoserdova2010</cite><cite>Suenaga2012</cite>.
+
Plant polysaccharides such as cellulose are not broken down by enzymes found in mammals, but ruminants (e.g., cows) carry symbiotic bacteria that perform this job. These microbes cannot be cultured in the lab. However, the enzymes they use to break down cellulose could be used to generate biofuel from easily grown plants like grass. Here, Matthias Hess and colleagues used metagenomic analysis to identify 51 enzymes active in breaking down polysaccharides <cite>Science2011</cite>. The authors isolated microbes from a nylon bag filled with switchgrass that was placed inside a cow's rumen through a fistula. Various Next-Gen Sequencing technologies were used to generate 268 gigabase pairs of sequence. From the various organisms, they predicted 27,755 putative polysaccharide-degrading enzymes, of which 43% had less than 50% similarity to any known sequence. The authors expressed and tested 90 of these genes, chosen based on similarity to glycosyl hydrolase domains, and identified 51 active enzymes. The enzymes were active against various substrates derived from biofuel crops. Finally, the authors generated 15 draft genomes of new microbial species. A number of earlier studies have also attempted to identify glycosyl hydrolases in species such as termites and panda <cite>BiotechLettReview2010</cite><cite>EnviroMicroReview2011</cite>.
=====Uranium Bioremediation=====
=====Uranium Bioremediation=====
-
Bioremediation of uranium waste is an important industrial process. Here, the authors used a proteomics-based approach to identify proteins in iron-reducing (Fe(III)) microbial species used in the reduction of soluble uranium to insoluble uranium <cite>Wilkins2009</cite>. Although the study does not identify new genes, it identifies new metabolic pathways. The authors started with 3 known Geobacter species stimulated with acetate in a bioremediation project, and used their known proteins as a reference for performing 2D liquid chromatography tandem mass spectrometry. They identified over 13,000 peptides and 2,500 proteins. Not surprisingly, the authors show that acetate utilization via the TCA cycle (acetyl-CoA enzymes) increases, presumably because acetate serves as a fuel source for increased growth. They also note high abundances of pyruvate ferredoxin oxidoreductase, suggesting the bacteria are undergoing high levels of carbon fixation.
+
Bioremediation of uranium waste is an important industrial process. Here, the authors used a proteomics-based approach to identify proteins in iron-reducing (Fe(III)) microbial species used in the reduction of soluble uranium to insoluble uranium <cite>ApplEnviroMicro2009</cite>. Although the study does not identify new genes, it identifies new metabolic pathways. The authors started with 3 known Geobacter species stimulated with acetate in a bioremediation project, and used their known proteins as a reference for performing 2D liquid chromatography tandem mass spectrometry. They identified over 13,000 peptides and 2,500 proteins. Not surprisingly, the authors show that acetate utilization via the TCA cycle (acetyl-CoA enzymes) increases, presumably because acetate serves as a fuel source for increased growth. They also note high abundances of pyruvate ferredoxin oxidoreductase, suggesting the bacteria are undergoing high levels of carbon fixation.
==Current status and the future==
==Current status and the future==
Line 67: Line 54:
For a review of methods for constructing metagenomic libraries to screen for useful genes, as well as examples of other useful genes isolated from metagenomes, see <cite>Daniel2004</cite>.
For a review of methods for constructing metagenomic libraries to screen for useful genes, as well as examples of other useful genes isolated from metagenomes, see <cite>Daniel2004</cite>.
-
===The Human Microbiome Project===
+
===Limitations===
-
In a 2007 Nature article, researchers outlined the logistics of and rationale for amassing a human-microbe metagenome. The authors described a human as a conglomerate of both human ''and'' microbial cells. Accordingly, they postulated that this project would lead to an understanding of an individual's micro-evolution, health, and disease predisposition. The authors further postulated that the resulting database would deepen the understanding of diagnostic biomarkers while having potential ramifications in industry through novel enzyme discovery <cite>Turnbaugh2007</cite>.
+
-
===Virology===
+
====Diversity====
-
Development of metagenomic techniques and protocols has also stimulated the field of virology. Viral metagenomic studies have led to the discovery of many previously unknown viruses. These endeavors have generated a vast number of viral sequences, the majority of which are reported as unknown <cite>Rosario2011</cite>. It is worth noting that viral metagenomics has its own unique difficulties. Some viruses employ modified nucleotides as one part of their infection strategy. Additionally, many viruses carry lytic genes that could kill the bacteria used during routine cloning. These and other factors necessitate techniques such as linker-amplified shotgun libraries (LASLs) <cite>Breitbart2002</cite>.
+
Often, the search for novel genes among divergent microbial species is limited to genes that share short, conserved stretches of DNA that can be used for cloning. Cloning of genes from a diverse sample is therefore helped by prior knowledge of the target gene family: conserved domains of a given gene family can be targeted with degenerate primers to isolate similar genes from a microbial community sample. Even degenerate primers, however, may not capture the entirety of a gene family in a diverse sample. Phylogenetic analysis of regions with high similarity (low degeneracy) can help in designing primers that better capture these genes.
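To make the idea of primer degeneracy concrete, the sketch below expands a degenerate primer written in standard IUPAC ambiguity codes into the explicit sequences it represents and counts them. This is only an illustration: the function names (degeneracy, expand) and the example primer are hypothetical and are not drawn from any of the cited studies.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every explicit sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    example = "GGNTAYATG"  # hypothetical primer, for illustration only
    print(degeneracy(example))
    for sequence in expand(example):
        print(sequence)
```

A lower degeneracy means each individual sequence makes up a larger fraction of the primer pool, which is one reason regions of high similarity are attractive primer sites.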
-
==References==
+
Many current Next-Gen sequencers are limited by short read lengths, which can prove problematic in the de novo assembly of a genome. In addition, many protocols use emulsion PCR (emPCR) after nebulization of DNA, which can introduce sequencing coverage bias. Nebulization randomly fragments sample genomic DNA into pieces of varying length, GC content, and secondary structure. Since emPCR amplifies single fragments, fragments that are difficult to amplify are often underrepresented. For low-copy templates, this can mean low sequence coverage in important regions.
 +
 
 +
The coming introduction of single-molecule, long-read sequencers, such as that from Pacific Biosciences, may alleviate some of these limitations. Finally, it is unlikely that genes found in nature will cover all the uses humanity may come up with. Nature settles for genes that function "well enough", which may be inadequate for applications that demand high efficiency.
 +
 
 +
====Rare Genomes and Low Density Environments====
 +
Another issue is separating the interesting bacteria from the less interesting bacteria in a metagenome. When creating a library, it can be hard to maintain enough complexity to capture the rare, beneficial genes you're looking for. One solution might be to look at metagenomes from environments with strong selection pressure for those genes. Such an environment would have to be particularly hostile, for example one with very high concentrations of a given contaminant. While this doesn't allow you to capture the genomes of rare bacteria any better, it would increase the relative population of bacteria with the physiology of interest. An interesting paper discussing some of the evolutionary dynamics of such an environment has recently been published <cite>Hemme2010</cite>. The situation does, however, create something of a catch-22: harsher conditions that lead to overrepresentation of useful genes can also have significantly lower overall concentrations of cells, on the order of 10,000/g of soil. By traditional library construction methods, this would require harvesting dozens of kilograms of soil for extraction. A solution is to use φ29 DNA polymerase to amplify low concentrations of environmental DNA for library construction <cite>Abulencia2006</cite>. While this method has great potential to give access to low-concentration metagenomes, the innate biases it creates are not well understood.
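The "dozens of kilograms" figure can be checked with a back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: the cell density of 10,000 cells/g comes from the text above, but the 4 Mb average genome size, the 1 µg of DNA needed for a library, the 650 g/mol average mass of a base pair, and the perfect extraction efficiency are all assumed values chosen for illustration.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MOLAR_MASS = 650.0        # g/mol per base pair (common approximation)


def soil_needed_kg(cells_per_gram, genome_bp, dna_needed_ug,
                   extraction_efficiency=1.0):
    """Estimate the soil mass (kg) needed to recover a target amount of DNA.

    All inputs other than cells_per_gram are illustrative assumptions.
    """
    dna_per_cell_g = genome_bp * BP_MOLAR_MASS / AVOGADRO
    dna_per_gram_soil_g = (cells_per_gram * dna_per_cell_g
                           * extraction_efficiency)
    soil_grams = (dna_needed_ug * 1e-6) / dna_per_gram_soil_g
    return soil_grams / 1000.0


if __name__ == "__main__":
    # 10,000 cells/g (from the text); 4 Mb genome and 1 ug target are assumed.
    estimate = soil_needed_kg(cells_per_gram=1e4, genome_bp=4e6,
                              dna_needed_ug=1.0)
    print(f"Approximate soil required: {estimate:.1f} kg")
```

Under these assumptions the estimate comes out in the low tens of kilograms, and any loss during extraction raises it further, which is consistent with the motivation for amplifying the environmental DNA instead.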
-
<biblio>
+
 
-
#Boghigian2011 Boghigian, Brett A. Simultaneous production and partitioning of heterologous polyketide and isoprenoid natural products in an Escherichia coli two-phase bioprocess. J Ind Microbiol Biotechnol, 2011.
+
==References==
-
#Besumbes2004 Besumbes, Oscar. Metabolic engineering of isoprenoid biosynthesis in Arabidopsis for the production of taxadiene, the first committed precursor of Taxol. Biotechnol Bioeng, 2004.
+
<biblio>  
-
#Engels2008 Engels, Benedikt. Metabolic engineering of taxadiene biosynthesis in yeast as a first step towards Taxol (Paclitaxel) production. Metab Eng, 2008.
+
#BiotechLettReview2010 pmid=20495950
-
#Rinehart1999 Rinehart, K.L. Antitumor Compounds from Tunicates. Med Res Rev, 1999.
+
//Recent progress and new challenges in metagenomics for biotechnology.
-
#Pereira2012 Pereira, Renato C. Bioprospecting for bioactives from seaweeds: potential, obstacles and alternatives. Braz J Pharmacogn, 2012.
+
#EnviroMicroReview2011 pmid=21366818
-
#Schirmer2010 Schirmer, Andreas. Microbial Biosynthesis of Alkanes. Science, 2010.
+
//Targeted metagenomics: a high-resolution metagenomics approach for specific gene
-
#Sano2012 Sano, Sotaro. Mutations to create thermostable reverse transcriptase with bacterial family A DNA polymerase from Thermotoga petrophila K4. J Biosci Bioeng, 2012.
+
clusters in complex microbial communities.
-
#Handelsman1998 Handelsman, Jo. Molecular biology access to the chemistry of unknown soil microbes: a new frontier for natural products. Chemistry & Biology, 1998.
+
#PLOSbio2007 pmid=17355171
-
#Beja2000 Beja, Oded. Bacterial Rhodopsin: Evidence for a New Type of Phototrophy in the Sea. Science, 2000.
+
//The Sorcerer II Global Ocean Sampling expedition: expanding the universe of
-
#Temperton2012 Temperton, Ben. Metagenomics: microbial diversity through a scratched lens. Curr Opin Microbiol, 2012.
+
protein families.  
-
#Yooseph2007 Yooseph, Shibu. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol, 2007.
+
#Science2011 pmid=21273488
-
#Turnbaugh2007 Turnbaugh, Peter J. The Human Microbiome Project. Nature, 2007.
+
//Metagenomic discovery of biomass-degrading genes and genomes from cow rumen.
-
#Rosario2011 Rosario, Karyna. Exploring the world through viral metagenomics. Curr Opin Virol, 2011.
+
#ApplEnviroMicro2009 pmid=19717633
-
#Suenaga2012 Suenaga, Hikaru. Targeted metagenomics: a high-resolution metagenomics approach for specific gene clusters in complex microbial communities. Environ Microbiol, 2012.
+
//Proteogenomic monitoring of Geobacter physiology during stimulated uranium
-
#Hess2011 Hess, Matthias. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science, 2011.
+
bioremediation.
-
#Chistoserdova2010 Chistoserdova, Ludmila. Recent progress and new challenges in metagenomics for biotechnology. Biotechnol Lett, 2010.
+
#Daniel2004 pmid=15193327
-
#Wilkins2009 Wilkins, Michael J. Proteogenomic monitoring of Geobacter physiology during stimulated uranium bioremediation. Appl Environ Microbiol, 2009.
+
//The soil metagenome--a rich resource for the discovery of novel natural products.
-
#Daniel2004 Daniel, Rolf. The soil metagenome--a rich resource for the discovery of novel natural products. Curr Opin Biotechnol, 2004.
+
#Hemme2010 pmid=20182523
-
#Breitbart2002 Breitbart, Mya. Genomic analysis of uncultured marine viral communities. PNAS, 2002.
+
//Metagenomic insights into evolution of a heavy metal-contaminated groundwater
-
</biblio>
+
microbial community.
 +
#Abulencia2006 pmid=16672469
 +
//Environmental whole-genome amplification to access microbial populations in
 +
contaminated sediments.

Current revision

What's Bioprospecting?

Bioprospecting is a catch-all term for the discovery, acquisition, and utilization of novel biomaterials. It has historically been a controversial activity, often involving unregulated commercialization of flora (e.g., medicinal plants) from developing countries for the benefit of commercial interests [Pros/Cons of Bioprospecting]. As a term in molecular biology, however, it reflects the growing need to discover new protein and nucleic acid parts for use in biotechnology and basic research. The advent of multiple Next-Generation Sequencing technologies since 2006 now provides in-depth access to the entire genomes (Metagenomics) of species previously inaccessible to basic research. [1]

Examples of Genes Identified via Bioprospecting

GFP

Although not planned, one of the great examples of bioprospecting is the story of Green Fluorescent Protein (GFP), a protein that has had a profound impact on every major field in modern biology. Originally isolated and characterized by Osamu Shimomura in the 1960s and 1970s from jellyfish and sea pansies, it was at first a mere oddity that contributed to the eerie glow of certain bioluminescent marine creatures. However, the subsequent cloning of the gene by Douglas Prasher, its expression in other organisms by Martin Chalfie, and its improvement into enhanced GFP by Roger Tsien made it one of the modern workhorses of biology. This 40-year journey earned Shimomura, Chalfie, and Tsien the 2008 Nobel Prize in Chemistry. [History of GFP]

Polymerases

Polymerases such as the Klenow fragment and, more importantly, Taq polymerase have permitted the in vitro synthesis of DNA fragments. Taq was discovered in a thermophilic bacterium (Thermus aquaticus), and because it withstands extreme heat (~95 °C) without losing activity, it permitted the use of thermal cycling in the Polymerase Chain Reaction (PCR). Thermal cycling allows repeated denaturation and reannealing of DNA strands, which drives the exponential amplification of target sequences. PCR earned Kary Mullis a share of the 1993 Nobel Prize in Chemistry, awarded jointly with Michael Smith for site-directed mutagenesis. Polymerases also include the important phage RNA polymerases, such as T7, T3, and SP6, which have permitted the in vitro transcription of DNA templates into RNA.

Reverse Transcriptases

The discovery of reverse transcriptases, essentially polymerases that copy RNA into DNA, has allowed the study of RNA via generation of cDNAs. These proteins are found in RNA viruses such as retroviruses, in mobile genetic elements, and in mammalian telomerase. The work led to the 1975 Nobel Prize in Physiology or Medicine, shared by David Baltimore, Renato Dulbecco, and Howard Temin.

Restriction Endonucleases

The discovery of restriction endonucleases in the 1970s fueled the molecular biology revolution and the advent of genetic engineering. Found in bacteria and archaea, they act to degrade foreign DNA by cleaving at specific, often palindromic, sequences. This work led to the 1978 Nobel Prize in Physiology or Medicine for Nathans, Arber, and Smith.
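
In this context, "palindromic" means that a site reads the same on both strands, i.e. the sequence equals its own reverse complement. A minimal sketch of that check follows; the helper names are hypothetical, and GAATTC (the EcoRI recognition site) is used only as a familiar example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


ecori_is_palindromic = is_palindromic_site("GAATTC")    # EcoRI site
random_is_palindromic = is_palindromic_site("GATTACA")  # arbitrary non-site
```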

Metagenomics

GOLD genome projects

Metagenomics uses Next Generation Sequencing technologies (e.g., Roche 454, Illumina, ABI SOLiD), typically in a Whole Genome Shotgun (WGS) approach, or protein analysis (mass spectrometry) to sample the genomes of mixed microbial communities, generating a view of genomic sequence space that does not depend on culturing. Estimates have suggested that greater than 99% of all microbes are unculturable in the lab and therefore inaccessible to traditional laboratory analysis. These sequencing approaches thus allow analysis of microbes that make up only a small percentage of a microbial community. The current explosion in metagenomic projects (340 current projects, 1990 samples [GOLD database]) permits entirely in silico approaches to identifying new gene families with potential as parts in Synthetic Biology.

Craig Venter and his Yacht

In the early 2000s, the J. Craig Venter Institute set as one of its goals to sequence the genomic diversity in the oceans. Craig Venter used his personal yacht, the Sorcerer II, to traverse the Earth's oceans, taking samples of oceanic life and sequencing them by Whole Genome Shotgun Sequencing. From this expedition, they uncovered 6 million proteins (roughly double the number in the databases at the time), including 1,700 clusters of gene families with no known homology. The data also revealed homology for 6,000 previously unknown ORF families (ORFans). They found that a very high proportion of the new genes belonged to viruses (likely marine phage), which the existing databases had underrepresented. [2]
Global Ocean Explorer

Examples of Bioprospecting using Metagenomics (Targeted Metagenomics)

A useful approach to bioprospecting new genes is Targeted Metagenomics, which relies on either functional screening or pure sequence screening. This means either screening microbiota for a particular activity or searching for specific families of genes. Both types of targeted metagenomic screens have led to new antibiotic resistance genes, cold-adaptive rRNAs, and cellulosic enzymes, to name just a few [3].

----Typical Targeted Metagenomic Pipeline----
  1. Extract (DNA, RNA, or Protein) from Environmental Sample
  2. Next Gen Sequencing or Mass Spec
  3. Computational analysis for ORFs and homology searches
  4. Heterologous Expression and Testing for function
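
Step 3 of the pipeline can be illustrated with a toy ORF scan. This is only a sketch under simplifying assumptions (ATG as the sole start codon, the three standard stop codons, an arbitrary minimum length, no handling of ambiguous bases); real pipelines use dedicated gene callers and homology search tools, and the function names and the example contig here are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Yield (strand, frame, orf) for each ATG...stop ORF on both strands.

    min_codons counts the start and stop codons; the default is a toy
    value chosen so the short example below produces a hit.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # index of the first ATG seen in this frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, frame, s[start:i + 3]
                    start = None  # look for the next ORF in this frame


# Hypothetical contig containing one short ORF on the forward strand.
contig = "CCATGAAATTTGGGTAACC"
orfs = list(find_orfs(contig))
```

Each predicted ORF would then be translated and compared against known protein families (the homology search in step 3) before candidates are chosen for expression in step 4.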
Cellulosic Biomass degrading genes found in Cow Rumen

Plant polysaccharides such as cellulose are not broken down by enzymes found in mammals, but ruminants (e.g., cows) carry symbiotic bacteria that perform this job. These microbes cannot be cultured in the lab. However, the enzymes they use to break down cellulose could be used to generate biofuel from easily grown plants like grass. Here, Matthias Hess and colleagues used metagenomic analysis to identify 51 enzymes active in breaking down polysaccharides [4]. The authors isolated microbes from a nylon bag filled with switchgrass placed inside a cow's rumen through a fistula. Various Next-Gen Sequencing technologies were used to generate 268 gigabase pairs of sequence. From the various organisms, they predicted 27,755 putative polysaccharide-degrading enzymes, of which 43% had less than 50% similarity to any known sequence. The authors expressed and tested 90 of these genes, chosen for their similarity to glycosyl hydrolase domains, and identified 51 active enzymes. The enzymes were active against various substrates used as biofuel crops. Finally, the authors generated 15 draft genomes of new microbial species. A number of earlier studies have also attempted to identify glycosyl hydrolases in species such as termites and panda [1][3].
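
The yield figures quoted for this study can be put in perspective with a little arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Figures as quoted in the text for the cow rumen study [4].
predicted_genes = 27755   # putative polysaccharide-degrading enzymes
novel_fraction = 0.43     # share with <50% similarity to known sequences
tested = 90               # candidates expressed and assayed
active = 51               # candidates with confirmed activity

# Fraction of tested candidates that turned out to be active.
hit_rate = active / tested

# Approximate number of predicted genes that are highly divergent.
novel_genes = round(predicted_genes * novel_fraction)

# Fraction of all predictions that were actually tested.
tested_fraction = tested / predicted_genes
```

Roughly 57% of the tested candidates were active, yet fewer than 1% of the predicted genes were tested at all, so most of the predicted sequence space remains unexplored.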

Uranium Bioremediation

Bioremediation of uranium waste is an important industrial process. Here, the authors used a proteomics-based approach to identify proteins in Fe(III)-reducing microbial species used in the reduction of soluble uranium to insoluble uranium [5]. Although the study does not identify new genes, it identifies new metabolic pathways. The authors started with 3 known Geobacter species that had been stimulated with acetate in a bioremediation project, and used the known proteins as a reference for performing 2D liquid chromatography tandem mass spectrometry. They identified over 13,000 peptides and 2,500 proteins. Not surprisingly, the authors show that acetate utilization increases via the TCA cycle (acetyl-CoA enzymes), presumably because acetate serves as a fuel source for increased growth. They also note high abundances of pyruvate ferredoxin oxidoreductase, suggesting the bacteria are undergoing high levels of carbon fixation.

Current status and the future

New genes found in ruminant microbes, encoding proteins that were previously unknown, are being used for generating fuel from biomass. Genes identified in more extremophilic bacteria and archaea may be useful in the metabolism of inorganic compounds. They may also be useful in new genetic circuits for biotech applications. Finally, the genes will ultimately be useful as scaffolds for directed evolution studies to generate new functions.

List of Parts found by Metagenomics

  1. glycosyl hydrolases (cellulose degradation)
  2. antibiotic resistance (tetracycline/bleomycin)
  3. extradiol dioxygenases (aromatic carbon usage)
  4. sulfate reductases
  5. deaminases
  6. Zn-dependent carboxypeptidases
  7. RNA-binding proteins
  8. many unique ORFans

For a review of methods for constructing metagenomic libraries to screen for useful genes, as well as further examples of genes isolated from metagenomes, see [6].

Limitations

Diversity

Often, the search for novel genes amongst divergent microbial species is limited to genes that share short conserved regions of DNA usable for cloning. Cloning of genes from a diverse sample can be helped by prior knowledge of the target gene family. Various domains of a given gene family can be targeted with degenerate primers to isolate similar genes in a microbial community sample. On the other hand, even degenerate primers may not capture the entirety of a gene family in a diverse sample. Alternatively, phylogenetic analysis of regions with high similarity (low degeneracy) can help in designing primers that better capture the genes.
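
A degenerate primer is a pool of sequences written compactly with IUPAC ambiguity codes. The sketch below, with hypothetical function names and a made-up primer, shows how quickly the pool grows as degenerate positions are added, which is one reason low-degeneracy regions are preferred for primer design.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence in the degenerate primer pool."""
    options = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*options)]


# Hypothetical primer with two Y (C/T) positions and one N (any base).
primer = "ATGGAYTTYGCN"
pool_size = degeneracy(primer)
pool = expand(primer)
```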

Many of the current Next-Gen Sequencers are limited by short read lengths, which can prove problematic in the de novo construction of a genome. In addition, many protocols use emPCR after nebulization of DNA, which can introduce sequencing coverage bias. Nebulization randomly fragments sample genomic DNA into pieces of various lengths, which differ in GC content and secondary structure. Since emPCR amplifies single fragments, fragments that are difficult to amplify are often underrepresented. For low-copy templates, this can mean low sequence coverage in important areas.
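
One simple way to anticipate this kind of bias is to look at the GC content of fragments. The following is a minimal sketch; the 30% and 70% cutoffs and the example fragments are assumptions chosen for illustration, not established thresholds.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_extreme_gc(fragments, low=0.30, high=0.70):
    """Return fragments whose GC fraction falls outside [low, high].

    The cutoffs are illustrative assumptions, not validated values.
    """
    return [f for f in fragments if not low <= gc_fraction(f) <= high]


# Hypothetical fragments: AT-rich, balanced, and GC-rich.
fragments = ["ATATATATAT", "ATGCATGCAT", "GGCCGGCCGC"]
flagged = flag_extreme_gc(fragments)
```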

The coming introduction of single-molecule long-read sequencers, such as those from Pacific Biosciences, may alleviate some of these limitations. Finally, it is unlikely that genes found in nature will cover all the uses humanity may come up with. Nature settles for genes that function "well enough" for the organism, which may be inadequate for applications that demand high efficiency.

Rare Genomes and Low Density Environments

Another issue is separating the interesting bacteria from the less interesting bacteria in a metagenome. When creating a library, it can be hard to maintain enough complexity to capture the rare, beneficial genes you're looking for. One solution might be to look at metagenomes from environments with a huge selection pressure for the genes you're looking for. Such an environment would have to be particularly hostile, for example one with very high concentrations of a given contaminant. While this doesn't let you capture the genomes of rare bacteria any better, it would increase the relative population of bacteria with the physiology of interest. An interesting paper on some of the evolutionary dynamics of such an environment has recently been published [7]. The situation does, however, create something of a catch-22. Harsher conditions that lead to overrepresentation of the genes needed to cope with the environment can also have significantly lower overall concentrations of cells, on the order of 10,000 cells per gram of soil. With traditional library construction methods, this would require you to harvest dozens of kilograms of soil for extraction. A solution is to use φ29 DNA polymerase to amplify low concentrations of environmental DNA for library construction [8]. While this method has a lot of potential to give access to low-concentration metagenomes, the innate biases it creates are not well understood.
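
The "dozens of kilograms" figure can be sanity-checked with a back-of-the-envelope calculation. Only the cell density (10,000 cells per gram) comes from the text; the average genome size (5 Mb), the DNA mass needed for a library (1 µg), and the extraction efficiency are assumptions made for this sketch.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MOLAR_MASS = 650.0      # approximate g/mol per base pair of dsDNA


def dna_per_cell_g(genome_bp):
    """Mass in grams of one copy of a double-stranded genome."""
    return genome_bp * BP_MOLAR_MASS / AVOGADRO


def soil_needed_kg(dna_needed_g, cells_per_g, genome_bp,
                   extraction_efficiency=1.0):
    """Kilograms of soil needed to recover dna_needed_g of DNA."""
    dna_per_g_soil = (cells_per_g * dna_per_cell_g(genome_bp)
                      * extraction_efficiency)
    return dna_needed_g / dna_per_g_soil / 1000.0


# Assumed inputs: 1 microgram of DNA, 5 Mb genomes; 1e4 cells/g from the text.
ideal = soil_needed_kg(1e-6, 1e4, 5e6)
# Same, with an assumed 50% DNA recovery during extraction.
realistic = soil_needed_kg(1e-6, 1e4, 5e6, extraction_efficiency=0.5)
```

Under these assumptions, perfect recovery already requires roughly 18 to 19 kg of soil, and halving the recovery doubles that, consistent with the order of magnitude quoted above.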


References

  1. Chistoserdova L. Recent progress and new challenges in metagenomics for biotechnology. pmid:20495950. [BiotechLettReview2010]

  2. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, Strausberg RL, Frazier M, and Venter JC. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. pmid:17355171. [PLOSbio2007]

  3. Suenaga H. Targeted metagenomics: a high-resolution metagenomics approach for specific gene clusters in complex microbial communities. pmid:21366818. [EnviroMicroReview2011]

  4. Hess M, Sczyrba A, Egan R, Kim TW, Chokhawala H, Schroth G, Luo S, Clark DS, Chen F, Zhang T, Mackie RI, Pennacchio LA, Tringe SG, Visel A, Woyke T, Wang Z, and Rubin EM. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. pmid:21273488. [Science2011]

  5. Wilkins MJ, Verberkmoes NC, Williams KH, Callister SJ, Mouser PJ, Elifantz H, N'guessan AL, Thomas BC, Nicora CD, Shah MB, Abraham P, Lipton MS, Lovley DR, Hettich RL, Long PE, and Banfield JF. Proteogenomic monitoring of Geobacter physiology during stimulated uranium bioremediation. pmid:19717633. [ApplEnviroMicro2009]

  6. Daniel R. The soil metagenome--a rich resource for the discovery of novel natural products. pmid:15193327. [Daniel2004]

  7. Hemme CL, Deng Y, Gentry TJ, Fields MW, Wu L, Barua S, Barry K, Tringe SG, Watson DB, He Z, Hazen TC, Tiedje JM, Rubin EM, and Zhou J. Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. pmid:20182523. [Hemme2010]

  8. Abulencia CB, Wyborski DL, Garcia JA, Podar M, Chen W, Chang SH, Chang HW, Watson D, Brodie EL, Hazen TC, and Keller M. Environmental whole-genome amplification to access microbial populations in contaminated sediments. pmid:16672469. [Abulencia2006]
