CH391L/S12/GeneandGenomeSynthesis

From OpenWetWare
Revision as of 09:30, 6 February 2012 by Joe Hanson (talk | contribs)

Gene and Genome Synthesis

Introduction

Gene synthesis, or artificial gene synthesis, is the process of creating a nucleic acid template for a gene in vitro without requiring a preexisting DNA template. Soon after the elucidation of the genetic code and the description of the central dogma of molecular biology, there arose a need to synthesize genes de novo in order to study their biological function both in the test tube and in model organisms. Chemical synthesis of DNA has grown from an expensive and time-consuming process into a viable commercial industry capable of high-throughput manufacture of custom DNA molecules at almost any scale and for almost any context. This allows species-specific gene optimization, creation of genes from rare or dangerous sources, and combinatorial assembly of any DNA sequence that can be chemically synthesized, even including non-traditional bases. The most advanced application of gene synthesis to date is the recent creation of completely synthetic minimal genomes in prokaryotes.

Despite nearly four decades of progress in gene synthesis technologies, most DNA sequences used in modern molecular biology are assembled in part or in whole from naturally occurring templates. However, this limits the scope and applications to previously existing genes and to novel genes found in large-scale genomic surveys of nature. Modern gene synthesis relies heavily on advancements in chemical DNA oligonucleotide synthesis, with the primary challenges being scale, cost, fidelity and the eventual assembly of complete gene products.

A directory of commercial gene synthesis providers can be found at Genespace.

History of Gene Synthesis

Gene synthesis predates the invention of restriction enzymes and molecular cloning techniques by several years. The first gene to be completely synthesized in vitro was the gene for a 77-nt alanine transfer RNA, completed by the laboratory of Har Gobind Khorana in 1972. This took nearly five years of work and yielded a DNA template without promoter or transcriptional control sequences. The first peptide- and protein-producing synthetic genes were created in 1977 and 1979, respectively (Itakura 1977, Goeddel 1979). Steady advancement has led to the recent synthesis of complete gene clusters tens of thousands of nucleotides in length, and ultimately of a bacterial genome approximately 1.1 million base pairs in length.

Oligonucleotide Synthesis

Regardless of the length of the eventual product, synthetic DNA constructs are built from some combination of short DNA oligonucleotides. These oligonucleotides are later assembled into a complementary DNA duplex, amplified and inserted into their final genetic context.

Oligonucleotides are chemically synthesized from DNA phosphoramidite monomers. Briefly, activated phosphoramidite monomers are added in the 3' to 5' direction using a cyclical activation and blocking chemistry to obtain a DNA polymer linked by phosphodiester bonds.

Chemical synthesis is currently limited to oligonucleotides of about 200 nt in length.

Summary of Methods

Ligation-Based Methods

In ligation-mediated assembly, the synthetic gene or DNA fragment of interest is broken up into individual overlapping DNA oligonucleotides that cover both strands of the eventual DNA duplex. In contrast to PCR-based methods of gene synthesis, there are no gaps introduced in either strand during design, but rather the oligonucleotides completely reconstitute the eventual DNA target.

The oligonucleotides are chemically synthesized and phosphorylated at their 5' ends using purified kinase in an in vitro reaction. After complementary oligo fragments are annealed using thermal cycling, purified DNA ligase is added to the reaction to seal the nicks between adjacent 3' OH and 5' phosphate ends on each backbone.
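
The gap-free tiling described above can be sketched in a few lines of Python. This is a minimal illustration, not a published design tool: the function names and the 40-nt default oligo length are assumptions chosen for the example. The bottom-strand junctions are shifted by half an oligo so that every nick in one strand is bridged by an intact oligo on the other strand.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def design_ligation_oligos(target: str, oligo_len: int = 40):
    """Tile both strands of `target` with no gaps (hypothetical helper).

    Top-strand oligos start at 0, oligo_len, 2 * oligo_len, ...
    Bottom-strand junctions are offset by half an oligo.
    Every oligo is returned written 5'->3'.
    """
    if oligo_len < 2:
        raise ValueError("oligo_len must be at least 2")
    target = target.upper()
    half = oligo_len // 2
    top = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    # Bottom-strand breakpoints, expressed in top-strand coordinates.
    cuts = [0] + list(range(half, len(target), oligo_len)) + [len(target)]
    bottom = [reverse_complement(target[a:b]) for a, b in zip(cuts, cuts[1:])]
    return top, bottom
```

Calling `design_ligation_oligos(sequence)` returns two lists. Concatenating the top list, or the reverse complements of the bottom list, reproduces the target, which is a quick check that neither strand has a gap.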

PCR-based Methods

PCR-based gene assembly strategies are similar to ligation-mediated assembly, except that oligonucleotide precursors are linked together leaving gaps that must be filled in by DNA polymerase. In the most common method, polymerase cycling assembly (PCA), DNA oligonucleotides are designed to be part of either the top or bottom strand of the final DNA duplex. Cycles of annealing and polymerase extension result in a growing DNA duplex built from smaller oligonucleotide fragments. A final round of PCR is done with constant primers that amplify the complete desired gene product. Initial oligo assembly is run for 20-30 cycles, and full-length PCR is carried out for 20-30 additional cycles.
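
As a rough sketch of the PCA layout, the Python below cuts a target into oligos that alternate between the top and bottom strands and overlap only at their ends, leaving gaps for the polymerase to fill. The function names, the 60-nt oligo length and the 20-nt overlap are illustrative assumptions, not values from a specific protocol. The second function is only an in silico check that the overlaps chain back into the target; it does not model the enzymatic reaction.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def design_pca_oligos(target: str, oligo_len: int = 60, overlap: int = 20):
    """Return oligos (5'->3') alternating top, bottom, top, ... strands."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    target = target.upper()
    step = oligo_len - overlap
    oligos = []
    # Stop once the previous oligo already reaches the end of the target.
    for index, start in enumerate(range(0, len(target) - overlap, step)):
        segment = target[start:start + oligo_len]
        oligos.append(segment if index % 2 == 0 else reverse_complement(segment))
    return oligos


def assemble_in_silico(oligos, overlap: int = 20) -> str:
    """Chain the oligos through their overlaps and return the top strand."""
    product = oligos[0]
    for index, oligo in enumerate(oligos[1:], start=1):
        # Odd-numbered oligos were designed on the bottom strand.
        segment = oligo if index % 2 == 0 else reverse_complement(oligo)
        if product[-overlap:] != segment[:overlap]:
            raise ValueError(f"oligo {index} does not overlap its neighbour")
        product += segment[overlap:]
    return product
```

With these defaults each top-strand oligo is followed by a 20-nt gap in the top strand, which is covered only by the neighbouring bottom-strand oligo until extension fills it in.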

Oligos must be designed carefully so that their overlaps have similar melting temperatures and so that they are free of features that interfere with annealing, such as stable secondary structures (hairpins) and regions of high GC content.
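
A simple screen for these design constraints might look like the sketch below. It is deliberately crude: the melting temperature uses two textbook approximations (the Wallace rule for very short sequences and a GC-content formula for longer ones) rather than a nearest-neighbor model, and the hairpin test only looks for an exact inverted repeat. All function names and thresholds are assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_tm(seq: str) -> float:
    """Rough melting temperature in degrees C (approximation only)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if len(seq) < 14:
        return 2.0 * at + 4.0 * gc            # Wallace rule
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def tm_spread(overlaps) -> float:
    """Difference between the highest and lowest approximate Tm."""
    temps = [approx_tm(overlap) for overlap in overlaps]
    return max(temps) - min(temps)


def has_hairpin(seq: str, stem: int = 6, min_loop: int = 3) -> bool:
    """True if a `stem`-nt stretch can pair with a downstream inverted repeat."""
    seq = seq.upper()
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        arm = reverse_complement(seq[i:i + stem])
        if arm in seq[i + stem + min_loop:]:
            return True
    return False
```

In practice a design would be rejected or re-cut when `tm_spread` over all overlaps exceeds a chosen tolerance or when `has_hairpin` flags an oligo; the acceptable limits depend on the polymerase and cycling conditions and are not specified here.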

Solid-Phase Synthesis

An alternative gene synthesis technique using solid-phase synthesis of a DNA duplex has been developed by several companies, including Blue Heron. In these methods, DNA oligonucleotides are synthesized as in other techniques and annealed into short complementary duplexes without overhanging ends. These duplexes are then chemically ligated on a solid phase such as a liquid chromatography column. The complete synthetic DNA duplex is then eluted and used for downstream processes, such as cloning or gene expression.

Multiplex/Microchip Synthesis

DNA oligonucleotides have become a commodity and the price per base has been steadily dropping, but large-scale gene synthesis is still expensive. For a 3-kb gene fragment, oligo cost alone can still exceed $1,000. To overcome this hurdle, multiplex parallel gene synthesis technologies are under development, including inkjet DNA printers, photolithography and electrochemical parallel synthesis.
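
The scale of the oligo bill can be seen with some back-of-the-envelope arithmetic. The sketch below assumes full coverage of both strands and a purely illustrative price of $0.20 per nucleotide; that price is a placeholder for the example, not a quoted vendor figure.

```python
def oligo_cost(gene_length_bp: int, price_per_nt: float, strands: int = 2) -> float:
    """Estimated oligo cost when `strands` strands are fully covered by oligos.

    `price_per_nt` is a hypothetical price chosen by the caller.
    """
    total_nt = gene_length_bp * strands
    return total_nt * price_per_nt


# A 3-kb gene with both strands covered needs 6,000 nt of oligo.
cost = oligo_cost(3000, price_per_nt=0.20)
print(f"Estimated oligo cost: ${cost:,.2f}")
```

Under these assumptions the oligos alone cost more than $1,000, before any enzymes, cloning or sequencing are counted.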

Microfluidic microchips are a common multiplexing solution. In the method of Tian et al. (2004), oligonucleotides are synthesized while coupled to a microchip at their 3' end. The overall process is similar to traditional oligo synthesis, except that individual wells are photoactivated before addition of the next base onto the 5' end. After the complete oligos are chemically cleaved from the microchip, they are hybridized to complementary "quality assurance" oligos bound to magnetic beads and purified. This step eliminates oligos that have either incorporated extra bases or undergone deletions. Final yield from each synthesis well is only around 5 fmol of oligo, which is not sufficient for downstream applications. A final amplification step using PCR primers common to all oligos increases yields up to a million-fold. Oligos can then be assembled by overlap PCR-based methods such as polymerase assembly multiplexing, or PAM. This method has resulted in the synthesis, amplification and expression of a 14.5 kb, 21-gene 30S ribosomal protein operon from E. coli.
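
To put the million-fold amplification in perspective, the short calculation below estimates how many PCR cycles it takes and how much material results from a 5 fmol well. The per-cycle efficiency parameter is an assumption introduced for this sketch (1.0 means perfect doubling); real reactions plateau as reagents run out, which this idealized model ignores.

```python
import math


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> int:
    """Cycles needed to reach `fold` amplification at a given per-cycle efficiency."""
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))


def amplified_amount_fmol(start_fmol: float, fold: float) -> float:
    """Amount of product, in fmol, after `fold` amplification."""
    return start_fmol * fold


print(cycles_for_fold(1e6))                  # cycles with perfect doubling
print(amplified_amount_fmol(5, 1e6) / 1e6)   # product in nmol (1 nmol = 1e6 fmol)
```

With perfect doubling, about 20 cycles turn 5 fmol into roughly 5 nmol; at a lower assumed efficiency, a few more cycles are needed.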

Minimizing Errors

One major remaining roadblock in gene synthesis is the inherent error rate of DNA synthesis chemistry. Phosphoramidite oligo synthesis currently has an error rate of roughly 1 error in 160 bases. This produces enough errors in assembled genes that screening and sequencing are major cost bottlenecks, as is the repair or re-synthesis of mutated genes. While the development of cheap and high-throughput sequencing technologies will streamline the screening process, oligo synthesis error rates must also be reduced.
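
The impact of this error rate on an assembled gene can be estimated with a simple probability model. The sketch below assumes that errors occur independently and uniformly at every position, which is a simplification, and the function names are made up for this example.

```python
def error_free_fraction(length_nt: int, error_rate: float = 1 / 160) -> float:
    """Probability that a synthetic molecule of `length_nt` has no errors,
    assuming independent errors at a fixed per-base rate."""
    return (1.0 - error_rate) ** length_nt


def expected_clones_to_screen(length_nt: int, error_rate: float = 1 / 160) -> float:
    """Average number of clones sequenced before one perfect clone is found."""
    return 1.0 / error_free_fraction(length_nt, error_rate)


for length in (100, 500, 1000):
    fraction = error_free_fraction(length)
    clones = expected_clones_to_screen(length)
    print(f"{length} nt: {fraction:.4f} error-free, about {clones:.0f} clones per perfect clone")
```

Under this model only on the order of 0.2% of 1-kb products assembled from unscreened oligos would be error-free, which is why sequencing and error removal dominate the cost.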

Currently, oligo pools are often screened against complementary reference libraries that can be reused. Oligos that do not completely base-pair with their designed sequence are either washed away in purification steps or cleaved by DNA mismatch repair proteins that recognize the imperfect base-pairing. These screens have decreased error rates to 1 in 10^6, on the order of the error rates of DNA polymerases.

Genome Synthesis

Gibson Assembly

Final assembly of large double-stranded DNA products into kilobase and megabase functional molecules free of errors and with minimal cost remains a process bottleneck for gene and genome synthesis. In vitro recombination methods have been used successfully to assemble large DNA molecules on the order of tens of kilobases, but these methods remain prone to errors and require that considerable homology be engineered into the assembly fragments.

In 2009, Daniel Gibson of J. Craig Venter's group reported a method for assembling hundreds of kilobases of DNA sequence in a single isothermal enzymatic reaction, known today as "Gibson Assembly". Sub-fragments of the eventual assembly product are obtained as blunt-ended double-stranded DNA molecules several kilobases in length. They share several hundred bases of homology at their adjacent termini. T5 DNA exonuclease chews back the 5' ends of each double-stranded DNA molecule to reveal 3' single-stranded overhangs, so that the homologous termini of adjacent fragments can anneal. Being heat-labile, T5 exonuclease is inactivated after several minutes at the reaction temperature. A mixture of thermostable DNA polymerase and ligase then fills the gaps and seals the nicks, assembling the individual fragments into larger molecules. This method has been used to assemble circular DNA molecules up to 900 kilobases in length. The size of molecules that can be assembled in this manner is currently limited by the cost of designing and screening the final assembly products and by the hundreds of kilobases of DNA that must be sequenced.
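
The role of terminal homology can be illustrated in silico. The sketch below joins an ordered list of fragments whenever the end of the growing product matches the start of the next fragment, trying both strands because a double-stranded fragment can be written in either orientation. It is a planning aid under stated assumptions, not a model of the enzymatic reaction: the function name is hypothetical, every junction is assumed to use the same overlap length, and the 40-nt default is arbitrary.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def join_by_homology(fragments, overlap: int = 40) -> str:
    """Join ordered fragments that share `overlap` nt of terminal homology."""
    product = fragments[0].upper()
    for number, fragment in enumerate(fragments[1:], start=2):
        fragment = fragment.upper()
        # A duplex fragment may be supplied as either of its two strands.
        for strand in (fragment, reverse_complement(fragment)):
            if product[-overlap:] == strand[:overlap]:
                product += strand[overlap:]
                break
        else:
            raise ValueError(f"fragment {number} shares no terminal homology")
    return product
```

A design that raises `ValueError` here would also fail at the bench, since the exposed single-stranded ends would have nothing to anneal to. Closing the final product into a circle would need one more homology check between its two ends, which this sketch leaves out.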