IGEM:IMPERIAL/2007/Tutorials/Guide for Engineers/Central Dogma of Molecular Biology

Note: Images used are not copyrighted. A big thanks to Dr. Steve Cook (Imperial College) for permitting the use of his images and lecture notes. Other images obtained from Wikipedia.

Introduction

“The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred back from protein to either protein or nucleic acid.” - Francis Crick, 1958

It might seem like a whole load of jargon, but the basic concept underlying the central dogma is that the flow of information proceeds in a linear step-wise fashion, from DNA to RNA and finally protein. This means that protein information cannot flow back to that of nucleic acids. The central dogma is a fundamental concept that basically forms the structural framework with which molecular biologists understand the transfer of sequence information between the different biopolymers.

So why is the central dogma important to synthetic biology? Many recombinant DNA technologies, as with our understanding of cellular processes, are more often than not based or adapted from what we know. Knowledge is power, and the more we understand, the more we are able to manipulate its machinery to meet our required specifications. And we do know a lot about the mechanisms behind it.

The Basics: Cell Organization and DNA Structure

The cell is a compartment (not necessarily the simplest form of a living system), and has the ability to localize concentration of metabolites, substrates, and catalysts. Cells are always bounded by an amphipathic lipid bilayer, usually made of phospholipids.

General similarities:

Proteins: L-amino acids (common chirality – D-amino acids are very rare – peptidoglycan).
Carbohydrates: glucose/glycolysis as main energy source, α-(1→4) glucans for energy storage, β-(1→4) glucans (cellulose, chitin) for structural purposes.
RNA: universality of the genetic code, all cells possess very similar ribosomes.
DNA: almost invariably the genetic material.
Lipids: terpenes and phospholipids used for membranes.
Porphyrins: haem, chlorophyll, bilin pigments, all with tetrapyrrole motif. Used for redox (electrons, oxygen, etc.).

Prokaryotes

Representative Graphic of a Prokaryotic cell

Prokaryotes have a simple cell morphology, and are characterized by the lack of a nucleus, leaving a free circular genophore.

Other properties:

‘Simple’ rotating motor-type flagellum.
Cell division by fission, genophore attached to plasmalemma by mesosome.
Few membrane-bound organelles, no double-membrane bound organelles.
They are ‘small’ because diffusion limits the rate of transport across the cell – 1 μm.

Prokaryotic DNA also lacks histones (packaging proteins) and molecules have direct access to DNA. They have a junk-free genome, transcribed by just one RNA polymerase. Transcription leads to translation with no intermediate processing, which allows several genes to be encoded in a single transcriptional unit (a polycistron). They have small (70S) ribosomes.

Eukaryotes

Representative Graphic of a Eukaryotic cell

Eukaryotes have a complex system of internal organelles (although not necessarily more complex by nature), and is characterized by the presence of a nucleus with its genophore enclosed within.

Other properties:

Double-membrane bound nucleus containing linear chromosomes.
Complex 9+2-type undulipodium, cytoskeleton, cytosis and mitosis.
Many membrane-bound organelles and double membrane-bound endosymbionts (e.g. mitochondria, chloroplasts).
They can grow ‘large’ because cytoplasmic streaming allows rapid transport across the cell – 100 μm.

Eukaryotic DNA is bound by histones, and requires some degree of unpacking for expression. Much of their genome is composed of parasitic DNA and introns. Three RNA polymerases exists, (approximately) one for each sort of major RNA product:

RNApol-I – rRNA.
RNApol-II – mRNA and snRNA.
RNApol-III – tRNA and 5S rRNA.

RNA is heavily processed in the nucleus (e.g. splicing mechanisms). They possess larger 80S ribosomes.

Structure of DNA

Nucleic acids may occur in double or single stranded forms, most typically:

double stranded DNA (dsDNA)
single stranded RNA (ssRNA)

Nucleic acids can also pair with themselves to form hairpin loops, cruciforms, internal loops and bulges. These are important in the termination of transcription and restriction enzymes (palindromic sequences pair readily with themselves). Secondary and tertiary structure of RNAs can give it catalytic activity.

http://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/DNA_chemical_structure.svg/350px-DNA_chemical_structure.svg.png

DNA consists of two nucleotide polymers that form a right-handed double helix, with a backbone made of ribose sugar and phosphate groups joined by phosphodiester bonds. Attached to each ribose is one of four bases - adenine (A), thymine (T), cytosine (C), and guanine (G). Essentially the sequence of these four bases form that basis of the genetic code, where information is passed along the central dogma, as to the next generation. At this point we should also note that RNA is encoded by uracil (U) as opposed to thymine.

Within cells, DNA is organized into structures called chromosomes, and a set of chromosomes within the cell make up the genome.

For prokaryotes, this is largely a simple loop of dsDNA, although some viral and other parasitic ssDNA, ssRNA, dsRNA, etc., may be present. The majority of the genome is present on a single chromosome - in Escherichia coli this is about 5 megabases (Mb) in size, and supercoiled.

For eukaryotes, the genome is found in the nucleus, where chromatin proteins like histones bind to it to compact it further, and also control its interactions with other proteins and therefore transcription rates.

In addition to genomic DNA, extrachromosomal DNA are also present in the form of other organelles like mitochondria and chloroplasts. Other examples include plasmids, an important vector in recombinant DNA technology, and bacterial artificial chromosomes (BACs) and yeast artificial chromosomes (YACs).

Gene Organization

The way that genes are organized in Prokaryotes are also significantly different from that of Eukaryotes. In Prokaryotes, genes are usually organized in operons, where a single operon is responsible for the synthesis and regulation of proteins that are associated with its function. A polycistron (cistron being the region where mRNA is translated) is also formed which eventually translates to the various proteins within the operon.

In contrast, Eukaryotes often have genes in discrete locations, and its mRNA is monocistronic. Due to the difference in gene organization, its genes are also regulated in different ways. Refer to the lac operon section for more details.

DNA Replication

DNA replication is the process of copying double-stranded DNA molecules.

What must replication achieve?

DNA is a helix.
- Must be unwound and ssDNA must be stopped from annealing to inappropriately.

DNA is a very long, twisted molecule.
- Must avoid tangling.

DNA is information.
- Must be copied with high fidelity.

DNA is a double helix.
- Must synthesise two strands simultaneously in 5ʹ→3ʹ and 3ʹ→5ʹ directions.

This is achieved by various proteins as will be explained later. Two important concepts are associated with DNA replication:

Semi-conservative replication
Proofreading mechanisms

Semi-conservative Replication

http://upload.wikimedia.org/wikipedia/en/a/a2/DNAreplicationModes.png
Theoretically, the copying of DNA can be achieved via several means. Semi-conservative replication is the only known method to be used in Nature.

Origin of Replication

The origin of replication is a length of DNA into which proteins can insert and form a ‘replication bubble’ by loading in DNA polymerase. Two replication forks migrate from the bubble outwards, synthesising DNA as they go.

Prokaryotic chromosomes only have one origin of replication (ori), and its entire genome (5 Mbs) are copied in about 40 min, which means a speed of ≈ 1000 bp s−1. Its rate of growth is further increased in complete media.

From this on we would omit anything that is to do with eukaryotes unless otherwise stated.

Enymes involved

As stated above, there are several criteria for copying DNA so as to achieve fecundity, fidelity and longevity. This is achieved by various proteins that alter its shape (topology). The melting point of strands is c. 90°C in vitro (see PCR); this must be made to happen at c. 37°C in vivo.

Initiator Proteins

Initiator proteins involved in 'melting' of DNA helix at ori to allow loading of helicase.

Helicase

Helicase involved in energy-dependent separation of DNA strands by screwing mechanism.

Single stranded binding proteins (SSBPs)

SSBPs stabilise the parted strands, which would otherwise anneal to themselves (or other strands), forming hairpin loops and other unwanted tangles.

Topoisomerase I

Topoisomerases stop tangles. Topoisomerase I nicks the strand ahead of helicase, allowing strain-relief, and preventing snarling up of the strands as they are forced apart.

Topoisomerase II

Prokaryotic chromosomes are Möbius strips. Topoisomerase II creates gates, allowing helices to cross one another. Tangles and linked loops can therefore be separated.

DNA Polymerase

DNA polymerase is held onto the DNA by a doughnut shaped sliding clamp (processivity factor). This is loaded onto the DNA by a clamp loader. DNA is synthesised from dNTPs. Hydrolysis of (two) phosphate bonds in dNTP drives this reduction in entropy. This however only occurs in the 5ʹ→3ʹ direction. Prokaryotes possess three polymerases:

- DNApol I – repair
- DNApol II – cleans up Okazaki fragments
- DNApol III – main polymerase\

DNA Primase

DNApol will not synthesise dsDNA from ssDNA: it needs a short double stranded section. RNA primers are therefore added by DNA primase. The primase plus the helicase are often called the lagging strand primosome.

Priming is the most likely point for mismatch to occur because the short helix is unstable, so DNApol has been evolutionarily tuned to refuse to bind to ssDNA, and will only bind to dsDNA or DNA/RNA hybrid. RNA is distinguishable from DNA and can be removed later, whereas a DNA primer would not allow this erasibility. Hence RNA is used.

DNA Ligase

DNA ligase combine strands of DNA together by linking the phosphate groups on the ribose sugar to form a phosphodiester bond.

The Replicosome

http://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/DNA_replication.svg/691px-DNA_replication.svg.png

The leading strand is synthesised continuously 5ʹ→3ʹ. However, the other, ‘lagging’ strand is still synthesised 5ʹ→3ʹ but in discrete chunks, from the replication fork back towards the origin.

Okazaki Fragments

DNApol produces ‘Okazaki’ fragments on the lagging strand.

DNA primase adds 10 bp RNA primer to DNA.
DNApol extends the primer to 200 bp (eukaryotes – nucleosome size?) or 1000 bp (prokaryotes) with DNA.

DNApol drops off when it hits a primer on lagging strand. The RNAse-H complex removes the primer and DNApol fills in the hole. Finally, DNA ligase joins the fragments together. This palaver also needs to be done when DNApol completes the full-circle in prokaryotes.

Proofreading Mechanisms

Binding of nucleotides has error of c. 10⁻⁴, due to extremely short-lived imino and enol tautomery. However, the lesion rate in DNA is only 10⁻⁹. This increased accuracy is due to the fact that DNApol can chew back mismatched pairs to a clean 3ʹ end using its built-in 3ʹ→5ʹ exonuclease activity.

Imino-cytosine pairs (incorrectly, as far as the cell is concerned) with adenine. Nanoseconds later, the iC converts back to the normal amino form of cytosine, and no longer pairs with adenine. The mismatched base sticks out and is cleaved off by the exonuclease activity of the enzyme, leaving a fresh OH group to try again.

Transcription

t is the process of copying DNA to RNA. It differs from DNA synthesis in that only one strand of DNA, the template strand, is used to make mRNA. Unlike DNA replication, it does not need a primer to start. It can involve multiple RNA polymerases.
Transcription is divided into 3 stages

Initiation
Elongation
Termination

Initiation

RNA polymerases execute specificity by having a sigma factor, and also an α subunit that recognises a specific region of DNA upstream of the promoter.

Elongation

Termination

Two types of termination exists, a rho independent type and a rho dependant type. The rho independent type that relies on the weak base pairing that exists between the long continuous chain of A-U base pairing, which results in a particularly flexible portion of the DNA. This can cause the RNA polymerase to dissociate from the DNA and completed mRNA by formation of a stem loop that disrupts the smooth processing of the RNAP along the DNA.
The other type of termination, the rho dependant type, involves a rho protein that yanks the DNA and RNAP apart when they are bound too tightly together and cannot dissociate.

Final Product

The final product of transcription of a prokaryote gene includes a portion for gene translation, as well as regions at the beginning and the end of the gene that does not get translated.

Transcriptional Control

There are different promoters for different sigma factors. A sigma factor has to bind to a promoter as well as a RNAP to enable the RNAP to start transcribing the gene. It provides specificity for the RNAP as an RNAP is able to bind to any promoter sequence, and thus transcribe any gene. However, as only certain genes need to be transcribed, sigma factors are produced by the cells specific for the promoters of the transcribed genes.

Case Study: Lac Operon

For control of lactose metabolism
Consists of three structural genes, a promoter, a terminator and an operator
LacZ codes for a lactose cleavage enzyme
LacY codes for ß-galactosidase permease
LacA codes for thiogalactoside transcyclase
When lactose is unavailable as a carbon source, the lac operon is not transcribed.

The regulatory response requires the lactose repressor. The lacI gene encoding repressor lies nearby the lac operon and it is consitutively (i.e. always) expressed. In the absence of lactose, the repressor binds very tightly to a short DNA sequence just downstream of the promoter near the beginning of lacZ called the lac operator. Repressor bound to the operator interferes with binding of RNAP to the promoter, and therefore mRNA encoding LacZ and LacY is only made at very low levels. In the presence of lactose, a lactose metabolite called allolactose binds to the repressor, causing a change in its shape. The repressor is unable to bind to the operator, allowing RNAP to transcribe the lac genes and thereby leading to high levels of the encoded proteins. More details can be found here: [[1]]

Translation

Interpreting the information coded in the mRNA into proteins.
The nucleotides are read in triplets (set of three) called codons.
Each triplet code for a specific amino acid, and sometimes more than one codon exist for an amino acid.
mRNA are read by the translational machinery including ribosomes, tRNAs and rRNAs.
Like transcription, it also includes initiation, elongation and termination.

Codon Table

It is important to take note of the STOP codons, which are UAA, UGA, UAG; as well as the start codon AUG which codes for methione. In prokaryotes, the first AUG always codes for formyl methione; while in eukaryotes, the first AUG codes for methione, like all the other AUGs in the rest of the mRNA.

Ribosome

The ribosome is the main machinery for translation. It consists of two subunits, the larger one and the smaller one. It is made of ribosomal RNAs (rRNAs) and proteins. It also has three sites, the A (amino) site, the P (peptidyl) site, and the E (exit) site.

tRNA the Middle Man

It is in a clover shaped structure.
It serves the function of bringing the amino acids to the mRNA.
It has an anticodon loop to recognise the codons in the mRNA (by Watson-Crick base pairing).
It is responsible for the specificity of the codon recognition.

tRNA Charging

Aminoacylation is the process of adding an aminoacyl group to a compound.
tRNA is aminoacylated(or charged) with a specific amino acid by an aminoacyl tRNA synthase.
There is normally a single aminoacyl tRNA synthetase for each amino acid, despite the fact that there can be more than one tRNA, and more than one anticodon, for an amino acid.