The big picture
- We can now synthesize relatively long stretches of DNA
- This makes it possible to consider rebuilding existing (small) chromosomes, or building new chromosomes, from scratch. For example, chromosomes I, III and VI in S. cerevisiae are < 350kb long, and are candidates for being rebuilt
This leads to the $64,000 question: If you were to rebuild a yeast chromosome, what changes would you make to it? Or: if you were to build a new yeast chromosome, what would you put on it?
If you're interested in understanding large-scale chromosome structure and its effects, there are (at least) a couple of different overarching goals that can drive the answer to this question:
- The Science goal: Investigate how chromosomes are currently organized, and the importance of various elements of their organization.
- The Engineering goal: Investigate how to build a chromosome with a particular set of capabilities independent of the actual genes on the chromosome, like a low overall recombination rate.
At the moment, I'm leaning towards having an engineering goal.
Chromosome/genome organization in S.cerevisiae
- The Saccharomyces Genome Database is the fount of all knowledge.
List of genomic elements
Essential elements of linear chromosomes (to avoid chromosome loss)
- Centromeres, origins of replication, telomeres
- Murray and Szostak, '83: Established some parameters for linear artificial chromosomes:
- Chromosomes 7-15 kb in length are mitotically unstable (even when they have centromeres, origins of replication and telomeres) and are maintained at high copy number (15-50, for plasmids they studied)
- Length of 55kb is enough for:
- Low rate of mitotic loss (but still about two orders of magnitude more than natural chromosomes)
- Low copy number (1-2)
- Similar meiotic segregation behavior as natural chromosomes
- However, see also Wada et al, '94: Paper apparently describes a yeast mutant in which a 10kb linear YAC is stably maintained.
Gene order and distribution
- Overall: gene order and distribution aren't random. Good overview paper: Pal and Hurst, '04
- Genes involved in the same metabolic pathway (as defined by KEGG) tend to "cluster" on chromosomes, where "cluster" means "large region of chromosome with high concentration of pathway members, although non-members may also be present". 20% of metabolic pathways in S.cerevisiae exhibit this kind of clustering, after controlling for tandem duplicates (10% show clustering in random data; percentage for S.cerevisiae is lowest number of all organisms analyzed). (Lee and Sonnhammer, '03; see also erratum correcting major error for S.cerevisiae data.)
- Genes that are controlled by the same sequence-specific transcription factor tend to be regularly spaced along chromosome arms. Different periods are observed for different chromosome arms. Regularities are consistent with a genome-wide loop model of chromosomes, in which co-regulated genes dynamically co-localize in 3D. (Kepes, '03)
- Adjacent pairs of genes show correlated expression independent of their origin. Correlated triplets, but not quadruplets, were also found more often than expected by chance. Correlation maps also revealed regularly-spaced groups of correlated genes along chromosomes that might be indicative of higher-order chromosome structure. (Cohen et al, '00)
- Statistically significant fraction of genes coding for subunits of stable complexes are located within 10-30kb of each other. This clustering may ensure better coregulation and maintain the right stoichiometry of complexes upon duplication of chromosomal segments (Teichmann and Veitia, '04)
- Gene orientation (ie whether they’re on the plus or minus strand) can be modeled by a first-order Markov model ie the orientation of a gene depends on the orientation of the gene that precedes it. (Note: Transition probabilities for yeast are pretty close to 0.5 ie close to a random coin-flipping model, but the authors claim that the coin-flip model is statistically improbable; I can’t really judge their statistics, but I still don’t put much trust into this model.) (Simons and Morton, '03)
- Essential genes in yeast are clustered, independent of co-expression and tandem duplication. Clusters of essential genes are in regions of low recombination and larger clusters have lower recombination rates. (Pal and Hurst, '03)
- There is a negative correlation between chromosome length and G+C content at (silent) third codon positions (GC3s) of ORFs. Chromosome III is unusual in that it has strong clustering of GC3s. This could be because it contains the mating-type loci: there’s selective pressure to keep mating-type switching an intrachromosomal reaction and thus to keep most of the chromosome (between HML and HMR) intact, leading to less structural disruption than on other chromosomes (which would preserve existing clusters?). (Bradnam et al, '99)
- Efficiency of DNA mismatch repair of frameshift mutations in microsatellite repeats varies depending on genomic position of microsatellite repeat. There doesn't seem to be any correlation between repair efficiency and position with respect to replication origin, replication timing, or G:C content of nearby sequence. Authors suggest that context-dependence of repair efficiency reflects some aspect of chromatin structure. (Hawk et al, '05)
- Not clear how applicable their findings are to non-repeating sequence, like protein-coding regions.
- Might be interesting to take the map of chromatin structure produced by Pokholok et al, '05 and see whether there is correlation between chromatin structure and efficiency of mismatch repair, as proposed.
- There are hot- and coldspots of meiotic recombination in S.cerevisiae. Each chromosome has both; hotspots tend to cluster around regions with high G+C content, whereas coldspots are nonrandomly associated with centromeres and telomeres. Hotspots are enriched near genes involved in metabolic pathways and ionic homeostasis; coldspots are over-represented near ORFs involved in transport facilitation and intracellular transport. Some types of hotspots require transcription factor binding in order to become active. Hotspots tend to be in intergenic regions. (Gerton et al, '00)
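The first-order Markov model of gene orientation noted above (Simons and Morton, '03) can be illustrated with a minimal sketch: count strand transitions between consecutive genes and turn the counts into conditional probabilities. The function name and the toy strand string are hypothetical and are not taken from the paper; real input would be the strand of each gene along one chromosome, in chromosomal order.

```python
from collections import Counter

def orientation_transitions(strands):
    """Estimate first-order Markov transition probabilities.

    strands: string of '+'/'-' characters, one per gene, in chromosomal order.
    Returns a dict {(prev, nxt): P(nxt | prev)}.
    """
    pair_counts = Counter(zip(strands, strands[1:]))
    probs = {}
    for prev in "+-":
        total = sum(pair_counts[(prev, nxt)] for nxt in "+-")
        for nxt in "+-":
            # None marks a state that never occurs before another gene
            probs[(prev, nxt)] = pair_counts[(prev, nxt)] / total if total else None
    return probs

# Toy example (made-up orientations, not real yeast data)
toy = "++-+--++-+"
probs = orientation_transitions(toy)
for (prev, nxt), p in sorted(probs.items()):
    print(f"P({nxt} | {prev}) = {p:.2f}")
```

Under a coin-flip model all four probabilities would be close to 0.5, so how far the estimates fall from 0.5 (given the number of genes) is what the statistical argument hinges on.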
Transcription factor binding sites
- Paper with lots o' data: Harbison et al, '04
- Lots of high-scoring transcription factor binding sites in ORFs, some of which are actually bound in vivo (but with lower average binding strength than sites in intergenic regions). (My 7.90 class project)
- Survey paper: MacAlpine and Bell, '05
- Autonomously replicating sequences (ARSs) are about 200 bp long and contain an ARS consensus sequence (ACS) that's ~11bp long. Sequence flanking the ACS is essential, but there are no obvious sequence similarities between flanking sequences in different ARSs.
- There are 200-400 ARSs in the yeast genome (ie they occur every 30-40 kbp), but not all function as origins of replication.
- For given cell type, under given growth condition, each part of the genome replicates at a characteristic time within S phase.
- Activation timing of each origin is related to its chromosomal position; origins near centromeres are activated earlier, origins near telomeres are activated later than other origins.
- No correlation between steady-state transcription level of a gene and establishment/activation of an origin near the gene.
- Pre-RC (complex that assembles at origins before replication) primarily assembles at pro-ARSs (ie possible origins of replication) in intergenic regions. However, significantly fewer pro-ARSs occur in intergenic sequences flanked by diverging transcripts than would be expected.
- Bottom line:
- Still can't predict which ARSs will actually function as origins of replication, and the timing of their activation.
- Mechanism responsible for establishing the conserved, characteristic pattern of replication across the genome is still unknown.
- For Pol II promoters on chromosome III: region ~200bp upstream of start codons is typically nucleosome-free, flanked by "well-positioned" nucleosomes (ie regularly-spaced nucleosomes whose position varies relatively little). Nucleosome-free sequences are evolutionarily conserved and enriched in poly-A and poly-T sequences. Most occupied transcription factor motifs aren't associated with nucleosomes, suggesting that nucleosome positioning affects transcription factor access. (Yuan et al, '05)
- There's higher nucleosome density over transcribed regions than intergenic regions. More highly-transcribed genes have lower nucleosome occupancy than less highly-transcribed genes. Gene activation leads to reduced nucleosome density in both promoter and transcribed regions, with greatest effect occurring at the promoter. (Pokholok et al, '05)
- Promoter regions and transcriptional start sites of active genes are enriched in acetylated histones. Active genes are also enriched for methylated histones, both at their beginning and further downstream. (Pokholok et al, '05)
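As a rough illustration of the ARS notes above, the sketch below scans a sequence for exact matches to an 11bp ACS on both strands. The consensus string `WTTTAYRTTTW` is an assumption (it is the commonly cited form of the ACS and is not given in these notes), and the function names are hypothetical. Consistent with the bottom line above, a match says nothing about whether the site actually functions as an origin.

```python
import re

# IUPAC codes used in the consensus: W = A/T, Y = C/T, R = A/G
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "Y": "[CT]", "R": "[AG]"}
ACS_CONSENSUS = "WTTTAYRTTTW"  # assumption: commonly cited 11bp ACS

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_acs(seq, consensus=ACS_CONSENSUS):
    """Return sorted (start, strand) tuples for exact consensus matches.

    start is 0-based and always given in forward-strand coordinates.
    """
    # A lookahead group lets overlapping matches be reported as well
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")
    seq = seq.upper()
    k = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    hits += [(len(seq) - m.start() - k, "-")
             for m in pattern.finditer(revcomp(seq))]
    return sorted(hits)

print(find_acs("GGGATTTATGTTTACCC"))  # [(3, '+')]
```

Allowing one or two mismatches, or scoring with a position weight matrix, would be the natural next step, since many functional ARSs carry imperfect matches.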
Some chromatin background
- Histone variants: H2A.Z is associated with transcribed regions; inhibits repressive chromatin structures. CENP-A histone variant is associated with nucleosomes that include centromeric DNA.
- Nucleosome remodeling complexes can be targeted to DNA by interaction with DNA-bound transcription factors. Alternatively, binding of some TFs to DNA is incompatible with association of the same DNA with a histone octamer. Since a nucleosome needs ~147bp of DNA to form, if 2 such TFs bind < 147bp apart, the DNA between them can't assemble into a nucleosome.
- Nucleosomes assemble preferentially on A:T-rich DNA when minor groove faces the histone octamer, G:C-rich DNA when major groove faces octamer. Sequences that alternate between A:T-rich and G:C-rich stretches with periodicity of ~5bp act as preferred nucleosome binding sites.
- Modification of N-terminal tails of histones alters chromatin accessibility. Acetylated nucleosomes are typically associated with transcriptionally active chromatin, deacetylated nucleosomes with transcriptionally inactive chromatin. Methylation can have either effect, depending on particular amino acid that is methylated. Histone methylation was long thought to be irreversible, but demethylases (eg LSD1) have since been identified.
- Proteins with bromodomains interact with acetylated histone tails, proteins with chromodomains with methylated histone tails. Bromo/chromodomain-containing proteins are often associated with acetylases/methylases and can thus participate in a positive feedback loop.
- During DNA replication, H3:H4 tetramers are either transferred wholesale to the new strand or retained on the old strand. H2A:H2B dimers are released into soluble pool and then reassociate with the old and new strands. Nucleosome assembly requires chaperones.
- Remove all “inert” DNA ie non-coding, not promoters etc; see whether yeast is still alive.
- If it’s not alive, tracking down the pieces required to make it viable would be lots of work. (Drew has this idea about "Hamming genetics", to make it easier to track down this sort of thing, that I don't quite grok.)
- Remove all introns
- Not sure what this would really tell us. Would it make yeast easier to manipulate?
- Change codon usage to be “optimal” & see whether fitness (by some measure of fitness) improves
- Problem is that you’d (probably) have to do this across all chromosomes, not just a single chromosome, in order to see an effect on fitness
- Remove all ORFs of unknown/duplicated function, see whether yeast is alive or not.
- If it’s dead, tracking down which of the removed ORFs are responsible would be painful
- Put all genes in pheromone response pathway on their own chromosome & remove the endogenous copies, to test current model of pathway
- Problem is that there are >50 genes involved in the pathway and removing all the endogenous copies would be a huge amount of work
- Engineer photosynthesis into yeast.
- Not clear what the point of doing so would be, other than “because we can”.
- Explore yeast mating-type switching, since all genes involved are on chromosome III
- Not appealing because lots of experiments with chromosome architecture and location of the loci have already been done; also, behavior doesn’t seem to be sequence-specific, with exception of the RE element. See also Galgoczy et al, which seems like a pretty thorough, low-level dissection of mating-type switching.
- Build rearranged/jumbled chromosome: preserve functional elements (eg gene + associated promoter), but change gene order, orientation & strand. Profile gene expression, histone location etc & compare to WT.
- Would be very difficult to say anything specific if you really jumble up the functional elements randomly.
- There is an algorithm for calculating minimal sequence of inversions, translocations etc needed to transform one permutation (ie ordering) of genes into another, by Pavel Pevzner's group at UCSD. Extending this algorithm to take into account practical issues that would arise when trying to rearrange a chromosome, like having to make sure that you don't remove any essential genes during a rearrangement step, might be an interesting engineering problem.
- Disrupt all occurrences of TF binding sequences that occur in coding sequence [by disrupting the motif but keeping the same amino acid sequence] and then profile gene expression patterns. Would help to unravel whether in-gene binding sites are biologically relevant, eg by acting as “titrating” sites (along TK’s theory).
- Specifically: pick a transcription factor that has a well-known, unique overexpression phenotype and disrupt all of its intragenic binding sites. If these binding sites acted to titrate the TF away from the “real” binding sites, then you should see the same phenotype as when the TF is overexpressed
- Take a metabolic pathway that’s clustered, disrupt the clustering and see whether the efficiency of the pathway is impaired. If it is, why?
- Rebuild chromosome by moving promoters + ORFs associated with recombination hotspots around, re-profile recombination hotspots and see whether they’ve moved with the promoters/ORFs.
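The idea above of disrupting TF binding motifs inside coding sequence while keeping the amino acid sequence can be sketched as a greedy search over synonymous codon swaps. This is only an illustration under simplifying assumptions: the motif is an exact string on the coding strand (reverse-strand sites and degenerate motifs are ignored), the standard genetic code is used, and codon usage is not considered. The function names and the example 6-mer are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))

def count_sites(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))

def disrupt_motif(cds, motif):
    """Remove motif occurrences from cds using single synonymous codon swaps.

    For the first remaining site, accept the first swap in an overlapping
    codon that lowers the total site count; raise ValueError if none exists.
    """
    cds, motif = cds.upper(), motif.upper()
    if len(cds) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    while (pos := cds.find(motif)) != -1:
        n_sites = count_sites(cds, motif)
        for idx in range(pos // 3, (pos + len(motif) - 1) // 3 + 1):
            start = 3 * idx
            old = cds[start:start + 3]
            synonyms = [c for c, aa in CODON_TABLE.items()
                        if aa == CODON_TABLE[old] and c != old]
            better = [s for s in (cds[:start] + c + cds[start + 3:] for c in synonyms)
                      if count_sites(s, motif) < n_sites]
            if better:
                cds = better[0]
                break
        else:
            raise ValueError(f"cannot remove site at position {pos} synonymously")
    return cds

original = "ATGACGTCATAA"  # toy ORF (M-T-S-stop), not a real gene
recoded = disrupt_motif(original, "GACGTC")
print(recoded, translate(recoded) == translate(original))
```

Sites that sit entirely within Met or Trp codons cannot be removed this way, which is why the function raises an error rather than silently leaving a site in place.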
Design chromosome that:
- Has custom chromatin structure eg
- doesn’t have any closed regions of chromatin
- has chromatin structure that varies with, say, cell cycle
- has uniform chromatin structure, so that differences in gene expression are determined only by promoter sequence & levels of TF
- has nucleosomes made up only of custom histones that don’t respond in a standard way to the usual acetylation/methylation events, or have a custom histone code
- Is resistant to disruption by Ty1: try to design “Super Ty1” transposon (similar to what Han and Boeke did with human LINE1 transposons) that disrupts WT chromosomes a lot and then design chromosomes that are resistant to being invaded by this Super Ty1.
- Undergoes meiotic recombination only rarely
- Gets replicated very quickly; for example, this could be useful to keep concentrations of proteins being expressed off that chromosome relatively steady during S phase
- Related work: in "Engineering evolution to study speciation in yeasts", the authors rearranged the S.cerevisiae genome to be collinear with that of S.mikatae so that they could study the constraints on mating between these two yeast species.