Xwang:Accomplishment: Difference between revisions

From OpenWetWare
<table width="790" border="0" cellpadding="0" cellspacing="0" bgcolor="#FFFFFF">
<div style="width: 790px">


<table width="790" height="4938" border="0" cellpadding="0" cellspacing="0">
== Milestones in My Bioinformatics and Plant genomics Research ==
        <tr>
          <td colspan="2" class="pageName"><h3 align="center"><em>Milestones in My Bioinformatics and Plant genomics Research </em></h3></td>
</tr>


<tr>
<div> '''2008 ~ 2010 (Harvard University, Dana-Farber Cancer Institute)''' </div>  
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
<div> ''' Computational models for cataloging smRNAs (smRNA-Seq) and RNA-mediated Transcriptional Gene Silencing ''' </div>
    </tr>
 
<tr>
Regulatory ncRNAs can be classified primarily by size: long ncRNAs (>40 nt) and small ncRNAs (20~25 nt, smRNAs). Long ncRNAs usually possess miRNA-like signatures and influence specific genes located in antisense orientation or nearby, while smRNAs are involved in a wide spectrum of pathways and function in either trans- or cis-mode. In plants, endogenous smRNAs comprise microRNAs (miRNAs) and short-interfering RNAs (siRNAs), which are produced by distinct pathways; the latter have multiple subclasses. Two of these subclasses, trans-acting siRNAs (ta-siRNAs) and natural-antisense-transcript-derived siRNAs (nat-siRNAs), arise from TAS genes and convergent gene pairs, respectively, and are incorporated into RISC (RNA-induced silencing complex) to mediate translational repression or mRNA degradation. The remaining siRNA classes, mostly derived from tandem repeats or transposable elements (TEs), are found in the RITS (RNA-induced initiation of transcriptional silencing, [yeast]) or RdDM (RNA-directed DNA methylation, [Arabidopsis]) complex, where they mediate epigenetic modifications by targeting nascent RNAs during transcription. However, the functions and biogenesis pathways of many siRNA classes remain poorly characterized.
  <td height="347" colspan="2" class="bodyText"><p><strong>2008 ~ Present (Harvard University, Dana-Farber Cancer Institute)</strong></p>
 
      <p><strong>Computational models for cataloging smRNAs (smRNA-Seq) and RNA-mediated Transcriptional Gene Silencing</strong></p>
 
      <p align="left">Regulatory ncRNAs can be classified primarily by size: long ncRNAs (&gt;40 nt) and small ncRNAs (20~25 nt, smRNAs). Long ncRNAs usually possess miRNA-like signatures and influence specific genes located in antisense orientation or nearby, while smRNAs are involved in a wide spectrum of pathways and function in either <em>trans-</em> or <em>cis</em>-mode. In plants, endogenous smRNAs comprise microRNAs (miRNAs) and short-interfering RNAs (siRNAs), which are produced by distinct pathways; the latter have multiple subclasses. Two of these subclasses, trans-acting siRNAs (ta-siRNAs) and natural-antisense-transcript-derived siRNAs (nat-siRNAs), arise from <em>TAS</em> genes and convergent gene pairs, respectively, and are incorporated into RISC (RNA-induced silencing complex) to mediate translational repression or mRNA degradation. The remaining siRNA classes, mostly derived from tandem repeats or transposable elements (TEs), are found in the RITS (RNA-induced initiation of transcriptional silencing, [yeast]) or RdDM (RNA-directed DNA methylation, [<em>Arabidopsis</em>]) complex, where they mediate epigenetic modifications by targeting nascent RNAs during transcription. However, the functions and biogenesis pathways of many siRNA classes remain poorly characterized.</p>
 
      <p align="left">I am currently developing computational models to systematically catalog smRNA classes from high-throughput sequencing data, and at the same time trying to characterize the regulatory roles of smRNAs through an integrative analysis across all epigenetic layers. Current progress will be posted under the &quot;Project&quot; item in the left menu.</p></td>
 
    </tr>
 
<tr>
 
  <td height="21" colspan="2" class="bodyText">&nbsp;</td>
 
    </tr>
 
<tr>
 
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
 
    </tr>
 
<tr>
 
          <td height="347" colspan="2" class="bodyText"><p><strong>2008 ~ 2009</strong> <strong>(Harvard University and Yale University)</strong></p>
 
            <p><strong>Epigenome, transcriptome and smRNAome by high-throughput Solexa sequencing (ChIP-Seq, RNA-Seq)</strong></p>
 
          <p align="left">Shirley's Lab and Deng Lab collaborated on Illumina/Solexa sequencing to profile four histone modifications (H3K4, H3K36 and H3K27 trimethylation and H3K9 acetylation), plus DNA methylation, mRNA and small RNA data in maize, rice and Arabidopsis. This large integrated epigenomic data set not only lets us interpret the relationship between epigenetic marks, gene transcription and small RNAs, but also allows comparison between species (<strong><em>Wang et al., 2009, The Plant Cell</em></strong>).<img src="http://cals.arizona.edu/research/xwang/Research/maize-seq.jpg" width="985" height="358" /></p>          </td>
 
        </tr>
</div>
<tr>
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
    </tr>
<tr>
  <td height="351" colspan="2" class="bodyText"><p align="left"><strong>2007 ~ 2008 (Yale University)</strong></p>
    <p align="left"><strong>Mapping of H3K4me2, H3K4me3, H3K9me2, H3K27me3 and DNA methylation in rice and Arabidopsis (ChIP-Chip)</strong></p>
    <p align="left">This analysis reveals combinatorial interactions between these epigenetic modifications, chromatin structure and gene expression, and we found several rules governing the epigenetic marks tested.<br />
      1. Cytologically densely stained heterochromatin had less H3K4me2 and H3K4me3 and more methylated DNA than the less densely stained euchromatin, whereas centromeres had a unique epigenetic composition. <br />
      2. Protein-coding genes had both methylated DNA and di- and/or trimethylated H3K4. Methylation of DNA but not H3K4 was correlated with suppressed transcription. <br />
      3. If DNA and H3K4 were comethylated, transcription was only slightly reduced. <br />
      4. Transcriptional activity was positively correlated with the ratio of H3K4me3/H3K4me2: genes with predominantly H3K4me3 were actively transcribed, whereas genes with predominantly H3K4me2 were transcribed at moderate levels. <br />
      5. More protein-coding genes contained all three modifications, and more transposons contained DNA methylation, in shoots than in cultured cells. Differential epigenetic modifications correlated with tissue-specific expression between shoots and cultured cells. <br />
      Collectively, this study provides insights into the rice epigenomes and their effect on gene expression and plant development. <strong><em>The Plant Cell</em></strong> (2008 Feb); 20: 259-276</p>
      <p></p></td>
    </tr>
<tr>
  <td width="678" height="166" class="bodyText"><p align="center"><img src="http://cals.arizona.edu/research/xwang/milestones/Rice_Chipchip.jpg" alt="" width="525" height="483" /></p>     </td>
      <td width="344" height="166" class="bodyText"><div align="left">Our data support a model in which genes in rice chromatin are marked by different epigenetic modifications whose combinations determine distinct gene expression states. There are four typical chromatin states with respect to the three epigenetic modifications examined in this study (Figure 6D). DNA methylation in the absence of methylated H3K4 (state 1) marks a gene for silencing, resulting in a condensed chromatin structure that impedes transcription. The presence of H3K4me2, even in the presence of DNA methylation (state 2), alters the chromatin structure to a form permissive for initiation of transcription. The presence of moderate amounts of H3K4me3 (state 3) adjusts the chromatin to a state permitting more active transcription. Finally, if H3K4me3 is the dominant modification (state 4), the chromatin adopts a conformation permitting maximal transcription.</div></td>
</tr>
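<tr>
  <td colspan="2" class="bodyText"><p align="left">The four-state model described above can be read as a simple decision rule. The following is only a minimal sketch under stated assumptions, not the method used in the study: the function name, the signal scale, the detection cutoff <em>detect</em> and the fold threshold <em>dominance</em> are all hypothetical values chosen for illustration.</p>

```python
from typing import Optional


def chromatin_state(dna_methylated: bool, h3k4me2: float, h3k4me3: float,
                    detect: float = 1.0, dominance: float = 2.0) -> Optional[int]:
    """Assign one of the four chromatin states (1-4) from the model.

    `detect` (minimum signal to call a mark present) and `dominance`
    (fold excess of H3K4me3 over H3K4me2 to call it dominant) are
    hypothetical thresholds, not values taken from the study.
    Returns None when none of the three marks is detected.
    """
    has_me2 = h3k4me2 >= detect
    has_me3 = h3k4me3 >= detect
    if not has_me2 and not has_me3:
        # State 1: DNA methylation without methylated H3K4 (silenced).
        return 1 if dna_methylated else None
    if not has_me3:
        # State 2: H3K4me2 present, with or without DNA methylation.
        return 2
    if h3k4me3 >= dominance * h3k4me2:
        # State 4: H3K4me3 is the dominant modification.
        return 4
    # State 3: moderate H3K4me3 alongside H3K4me2.
    return 3


# Expected transcription level per state, as described in the model.
EXPECTED_TRANSCRIPTION = {1: "silenced", 2: "permissive",
                          3: "more active", 4: "maximal"}

print(EXPECTED_TRANSCRIPTION[chromatin_state(True, 3.0, 0.2)])
```

  <p align="left">In practice the thresholds would have to be derived from the array signal distributions; the sketch only shows how the combination of marks, rather than any single mark, selects the state.</p></td>
</tr>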
<tr>
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
    </tr>
<tr>
  <td height="61" colspan="2" class="bodyText"><p><strong>2006 ~ 2007 (Yale University)</strong></p>
      <p><strong>Statistical analysis of tiling-path microarrays</strong>. A review chapter in the book <strong><em>Oligonucleotide microarray sequence analyses</em></strong></p></td>
    </tr>
<tr>
  <td height="166" class="bodyText"><img src="http://cals.arizona.edu/research/xwang/milestones/tiligarray.jpg" alt="" width="566" height="528" /></td>
      <td height="166" class="bodyText"><div align="left">Illustrations of tiling arrays for mRNA analysis, noncoding RNA analysis and ChIP-on-chip experiments. A) A tiling array with an average resolution of 46 bp, used to experimentally confirm predicted gene structures and identify novel transcriptionally active regions (TARs). B) A tiling array with 5 bp resolution; the higher the resolution of the tiling array, the smaller the exons that can be identified. C) Tiling arrays in a ChIP-on-chip experiment for detecting histone H3 lysine acetylation. Pink peaks show the distribution of P values calculated by a Hidden Markov Model. The second track is a heat map of tiling array signal, in which intensely yellow regions represent more highly ChIP-enriched regions. D) High-resolution (5 bp) tiling arrays used for analysis of non-coding RNA transcripts.</div></td>
</tr>
<tr>
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
    </tr>
<tr>
  <td height="231" colspan="2" class="bodyText"><p><strong>2006 ~ 2007 (Yale University)</strong></p>
      <p><strong>Global analyses of intergenic transcriptionally active regions (TARs) in rice by tiling-path microarrays</strong></p>
      <p align="left">Genome tiling-path microarray experiments in several model organisms have discovered rich transcription activity beyond annotated genes. These transcriptionally active regions (TARs) have been regarded as the “Dark Matter” of the genome. In the third phase of our rice genome transcription study, we conducted a global identification and characterization of TARs in the rice <em>japonica</em> subspecies. <br />
          Using a less stringent criterion, we identified a total of 25,352 and 27,747 TARs not encoded by annotated exons in the two rice subspecies <em>japonica</em> and <em>indica</em>, respectively. Approximately two thirds of all TARs are conserved between <em>japonica</em> and <em>indica</em>. Subsequent analysis indicated that about 80% of the <em>japonica</em> TARs can be assigned to various putative functions and structural elements of the rice genome, including splicing variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. <strong><em>PLoS ONE</em></strong>; 2(3): e294</p>       </td>
        </tr>
<tr>
  <td height="83" class="bodyText"><img src="http://cals.arizona.edu/research/xwang/milestones/tar.jpg" alt="" width="640" height="268" /></td>
          <td height="83" class="bodyText"><p>Right: Intergenic TARs demonstrate a deviating distribution of GC3 vs GC2, indicating that they lack protein-coding ability.</p>
            <p>Left: ~500 TARs exhibited differential expression, while most TARs were constitutively expressed across 10 rice tissues.</p></td>
</tr>
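<tr>
  <td colspan="2" class="bodyText"><p align="left">GC2 and GC3 in the caption above denote the GC content at the second and third codon positions. A minimal sketch of how such values can be computed for one sequence is given below. It assumes the reading frame starts at the first base and ignores any trailing partial codon; the function name is hypothetical, and how a frame was chosen for intergenic TARs is not stated here.</p>

```python
def gc_by_codon_position(seq: str) -> dict:
    """Return GC fraction at codon positions 1, 2 and 3.

    Assumes the reading frame begins at the first base of `seq`
    (an illustrative assumption); trailing bases that do not
    complete a codon are ignored.
    """
    seq = seq.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    counts = {1: 0, 2: 0, 3: 0}
    for i in range(n_codons * 3):
        if seq[i] in "GC":
            counts[i % 3 + 1] += 1
    return {f"GC{pos}": counts[pos] / n_codons for pos in (1, 2, 3)}


# Codons ATG, GCC, TAA: one G/C at position 1, one at 2, two at 3.
print(gc_by_codon_position("ATGGCCTAA"))
```

  <p align="left">Plotting GC3 against GC2 for each sequence then allows coding and non-coding sets to be compared, as in the right panel.</p></td>
</tr>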
<tr>
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
    </tr>
<tr>
  <td height="61" colspan="2" class="bodyText"><p><strong>2005 ~ 2006 (National Institute of Biological Sciences, Peking University, Beijing)</strong></p>
      <p><strong>NMPP, A software for processing NimbleGen microarray data</strong></p>       </td>
    </tr>
<tr>
  <td height="83" class="bodyText"><img src="http://cals.arizona.edu/research/xwang/milestones/nmpp.jpg" alt="" width="548" height="511" /></td>
      <td height="83" class="bodyText"><p>The NMPP package is a bundle of user-customized tools, based on established algorithms and methods, for processing self-designed NimbleGen microarray data. It features a command-line-based integrative processing procedure that comprises five major functional components: the raw microarray data parsing and integrating module, the array spatial effect smoothing and visualization module, the probe-level multi-array normalization module, the gene expression intensity summarization module and the gene expression status inference module.<br />
          http://plantgenomics.biology.yale.edu/nmpp</p>
          <p><strong><em>Bioinformatics</em></strong> (2006 Dec); 22(23): 2955-7</p></td>
</tr>
<tr>
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
    </tr>
<tr>
  <td height="162" colspan="2" class="bodyText"><p><strong>2005 ~ 2006 (National Institute of Biological Sciences, Peking University, Beijing)</strong></p>
      <p><strong>Transcriptional map of rice indica by genome-wide tiling-path microarrays</strong></p>
      <p align="left">We conducted a comprehensive analysis of rice <em>indica</em> genome transcription activity and provided experimental  evidence for the rice genome annotation based on computational prediction. Our analysis detected  transcription activity of 35,970 (81.9%) annotated gene models and found 10,425  (23.8%) gene models showed significant antisense transcription. We also identified  5,464 unique transcribed intergenic regions (TAR). 73.1% of the TARs are highly  conserved in rice <em>japonica</em> genome,  while 44.7% of TARs were found to be homologous to plant ESTs. Analysis of the frequency  of simple sequence repeat (SSR) motifs indicated that “GA” SSR motif was richly  distributed in TARs. <strong><em>Nature  Genetics</em></strong>; 38:  124 – 129</p>       </td>
    </tr>
<tr>
  <td height="166" class="bodyText"><img src="http://cals.arizona.edu/research/xwang/milestones/ng.jpg" alt="" width="610" height="334" /></td>
      <td height="166" class="bodyText"><p align="left">Transcription analysis of 18 distinct duplicated segments in the rice genome found an overall similarity of transcriptional activity between duplicated segments. 14 of the 18 duplication pairs show a significant positive correlation; the 17th duplication, which involves chromosomes 11 and 12 and is the most recent, has the highest correlation, 0.731.</p></td>
</tr>
<tr>
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
    </tr>
<tr>
  <td height="61" colspan="2" class="bodyText"><p><strong>2004 ~ 2005 (National Institute of Biological Sciences, Peking University, Beijing)</strong></p>
      <p><strong>Activity of transposable elements changes following developmental stages in rice (PCR-based Tiling array)</strong></p>       </td>
    </tr>
<tr>
  <td height="83" class="bodyText">Rice chromosome 4 has the unique feature that the entire chromosome can be divided into a distinct heterochromatic half (0 ~ 17.5 Mb) and a euchromatic half (17.5 ~ 34 Mb). From our tiling analysis, we discovered a close correlation between transcriptional activity and chromosome organization, and developmental regulation of transcription activity at the chromosome level. In early developmental stages, the gene-rich euchromatic portion is more actively transcribed than the transposon-rich heterochromatic portion of the chromosome. In mature developmental stages, however, transcription of transposon-related genes in heterochromatic regions was greatly increased, whereas transcription of protein-coding genes in the euchromatic regions was reduced. <strong><em>The Plant Cell</em></strong>; 17(6):1641-57. </td>
      <td height="83" class="bodyText"><div align="center"><img src="http://cals.arizona.edu/research/xwang/milestones/tpc003small.jpg" alt="" width="320" height="424" /></div></td>
</tr>
<tr>
  <td height="21" colspan="2" bgcolor="#CC0000" class="bodyText">&nbsp;</td>
    </tr>
<tr>
  <td height="166" colspan="2" class="bodyText"><p><strong>2003 ~ 2008 (NIBS, Peking University, Beijing Genomics Institute, CAS)</strong></p>
      <p><strong>Gene microarray analysis related projects</strong></p>
      <p>1. A  microarray analysis of the rice transcriptome and its comparison to  Arabidopsis. <strong><em>Genome Research</em></strong>. 15(9):1274-1283</p>
      <p>2. Global  genome expression analysis of rice in response to drought and high-salinity  stresses in shoot, flag leaf, and panicle.   (2007 Mar) <strong><em>Plant Molecular Biology</em></strong>; 63(5):591-608. Epub 2007 Jan 16</p>
      <p>3. A  Genome-Wide Transcription Analysis Reveals a Close Correlation of Promoter  INDEL Polymorphism and Heterotic Gene Expression in Rice Hybrids. (2008 Aug) <strong><em>Molecular  Plant</em></strong>; 1: 720-731</p>
      <p>4. Characterization of the genome expression trends in the heading-stage panicle of six rice lineages. Accepted by <strong><em>Genomics</em></strong></p></td>
    </tr>
      </table>
</td>
    <td width="13">&nbsp;</td>
    <td width="4" valign="top"><div align="left"><br />
      &nbsp;<br />
    </div>

Revision as of 15:52, 12 March 2011
