Endosperm is biologically and economically important. It provides nutrients and signals to the embryo during seed development, and cereal endosperm is a major source of food and of raw material for numerous industrial products, including ethanol. We are currently collaborating with Prof. Brian Larkins and Prof. Ramin Yadegari on an NSF-funded project to study the regulation of early endosperm development in maize. In this project we use Illumina high-throughput sequencing to profile the mRNA transcriptome (RNA-Seq), identify the core transcription factors (TFs), and build the regulatory network that controls the early stages of maize endosperm development.
See the UA news for the description of this project at 
A genome browser for the maize endosperm transcriptome
We have constructed a local UCSC genome browser at UA to display the maize endosperm RNA-Seq transcriptome.
Identification of important TFs in maize endosperm development
We first conducted an in silico search of the annotated maize genes (version 5.0a of the annotation) against a plant transcription factor database and identified hundreds of potential maize TF genes. We then used published microarray expression data and our own RNA-Seq data to further filter the gene set. This identified ~200 candidate genes that are specifically expressed in the early stages of endosperm development (A). We are currently performing experimental validation of selected candidate TFs. As the next step, to construct the regulatory network of the identified TFs, we identified genes whose expression is highly correlated with that of the TFs and calculated the topological structure of these associations (B). The resulting regulatory network is illustrated in (C). Additionally, we are developing a hidden Markov model (HMM) to detect alternatively spliced genes during development (D).
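The correlation step described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the project: the gene names, expression values, and the 0.9 cutoff are hypothetical, and Pearson correlation is assumed as the similarity measure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(vx * vy)

def coexpression_edges(expr, tfs, threshold=0.9):
    """Return (tf, gene, r) edges for every gene with |r| >= threshold."""
    edges = []
    for tf in tfs:
        for gene, profile in expr.items():
            if gene == tf:
                continue
            r = pearson(expr[tf], profile)
            if abs(r) >= threshold:
                edges.append((tf, gene, round(r, 3)))
    return edges

# Hypothetical expression values across four developmental time points
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 1.0, 3.0],
}
tfs = ["TF1"]  # hypothetical candidate TF list
edges = coexpression_edges(expr, tfs)

# A simple topological summary: number of correlated partners per TF
degree = {tf: sum(1 for e in edges if e[0] == tf) for tf in tfs}
print(edges, degree)
```

Both positively and negatively correlated genes are kept here, since a TF may act as an activator or a repressor; whether to keep negative correlations is a design choice that depends on the analysis.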
PROJECT 2. Chromatin and epigenomic landscape of the developing maize endosperm
Another research interest of our lab is to understand the epigenomic regulatory mechanisms that function during early maize endosperm development. In collaboration with Dr. Ramin Yadegari's lab, we are currently profiling selected histone marks and DNA methylation by Illumina sequencing in the well-known maize hybrid B73 × Mo17. Another direction we are interested in is whether genomic imprinting influences heterosis.
What is heterosis?
Heterosis, or hybrid vigor, is the increased function of any biological quality in a hybrid offspring. It is the occurrence of a genetically superior offspring from mixing the genes of its parents. Nearly all field corn (maize) grown in most developed nations exhibits heterosis. Modern corn hybrids substantially outyield conventional cultivars and respond better to fertilizer.
The figure above was modified from Springer and Stupar (2007) and He et al. (2010).
Improve maize gene prediction using active histone marks
We are sequencing two activating histone marks, H3K9ac and H3K36me3, which are usually associated with transcription initiation and elongation, respectively. The former peaks at the transcription start sites (TSSs) of actively transcribed genes, while the latter spreads along the entire gene body (A). The two modifications are therefore considered hallmarks of active promoters and actively transcribed genes (B). The current maize genome annotation contains some computationally predicted genes with wrongly defined boundaries (for example, multiple genes predicted as one gene). Because H3K9ac marks the gene start and H3K36me3 marks the gene body, we can use this information to correct the wrongly annotated genes. In our lab, we are developing a supervised hidden Markov model (HMM) that integrates the epigenetic information with the RNA-Seq signal to improve the gene models in maize.
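To illustrate how the two marks could split a merged gene model, here is a minimal sketch of a three-state HMM decoded with the Viterbi algorithm. It is not the model under development in the lab: the states, the discretized observations ("none", "ac" for an H3K9ac-enriched bin, "me" for an H3K36me3-enriched bin), and all probabilities are hypothetical and hand-set, whereas a supervised HMM would estimate them from trusted gene annotations and would also use the RNA-Seq signal.

```python
from math import log

STATES = ["intergenic", "tss", "body"]

# Hypothetical, hand-set parameters (assumptions for illustration only)
START = {"intergenic": 0.8, "tss": 0.1, "body": 0.1}
TRANS = {
    "intergenic": {"intergenic": 0.8, "tss": 0.19, "body": 0.01},
    "tss": {"intergenic": 0.01, "tss": 0.3, "body": 0.69},
    "body": {"intergenic": 0.2, "tss": 0.1, "body": 0.7},
}
EMIT = {
    "intergenic": {"none": 0.9, "ac": 0.05, "me": 0.05},
    "tss": {"none": 0.1, "ac": 0.8, "me": 0.1},
    "body": {"none": 0.1, "ac": 0.05, "me": 0.85},
}

def viterbi(obs):
    """Most likely state path for a sequence of per-bin observations (log space)."""
    score = {s: log(START[s]) + log(EMIT[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log(TRANS[p][s]))
            score[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][o])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def call_genes(path):
    """Return (start_bin, end_bin) per gene; each new TSS run opens a new gene."""
    genes, start = [], None
    for i, s in enumerate(path):
        new_tss = s == "tss" and (i == 0 or path[i - 1] != "tss")
        if start is not None and (s == "intergenic" or new_tss):
            genes.append((start, i - 1))
            start = None
        if start is None and s != "intergenic":
            start = i
    if start is not None:
        genes.append((start, len(path) - 1))
    return genes

# Hypothetical region annotated as one gene (bins 1-7) with an internal H3K9ac peak
obs = ["none", "ac", "me", "me", "me", "ac", "me", "me", "none", "none"]
path = viterbi(obs)
genes = call_genes(path)
print(path)
print(genes)
```

With these toy parameters the internal H3K9ac bin is decoded as a second TSS, so the single annotated region is reported as two genes.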
PROJECT 3. Assembly and annotation of Thellungiella halophila genome
We are also collaborating with Dr. Karen Schumaker's lab on a genome sequencing project for Thellungiella halophila, a halophytic relative of the genetic model Arabidopsis thaliana (Arabidopsis). The genome sequence of T. halophila will be a critical resource for the fields of stress biology, evolutionary biology, and comparative genomics. We are currently using Arabidopsis as a reference genome to assemble the Thellungiella halophila genome. We are also analyzing mRNA transcriptomes sequenced by 454 from libraries prepared under several stress conditions, such as cold, salinity, and drought, to identify genes involved in stress resistance.