From OpenWetWare

Revision as of 05:46, 26 July 2011

Proportal DB Schema

User Module

Project Module

Table: data_project

A list of projects

   * 72 projects, as of 07-21-2011
   * Last updated: 2010-12-10
   * No foreign key

Notes

   * "type": cpm, cpp, cps, ma, mt, p, pb, s
   * "tax_id": the link for "tax_id" is defined in data_url table. 

For instance,

   * type_id = 59919
   * source = tax
   * url = http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=59919
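The example above can be sketched in code. This is a minimal illustration only: it assumes every taxonomy link follows the same pattern as the example entry, and the helper name build_tax_url is hypothetical (it is not part of the Proportal code or the data_url table).

```python
# Hypothetical helper: derive the "url" value stored in data_url from a tax_id.
# Assumption: all taxonomy links follow the pattern of the example entry.
NCBI_TAX_BASE = "http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi"


def build_tax_url(tax_id: int) -> str:
    """Return the NCBI Taxonomy Browser link for the given tax_id."""
    return f"{NCBI_TAX_BASE}?id={int(tax_id)}"


print(build_tax_url(59919))
```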

Table: data_projectpub

A list of publications from various projects

   * 32 publications as of 07-21-2011
   * No publication in this table is listed in data_publication table
   * Foreign key: data_project

Table: data_genepub

This table is empty. Consider using the data_publication table instead?

   * Foreign key: data_project

Table: data_publication

A list of publications related to Prochlorococcus, Cyanophage, and Synechococcus.

   * 2526 publications listed as of 07-21-2011
   * Not referenced by any other table
   * pubmed_id can be used as a foreign key.
   * "year": last updated 2010

Table: data_url_map

This table is empty.

Table: data_url

The list of data links or data folders.

Meta Data Module

Table: data_bats_ts

Information about field investigation.

   * No foreign key

Table: data_meta_data

Information about field investigation for each project

   * 66 meta data sets, as of 07-22-2011
   * Foreign key: data_project.id, to be fixed.

Error

   * One project_id 26 is missing in data_project table.
   * Six projects defined in data_project table (all in year 2008) do not have meta data defined in this table.
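Both kinds of mismatch listed above can be found with a pair of anti-joins. The sketch below uses an in-memory SQLite database with toy rows purely for illustration; the database engine actually used by Proportal is not stated on this page, and the column data_meta_data.id and the function name find_mismatches are assumptions.

```python
import sqlite3


def find_mismatches(conn):
    """Return (project_ids in data_meta_data with no project,
    project ids with no meta data)."""
    orphans = [r[0] for r in conn.execute(
        "SELECT DISTINCT m.project_id FROM data_meta_data m "
        "LEFT JOIN data_project p ON p.id = m.project_id "
        "WHERE p.id IS NULL ORDER BY m.project_id")]
    no_meta = [r[0] for r in conn.execute(
        "SELECT p.id FROM data_project p "
        "LEFT JOIN data_meta_data m ON m.project_id = p.id "
        "WHERE m.id IS NULL ORDER BY p.id")]
    return orphans, no_meta


# Toy data, not the real table contents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_project (id INTEGER PRIMARY KEY);
CREATE TABLE data_meta_data (id INTEGER PRIMARY KEY, project_id INTEGER);
INSERT INTO data_project (id) VALUES (1), (2), (3);
INSERT INTO data_meta_data (id, project_id) VALUES (10, 1), (11, 26);
""")
orphans, no_meta = find_mismatches(conn)
print(orphans, no_meta)
```

In the toy data, project_id 26 plays the role of the missing project, and projects 2 and 3 stand in for projects without meta data.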

Genome Data Module

Table: data_scaffold

A list of strains/genomes used in various projects.

   * Last updated: 12-10-2010
   * 213 strains, as of 07-22-2011
   * Foreign key: data_project.id

Questions

   * "refseg_id" not defined
   * "seq" field can be removed because its content is further defined in data_dna and data_protein tables.

Table: data_position

List of start and end positions of gene/DNA for each strain defined in Table data_scaffold.

   * 67516 pairs of positions, as of 07-22-2011
   * 9 types of sequences are defined: 16s, 23s, 5s, as, m, n, orf, ps, t
   * Foreign key: data_scaffold.id.

Table: data_dna

A list of DNA sequences corresponding to the sequence position information defined in data_position table.

   * 67516 pieces of DNA sequences stored, as of 07-22-2011
   * Three foreign keys: data_position.id, data_scaffold.id and data_protein.id

Error

   * Foreign key pos_id has errors:
         o Two position ids in data_position table: 37163 and 46814 are missing in this table
         o Two pos_id: 36978 and 37113 do not exist in data_position table.
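A two-way comparison of the key sets is enough to reproduce this kind of report. The sketch below is a minimal illustration in plain Python; the function name compare_keys is hypothetical, and the id lists are toy data in which only the four ids named above come from this page.

```python
def compare_keys(position_ids, dna_pos_ids):
    """Compare data_position ids with the pos_id values used in data_dna.

    Returns (ids present in data_position but unused in data_dna,
             pos_id values in data_dna that do not exist in data_position).
    """
    position_ids, dna_pos_ids = set(position_ids), set(dna_pos_ids)
    missing_in_dna = sorted(position_ids - dna_pos_ids)
    dangling_in_dna = sorted(dna_pos_ids - position_ids)
    return missing_in_dna, dangling_in_dna


# Toy lists: 37162 is an invented id that is present on both sides.
position_ids = [37162, 37163, 46814]
dna_pos_ids = [37162, 36978, 37113]
missing, dangling = compare_keys(position_ids, dna_pos_ids)
print(missing)
print(dangling)
```

Two ids missing on one side and two dangling on the other is consistent with both tables holding the same number of records (67516).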

Table: data_protein

A list of protein sequences.

   * 65909 proteins defined, as of 07-22-2011 (1607 DNA sequences are not present in this table)
   * Two foreign keys: data_scaffold.id and data_protein.id

Notes

   * "cluster_id" should be removed from this table

Table: data_ortholog

Protein orthologs.

   * 830944 orthology pairs defined, as of 07-22-2011
   * Foreign keys: protein_id and ortholog_id

Table: data_protein_xref

Definition: ?

   * 36774 records stored, as of 07-22-2011
   * Foreign key: data_protein.id, to be fixed.

Error

   * Two records reference protein_ids (36950 and 45482) that are missing from the data_protein table

Affychip Expression Module

Table: data_affychip

Information about each affychip used.

   * 1 chip defined, as of 07-22-2011
   * No foreign key

Table: data_affyexp

A list of affychip experiments.

   * 20 affychip experiments, as of 07-22-2011
   * Foreign key: project_id; only three projects involve affychip experiments.

Table: data_affyprobeset

A list of probe sets for various affychip experiments.

   * 9966 records, as of 07-22-2011
   * Three foreign keys:
         o chip_id:
         o scaffold_id: has missing keys
         o feature_id: not defined

Notes

   * feature_id not defined
   * Use "begin" and "end" to match DNA/gene/protein?

Table: data_affyprobe

A list of probes for various affychip experiments.

   * 89749 records, as of 07-22-2011
   * Foreign key: probeset_id

Table: data_affydata

The expression results of Affychip experiments.

   * 110848 records, as of 07-22-2011
   * Foreign keys:
         o exp_id
         o probeset_id

Notes

   * No DNA/gene/protein info, use probeset_id?

Table: data_diel

The results of Affychip time course experiments.

   * 1695 records, as of 07-22-2011
   * Foreign keys:
         o probeset_id
         o protein_id
         o gene_id: not defined

Notes

   * gene_id not defined

Table: data_dieltimepoint

Time courses of Affychip experiments.

   * 42375 records, as of 07-22-2011
   * Foreign key: diel_id

Cog Module

Table: data_cog_fun

A list of Cog gene functions.

   * 24 function categories, as of 07-22-2011
   * No foreign key

Table: data_cog

A list of Cog genome annotations

   * 4874 records, as of 07-22-2011
   * Foreign key: data_cog_fun.funcode ?

Notes

   * data_cog_fun.funcode can't be regarded as a foreign key because some of the funcodes in this table are missing in data_cog_fun table.
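Whether a column can serve as a foreign key can be tested by declaring the constraint and asking the database which rows violate it. The sketch below does this with SQLite's PRAGMA foreign_key_check on toy rows. SQLite is used here only as an assumption for illustration (the engine behind Proportal is not stated on this page), and the funcode values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Keep enforcement off so that the inconsistent toy rows can be loaded.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
CREATE TABLE data_cog_fun (funcode TEXT PRIMARY KEY);
CREATE TABLE data_cog (
    id INTEGER PRIMARY KEY,
    funcode TEXT REFERENCES data_cog_fun(funcode)
);
INSERT INTO data_cog_fun (funcode) VALUES ('A'), ('B');
INSERT INTO data_cog (id, funcode) VALUES (1, 'A'), (2, 'Z'), (3, 'B');
""")

# Each result row is (child table, rowid, parent table, constraint index).
violations = conn.execute("PRAGMA foreign_key_check(data_cog)").fetchall()
bad_ids = [row[1] for row in violations]
print(bad_ids)
```

An empty result would mean the column is safe to declare as a foreign key; any returned rowids identify the records to fix first.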

Table: data_protein_cog

The mapping between Cog genome and proteins.

   * 18498 records, as of 07-22-2011
   * Foreign keys: data_protein.id and data_cog.id

Gos Module

Table: data_gos_site

A list of Gos field experiments, including the experiment sites.

   * 78 records, as of 07-22-2011
   * No foreign key

Table: data_gos_read

A list of field reads for various Gos experiments.

   * 9893120 records, as of 07-22-2011
   * Foreign key: data_gos_site.id, no error

Table: data_gos_to_protein

The mapping between Gos genomes and proteins.

   * 926072 records, as of 07-22-2011
   * Foreign keys:
         o data_protein.id, has error, to be fixed
         o data_gos_read.id, has error, to be fixed

Error

   * The foreign key: read_id=0 is not defined in data_gos_read table for id=1 and id=705172 in this table
   * The foreign key: protein_id=0 is not defined in data_protein table for id=1 and id=705172 in this table
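Zero-valued keys like those reported above can be counted per column before deciding how to repair them. The sketch below is a minimal illustration on an in-memory SQLite table with toy rows; the helper name count_zero_keys is hypothetical, and treating 0 as a placeholder for "no reference" is an assumption, not something this page confirms.

```python
import sqlite3


def count_zero_keys(conn, table, columns):
    """Count the rows of `table` whose value is 0 in each listed key column."""
    counts = {}
    for col in columns:
        # Identifiers cannot be bound as parameters; only pass trusted names.
        sql = f'SELECT COUNT(*) FROM "{table}" WHERE "{col}" = 0'
        counts[col] = conn.execute(sql).fetchone()[0]
    return counts


# Toy rows, not the real table contents.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_gos_to_protein (
    id INTEGER PRIMARY KEY, read_id INTEGER, protein_id INTEGER
);
INSERT INTO data_gos_to_protein VALUES (1, 0, 0), (2, 5, 7), (3, 0, 9);
""")
counts = count_zero_keys(conn, "data_gos_to_protein", ["read_id", "protein_id"])
print(counts)
```

The same helper could be pointed at other tables with zero-valued keys, such as scaffold_id and read_id in data_gos_blastn.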

Table: data_gos_blastn

A list of sequences from Gos experiments.

   * 8666847 records, as of 07-22-2011
   * Foreign keys:
         o data_scaffold.id, has error, to be fixed
         o data_gos_read.id, has error, to be fixed

Error

   * The foreign key: scaffold_id=0 is not defined in data_scaffold table for 211 records in this table
   * The foreign key: read_id=0 is not defined in data_gos_read table for 56438 records in this table

Cluster Module

Table: data_protein_cluster

A list of protein clusters.

   * 5597 records, as of 07-22-2011
   * No foreign key

Notes

   * Two distinct "type": phCOG and CyCog
   * "gene_name" not in use

Table: data_protein_cluster_synonym

The table is empty.

Table: data_protein_cluster_xref

   * 1100 records, as of 07-22-2011
   * Foreign key: data_protein_cluster.id, has error, to be fixed

Notes

   * Only one "type": c
   * "xref": COG reference id, which may correspond to multiple cluster ids

Error

   * The foreign key: about 880 records have cluster_ids that are not defined in data_protein_cluster table.

Table: data_protein_cluster_cog

This table is empty.

Table: data_clusterlink

A list of pairs of clusters.

   * 71 records, as of 07-22-2011
   * Foreign key: data_protein_cluster.id, has error, to be fixed

Notes

   * "evidence" is not in use

Error

   * The foreign key: cluster_id=0 is not defined in data_protein_cluster table.