Data Files: Difference between revisions

From OpenWetWare
 
[[Drosophila 16S Paper]]
== Sequence files ==
*These are the quality-trimmed reads that did not have enough overlap to assemble (thus "frags").
[[Image:Fly.contigs]]


* NAST-aligned sequences from the Corby-Harris paper.
[[Image:Corby.NAST.aligned.fasta]]
* NAST-aligned sequences from the Cox and Gilmore paper.
[[Image:Cox.NAST.aligned]]


* A FASTA file of all the clean chimera-checked sequences that were unclassified at the genus level --[[User:James Angus Chandler|James Angus Chandler]] 19:16, 24 April 2009 (EDT)
[[Image:Fly.merged.fasta]]


==Environment File==
*<b>Infernal Aligned reads and sequences</b>
**Left half [[Media:Infernal_aligned_reads.left.fa.zip]]
**Right half [[Media:Infernal_aligned_reads.right.fa.zip]]
**Merged [[Media:Infernal_aligned_reads.merged.fa.zip]]
**Merged and masked across the whole length [[Media:Infernal_aligned_reads.merged.masked.0.8.bz2]]
**Merged and cleaned [[Media:Infernal_aligned_reads.merged.cleaned.fa.zip]]: selected columns (1-11, 642-806, 1400-1524) and sequences shorter than 300 were removed.
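
The "merged and cleaned" step could be sketched roughly as follows. This is only an illustration, not the script that produced the file: it assumes the column ranges are 1-based and inclusive, and that "shorter than 300" counts non-gap characters after the columns are dropped. The function names are made up for this example.

```python
# Column ranges quoted in the text, ASSUMED to be 1-based and inclusive.
DROP_RANGES = [(1, 11), (642, 806), (1400, 1524)]
# ASSUMPTION: the length cutoff counts non-gap characters after trimming.
MIN_LENGTH = 300
GAP_CHARS = "-."


def read_fasta(lines):
    """Parse an iterable of FASTA lines into a list of (header, sequence)."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_columns(seq, ranges=DROP_RANGES):
    """Remove the given 1-based inclusive column ranges from one aligned row."""
    drop = set()
    for start, end in ranges:
        drop.update(range(start - 1, end))
    return "".join(c for i, c in enumerate(seq) if i not in drop)


def ungapped_length(seq):
    """Count characters that are not alignment gaps."""
    return sum(1 for c in seq if c not in GAP_CHARS)


def clean_alignment(records, ranges=DROP_RANGES, min_length=MIN_LENGTH):
    """Drop the selected columns, then drop rows that end up too short."""
    cleaned = []
    for header, seq in records:
        trimmed = drop_columns(seq, ranges)
        if ungapped_length(trimmed) >= min_length:
            cleaned.append((header, trimmed))
    return cleaned
```

To try it on a real file, something like `clean_alignment(read_fasta(open("merged.fa")))` would work, where `merged.fa` stands in for the unzipped merged alignment.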
 
== Redoing ==
*Files were reprocessed to change the sequence IDs into a uniform format
**Format LIB_SNO[L/R]
**Sequence ID key [[Media:Fly.seqid.key.xls.zip]]
 
**Raw files with new seq ids
***Left half [[Media:FLY.new.left.fa.zip]]
***Right half [[Media:FLY.new.right.fa.zip]]
***All [[Media:FLY.new.seqs.zip]]
**Renamed sequence ids in alignments
***Left half [[Media:Infernal_aligned_reads.new.left.fa.zip]]
***Right half [[Media:Infernal_aligned_reads.new.right.fa.zip]]
***Merged [[Media:Infernal_aligned_reads.new.merged.fa.zip]]
**Removing libraries
***Libraries removed: NEG, N/XXN, DmW_TurPh/TPH, 3A/03A, 3B/03B
***Left half [[Media:Infernal_aligned_reads.new.sellib.left.fa.zip]]
***Right half [[Media:Infernal_aligned_reads.new.sellib.right.fa.zip]]
***Merged [[Media:Infernal_aligned_reads.new.sellib.merged.fa.zip]]
**Chimera removed
***Left half [[Media:]]
***Right half [[Media:]]
***Merged [[Media:]]
**Alignment cleaned
***Removing the missing middle chunk and the terminal columns to reduce non-overlapping sequencing bias
***Left half [[Media:]]
***Right half [[Media:]]
***Merged [[Media:]]
**Taxonomy file
***Sequence renamed
***Selected libraries removed (see above)
***Chimera removed
***Discrepancies between the left and right flagged
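
As a rough illustration of the renaming and library-removal steps above, here is a small sketch that parses an ID and drops the excluded libraries. Several things are guesses: LIB_SNO[L/R] is read as library name, underscore, serial number, then an optional L or R; each "A/B" pair in the removed list is treated as two spellings of the same library; and the names `parse_seq_id` and `keep_record` are invented for this example.

```python
import re

# Libraries listed as removed. ASSUMPTION: each "A/B" pair in the text gives
# two spellings of one library, so both spellings are excluded here.
REMOVED_LIBRARIES = {
    "NEG", "N", "XXN", "DmW_TurPh", "TPH", "3A", "03A", "3B", "03B",
}

# HYPOTHETICAL reading of LIB_SNO[L/R]: library name, underscore, digits,
# then an optional side letter. The library name may itself contain "_".
SEQ_ID = re.compile(r"^(?P<lib>.+)_(?P<sno>\d+)(?P<side>[LR])?$")


def parse_seq_id(seq_id):
    """Split an ID into (library, serial number, side or None)."""
    match = SEQ_ID.match(seq_id)
    if match is None:
        raise ValueError("unrecognised sequence id: %r" % seq_id)
    return match.group("lib"), int(match.group("sno")), match.group("side")


def keep_record(seq_id, removed=REMOVED_LIBRARIES):
    """Return True if the sequence does not belong to a removed library."""
    library, _, _ = parse_seq_id(seq_id)
    return library not in removed
```

For example, `[rec for rec in records if keep_record(rec[0])]` would filter a list of (id, sequence) pairs.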
 
==Metadata Files==


Here is the environment file you asked for.  Look it over to tell me if I need to add anything.  I left ??? for the Cox-Gilmore samples since I cannot seem to find my copy of it and thus do not know what they collected over.
[[Media:MainEnvFile.xls]]  --[[User:James Angus Chandler|James Angus Chandler]] 20:45, 19 May 2009 (EDT)


== Phylogenetic Tree ==
* This file allows the translation from JGI clone IDs to our sample IDs.
[[Image:May09.trans.xls]]
 
== Nearly Complete Taxonomy and Alignment ==
 
[[Media:NoWolb_noNNs_cleaned_noTurrs_noDescrepsAKAfinal.fasta]]
[[Media:Infernal_RDP_Correct%_noWolb_noNNs_noTurrs_noDescreps_noMissingsAKAfinal.xlsx]]
 
== CalTech Presentation ==
[[Media:CalTech_Presentation.ppt]] --[[User:James Angus Chandler|James Angus Chandler]] 00:56, 11 June 2009 (EDT)

Latest revision as of 16:15, 12 July 2010

Drosophila 16S Paper

Sequence files

  • These are the quality-trimmed reads that did not have enough overlap to assemble (thus "frags").

File:Fly.frags

  • These are the quality-trimmed reads that did not have enough overlap to assemble and had a significant blast hit to the "left" side of a reference 16S sequence.

File:Fly.frags.left

  • These are the quality-trimmed reads that did not have enough overlap to assemble and had a significant blast hit to the "right" side of a reference 16S sequence.

File:Fly.frags.right

  • These are the reads that assembled into complete clones using the JGI's 16S pipeline (genelib).

File:Fly.contigs

  • NAST-aligned sequences from the Corby-Harris paper.

File:Corby.NAST.aligned.fasta

  • NAST-aligned sequences from the Cox and Gilmore paper.

File:Cox.NAST.aligned

  • A FASTA file of all the clean chimera-checked sequences that were unclassified at the genus level --James Angus Chandler 19:16, 24 April 2009 (EDT)

Media:Unclassified.fasta.gz

Taxonomy Assignments

  • OLD, not quality-trimmed data: there are three files here, each corresponding to one of the three chimera-checked sequence files (i.e., putative, sub-threshold, and clean)
  1. clean sequences File:Classifications All NAST.Bclean.fasta30593.xls
  2. sub-threshold chimeric sequences File:Classifications All NAST.Bambig.fasta27959.xls
  3. putative chimeric sequences File:Classifications All NAST.Bchimera.fasta28223.xls

Alignment

  • Below is the most current version of the NAST-formatted alignment file.

This is the older, not quality-trimmed file. It contains all of our sequences, plus the Corby-Harris and Cox-Gilmore sequences.

File:All.good.gz


  • This is a concatenated alignment with the quality-trimmed data. Each half was aligned using the NAST aligner and then both halves were concatenated. There is a reference sequence in there (called testseq) that was used to decide where to end the left half and begin the right half before concatenating. This alignment does not include all of the full-length sequences that were assembled with genelib.

File:Fly.merged.fasta
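
The concatenation described above could look something like the sketch below. It is not the procedure actually used. It assumes both halves were aligned to the same fixed-width template (so columns line up), that the two reads of a clone share an ID, and that the cut column has already been chosen by inspecting the reference row (testseq). `merge_halves` and its arguments are hypothetical names.

```python
def merge_halves(left, right, cut, gap="-"):
    """Join left and right aligned rows at a shared cut column.

    left, right: dicts mapping clone id -> aligned row; all rows are ASSUMED
    to have the same width because both halves use the same template.
    cut: 0-based column index where the left half ends and the right begins
    (ASSUMED to have been picked beforehand from the reference row).
    """
    widths = {len(row) for row in left.values()}
    widths |= {len(row) for row in right.values()}
    if len(widths) > 1:
        raise ValueError("rows do not share one alignment width")
    width = widths.pop() if widths else 0

    merged = {}
    for clone_id in sorted(set(left) | set(right)):
        # A clone with only one half gets gaps for the missing side.
        left_row = left.get(clone_id, gap * width)
        right_row = right.get(clone_id, gap * width)
        merged[clone_id] = left_row[:cut] + right_row[cut:]
    return merged
```

Taking columns before the cut from the left row and columns from the cut onward from the right row keeps the merged row the same width as the template.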

Redoing

Metadata Files

Here is the environment file you asked for. Look it over to tell me if I need to add anything. I left ??? for the Cox-Gilmore samples since I cannot seem to find my copy of it and thus do not know what they collected over.

Media:MainEnvFile.xls --James Angus Chandler 20:45, 19 May 2009 (EDT)

  • This file allows the translation from JGI clone IDs to our sample IDs.

File:May09.trans.xls

Nearly Complete Taxonomy and Alignment

Media:NoWolb_noNNs_cleaned_noTurrs_noDescrepsAKAfinal.fasta

Media:Infernal_RDP_Correct%_noWolb_noNNs_noTurrs_noDescreps_noMissingsAKAfinal.xlsx

CalTech Presentation

Media:CalTech_Presentation.ppt --James Angus Chandler 00:56, 11 June 2009 (EDT)