Schumer lab: Depositing data on Oak

From OpenWetWare

Introduction

Oak is a large storage resource where we can keep large raw data files and large processed data files. You can link to files on Oak from your local directory on Sherlock.

For example, if you wanted to link to this file on Oak, you could navigate to your local directory and type:

ln -s /oak/stanford/groups/schumer/data/High_coverage_whole_genome_quail_data/Xcortezi_whole_genome_data_sc_project_HUIC_sc_and_wt/HUICXI17JM06wt_read_1_allcombined.fastq.gz ./
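A mistyped path still produces a symlink, just a dangling one, so it can help to check that the link resolves. The following is a minimal sketch of that check; the function name link_from_oak is hypothetical and not part of any lab tooling.

```shell
# Hypothetical helper: symlink a file on Oak into a working directory and
# confirm that the link actually resolves to an existing file.
link_from_oak() {
    local target="$1"          # full path to the file on Oak
    local dest_dir="${2:-.}"   # directory to hold the link (default: current directory)
    local link_path
    link_path="$dest_dir/$(basename "$target")"

    ln -s "$target" "$link_path" || return 1

    # -e follows the symlink, so this test fails for a dangling link
    if [ ! -e "$link_path" ]; then
        echo "Broken link: $link_path -> $target" >&2
        return 1
    fi
    echo "$link_path"
}

# Example, using the file named above (commented out so nothing runs by accident):
# link_from_oak /oak/stanford/groups/schumer/data/High_coverage_whole_genome_quail_data/Xcortezi_whole_genome_data_sc_project_HUIC_sc_and_wt/HUICXI17JM06wt_read_1_allcombined.fastq.gz ./
```

On success the function prints the path of the new link; on failure it leaves the dangling link in place so you can inspect it with ls -l.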


Examples of file types that should be deposited on Oak:

  • fastq.gz files
  • associated metadata files for the fastq.gz files
  • sam/bam files
  • Ancestry tsv files from completed ancestryHMM runs. Make sure to use an informative name and copy a matching cfg file.


Ancestry tsv files deposited on Oak can be found here:

/oak/stanford/groups/schumer/data/Processed_files/Ancestry_tsv_results_files


sam/bam files and vcf files will be in different subfolders of Processed_files depending on the reference they were mapped to. For example:

/oak/stanford/groups/schumer/data/Processed_files/bams_mapped_to_xbir-pacbio-v2018

Current organization system on Oak

Oak is currently organized by data type, with separate folders for raw and processed data.


We are in the process of documenting all the data available on Oak here: https://docs.google.com/spreadsheets/d/1dUBAmaz2bNa-LppWxFZ7Vymld-jxrDbh-HC_WFjXSzA/edit#gid=89601434


Raw data:

All_swordtail_low_coverage_Tn5_data

ATACseq_data

ChipSeq_data

collaboration_data

High_coverage_whole_genome_quail_data

MyBaits_data

Pacbio_data

RNAseq_data


Processed data:

Processed_files

lab_member_folders

Processed_files contains resources to be used by anyone in the lab; lab_member_folders contains personal backups.

Moving data to or from our lab directory on Oak

Important!!! Make sure to create a new data folder for each deposited dataset, and give your raw data folder an informative name.

For example:

Xbirchmanni_10Xchromium_Hudsonalpha_July2018_raw_data

This name gives the species, the technology used for library prep, the company that did the sequencing, and the date of sequencing.
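The naming pattern described above can be sketched as a small shell function. This is only an illustration: the function name make_dataset_name and its checks (four fields, no spaces or slashes) are assumptions, not an established lab script.

```shell
# Hypothetical helper: build a raw data folder name from the four pieces of
# information described above (species, library prep technology,
# sequencing company, date of sequencing).
make_dataset_name() {
    if [ "$#" -ne 4 ]; then
        echo "usage: make_dataset_name SPECIES TECHNOLOGY COMPANY DATE" >&2
        return 1
    fi
    local field
    for field in "$@"; do
        case "$field" in
            ""|*[[:space:]/]*)
                # Assumed rule: fields must be non-empty, with no spaces or slashes
                echo "Invalid field: '$field'" >&2
                return 1
                ;;
        esac
    done
    printf '%s_%s_%s_%s_raw_data\n' "$1" "$2" "$3" "$4"
}

make_dataset_name Xbirchmanni 10Xchromium Hudsonalpha July2018
# prints: Xbirchmanni_10Xchromium_Hudsonalpha_July2018_raw_data
```

The printed name can then be passed to mkdir when creating the new folder for the dataset.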

  • Note: swordtail Tn5 data is stored in the following subdirectory:

/oak/stanford/groups/schumer/data/All_swordtail_low_coverage_Tn5_data


There are two ways to move data to our lab directory on Oak:


1) Using scp:

To move files from your local machine to Oak through Sherlock (replace user with your user name):

scp myfile user@login.sherlock.stanford.edu:/oak/stanford/groups/schumer/data/mydirectory

To move files from Oak to a local directory:

scp user@login.sherlock.stanford.edu:/oak/stanford/groups/schumer/data/mydirectory/myfile ./
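The scp commands above copy single files. To copy a whole dataset folder, scp accepts the -r (recursive) flag. Here is a minimal sketch, assuming the destination directory already exists on Oak; the function name upload_dataset and the SHERLOCK_USER and DRY_RUN variables are hypothetical and only used for illustration.

```shell
# Assumed placeholder: replace "user" with your Sherlock user name,
# or set SHERLOCK_USER in your environment.
SHERLOCK_USER="${SHERLOCK_USER:-user}"
SHERLOCK_HOST="login.sherlock.stanford.edu"
OAK_DATA="/oak/stanford/groups/schumer/data"

# Hypothetical helper: recursively copy a local dataset folder into a
# directory under the lab data directory on Oak.
# With DRY_RUN=1 it only prints the scp command instead of running it.
upload_dataset() {
    local folder="$1"   # local dataset folder to copy
    local subdir="$2"   # existing destination directory under $OAK_DATA
    local remote="${SHERLOCK_USER}@${SHERLOCK_HOST}:${OAK_DATA}/${subdir}/"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "scp -r $folder $remote"
    else
        scp -r "$folder" "$remote"
    fi
}

# Example (dry run, nothing is transferred):
# DRY_RUN=1 upload_dataset Xbirchmanni_10Xchromium_Hudsonalpha_July2018_raw_data mydirectory
```

Running with DRY_RUN=1 first lets you confirm the destination path before any data is transferred.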


2) Using Globus:

Sign up for a Globus account.

Oak is linked to Globus, so you can navigate and download files through the Globus interface.

Important: Documenting data you have uploaded

If you are the one depositing new data on Oak, place it in the appropriate directory, in a folder named with the full library name, data type, and month and year of sequencing. For example:

Xpygmaeus_VCHO_AGCZ_COCA_whole_genome_sequences_Admera_health_June2020

or

Xcor-SMAR-IV-06_Xcor-SMAR-fromMM_Xcor-OCTZ-VI-22_Xvar-JUCH-II-22_Xcor-CHPL-V-17_CPXT-22-V-21_STACHW102-XII-03_PMHS-XII-03_Tn5_Admera-Nov2022


Important! Please update the Google spreadsheet so that others can easily find the data:

https://docs.google.com/spreadsheets/d/1dUBAmaz2bNa-LppWxFZ7Vymld-jxrDbh-HC_WFjXSzA/edit?usp=sharing


If you're depositing fastq files, make sure to deposit the appropriate metadata files in the same directory!

For example, if your fastq files are named:

COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015_Tn5_S0_I1_alllanes_combined.fastq.gz

COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015_Tn5_S0_I2_alllanes_combined.fastq.gz

COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015_Tn5_S0_R1_alllanes_combined.fastq.gz

COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015_Tn5_S0_R2_alllanes_combined.fastq.gz


and are the result of two plates of Tn5 prep, put a single i5 file and two i7 files with informative names in the same directory. For example:

i5_library_COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015

20180626_Tn5_library_COAC_VI_2018_i7_barcodes.txt

20180718_Tn5_library_ACUA_VI_2015_CHAF_XI_2017_COAC_XI_2017_i7_barcodes.txt
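Before depositing, a quick check can confirm that the barcode files sit next to the fastq files. The sketch below assumes, based only on the example names above, that barcode file names contain "i5" or "i7"; the function name check_metadata is hypothetical, and the check does not count how many i7 files are present.

```shell
# Hypothetical helper: verify that a directory with fastq.gz files also holds
# at least one i5 and one i7 barcode file.
# Assumption: barcode file names contain "i5" or "i7", as in the examples above.
check_metadata() {
    local dir="$1"
    local status=0
    local tag

    if ! ls -d "$dir"/*.fastq.gz >/dev/null 2>&1; then
        echo "No fastq.gz files in $dir" >&2
        return 1
    fi

    for tag in i5 i7; do
        # List names containing the tag, ignoring the fastq files themselves;
        # grep exits non-zero when nothing is left.
        if ! ls -d "$dir"/*"$tag"* 2>/dev/null | grep -v '\.fastq\.gz$' >/dev/null; then
            echo "Missing $tag barcode file in $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
# check_metadata ./my_new_dataset_folder && echo "metadata files present"
```

A non-zero exit status means at least one kind of barcode file is missing, with a message naming which one.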