Commonly used scripts

From OpenWetWare


Usage of commonly used scripts

Scripts that convert between file formats

Convert fasta to phylip format:

perl Fasta2Phylip.pl infile.fa > outfile.phy
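For orientation, the kind of conversion involved can be sketched in Python. This is only an illustration, not the code in Fasta2Phylip.pl: it assumes an aligned fasta input and writes a relaxed sequential phylip layout (a header line with the number of sequences and the alignment length, then one "name sequence" line per record). The function names are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fasta_to_phylip(lines):
    """Return relaxed sequential PHYLIP text for an aligned FASTA input."""
    records = read_fasta(lines)
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    out = [f"{len(records)} {lengths.pop()}"]
    out.extend(f"{name} {seq}" for name, seq in records)
    return "\n".join(out) + "\n"


# Hypothetical two-sequence alignment; ind1 is wrapped over two lines.
example = [">ind1", "ACGT", "AC", ">ind2", "ACGTTT"]
print(fasta_to_phylip(example), end="")
```

The output format of the actual script (strict vs. relaxed phylip, name padding) may differ; check an output file before passing it to downstream programs.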

Convert phylip to fasta:

perl Phylip2Fasta.pl infile.phy outfile.fa

Convert MSG genotypes files to plink file format:

perl convert_msg_genotypes_to_plink.pl genotypes_file_name

Writes two output files: genotypes_file_name.ped and genotypes_file_name.map

Convert a fasta file to a fastq file:

perl fasta_to_fastq.pl infile.fasta > outfile.fq
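A fasta file carries no base qualities, so any fasta to fastq conversion has to invent them. The Python sketch below shows one way this might be done; it is an assumption, not the behavior of fasta_to_fastq.pl. The placeholder quality character "I" (Phred 40 in Phred+33 encoding) and the function name are hypothetical choices.

```python
def fasta_to_fastq(lines, quality_char="I"):
    """Yield one FASTQ record (as text) per FASTA entry.

    quality_char is an assumed placeholder quality, repeated once per base.
    """
    name, chunks = None, []
    # A sentinel header at the end flushes the final record.
    for line in list(lines) + [">"]:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seq = "".join(chunks)
                yield f"@{name}\n{seq}\n+\n{quality_char * len(seq)}\n"
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)


# Hypothetical single record wrapped over two sequence lines.
print("".join(fasta_to_fastq([">r1 desc", "ACG", "T"])), end="")
```

Check which quality value the actual script assigns before using its output with quality-aware tools.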

Convert the ancestry tsv output from AncestryHMM to a hard-call genotypes file:

perl parsetsv_to_genotypes_Dec2017_v2.pl ancestry-probs-par1.tsv ancestry-probs-par2.tsv genotypes_output_file_name

Convert the MSG genotypes file to a format that is compatible with rQTL (phenotypes must still be added; see workflow documentation):

perl genotypes_to_rqtl_Feb2018_v3.pl genotypes_file_name

Writes an output file: genotypes_file_name.rqtl.csv

Convert a phylip file to an input file for treemix:

perl phy_to_treemix.pl input.phy population_keys_list > outfile

The population keys list contains the number of individuals per population, one number per line, e.g. a file with the three lines 2, 1 and 3 (2\n1\n3\n)
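If the per-population counts are already available in another program, the keys file can be generated rather than typed by hand. A minimal Python sketch follows; the function name and the example counts are hypothetical.

```python
def population_keys_text(counts):
    """Return the keys-list text: one individual count per line."""
    for n in counts:
        if int(n) < 1:
            raise ValueError("each population needs at least one individual")
    return "".join(f"{int(n)}\n" for n in counts)


# Three hypothetical populations with 2, 1 and 3 individuals.
text = population_keys_text([2, 1, 3])
print(text, end="")
```

Writing `text` to a file gives a list in the layout described above for the population_keys_list argument.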

Convert a samtools vcf file to input counts for the ASE pipeline:

perl samtools_vcf_to_ASE_counts.pl infile.vcf

Writes an output file: infile.vcf_ASE_counts

Convert a two-sequence clustal alignment to a fasta alignment:

perl convert_clustal_alignment_to_fasta_alignment.pl clustal_alignment

Scripts that merge files, filter files, or match file contents to lists

Combine read files from two different sequencing runs of the same individual. The two files to be combined *must* be in the same order:

perl combine_reads_two_lists.pl list_full_path_to_file1_set list_full_path_to_file2_set

Writes a new file for each individual, using the name in list 1 appended with _combined

Filter redundant columns from a genotypes file. The number on the command line (num_markers_differentiation) corresponds to the number of markers that can differ between adjacent columns for the column to be retained:

perl filter_identical_columns_threshold.pl genotypes_file num_markers_differentiation path_to:transpose_nameout.pl

Writes an output file named genotypes_file.identicalfilter.txt

Grep each entry of a list from a separate file, with an option to use grep -w (1) or regular grep (0):

perl grep_list.pl select_list file_to_grep_from outfile_name grep_w_1_or_0

Similar script but automatically retains header line:

perl grep_list_keep_header.pl select_list file_to_grep_from outfile_name grep_w_1_or_0
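The difference between the two grep modes, and the effect of keeping the header, can be illustrated with a short Python sketch. It is not the code of either script: it treats list entries as literal strings (grep itself treats them as regular expressions by default) and approximates grep -w with word-boundary lookarounds. All names are hypothetical.

```python
import re


def grep_list(patterns, lines, whole_word=False, keep_header=False):
    """Return the lines that contain any entry of patterns.

    whole_word=True approximates grep -w; keep_header=True always keeps
    the first line, whether or not it matches.
    """
    lines = list(lines)
    kept = []
    if keep_header and lines:
        kept.append(lines[0])
        lines = lines[1:]
    regexes = [
        re.compile(rf"(?<!\w){re.escape(p)}(?!\w)" if whole_word else re.escape(p))
        for p in patterns
        if p  # skip empty list entries, which would match every line
    ]
    kept.extend(line for line in lines if any(r.search(line) for r in regexes))
    return kept


# Hypothetical table: "ind1" also occurs inside "ind10".
table = ["id\tvalue", "ind1\t0.2", "ind10\t0.9", "ind2\t0.5"]
print(grep_list(["ind1"], table))
print(grep_list(["ind1"], table, whole_word=True, keep_header=True))
```

The first call also returns the ind10 line, which is the usual reason to set grep_w_1_or_0 to 1 when sample names are prefixes of one another.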

Match a phenotypes file with a genotypes and hybrid index file:

perl match_phenotypes_names_with_genotypes_and_index_file.pl phenotypes_file genotypes_file hybrid_index

Writes output files whose names are modified to indicate matching

Scripts [or commands] that extract sequences from files

Scripts for managing fastq files

Scripts that summarize data or run analyses

Write average ancestry per site for every site in an MSG genotypes file:

perl parse_genotypes_ancestry_bysite.pl genotypes_file path_to:transpose_nameout.pl

Writes average ancestry per site to an output file named after the input file and appended with _ancestry_by_site
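As a rough picture of the calculation, the Python sketch below averages ancestry at each site across individuals. It rests on assumptions that should be checked against the actual files: genotypes are taken to be coded 0, 1 or 2 (number of alleles from one parent) with NA for missing data, and the input is taken to be already parsed into one list per individual. The function name is hypothetical, and this is not the code of parse_genotypes_ancestry_bysite.pl.

```python
def average_ancestry_by_site(genotype_rows, missing="NA"):
    """Return the mean ancestry proportion per site (None if all missing).

    genotype_rows: one list of genotype codes per individual, all of the
    same length, with codes assumed to be "0", "1", "2" or missing.
    """
    if not genotype_rows:
        return []
    n_sites = len(genotype_rows[0])
    averages = []
    for site in range(n_sites):
        values = [float(row[site]) for row in genotype_rows if row[site] != missing]
        # Divide by 2 alleles per individual to get a proportion in [0, 1].
        averages.append(sum(values) / (2 * len(values)) if values else None)
    return averages


# Three hypothetical individuals at three sites.
rows = [["0", "1", "NA"], ["2", "1", "NA"], ["2", "NA", "NA"]]
result = average_ancestry_by_site(rows)
print(result)
```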

Summarize ancestry per individual from the two ancestry tsv files:

perl parsetsv_ancestry_v2.pl ancestry-probs-par1.tsv ancestry-probs-par2.tsv

Writes hybrid index and heterozygosity per individual
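One way these two summaries might be computed from posterior probabilities is sketched below in Python. Everything specific here is an assumption rather than the method of parsetsv_ancestry_v2.pl: the par1 and par2 values are taken to be the posterior probabilities of being homozygous for parent 1 and parent 2, the heterozygous probability is taken as the remainder, sites are only counted when one state reaches a hypothetical threshold of 0.9, and hybrid index is expressed as the proportion of parent 1 alleles.

```python
def summarize_individual(p_par1, p_par2, threshold=0.9):
    """Return (hybrid_index, heterozygosity) for one individual.

    p_par1, p_par2: per-site posterior probabilities of the two homozygous
    states (assumed). Sites with no state at or above threshold are skipped.
    """
    par1_alleles = 0
    het_sites = 0
    called = 0
    for p1, p2 in zip(p_par1, p_par2):
        p_het = 1.0 - p1 - p2
        if p1 >= threshold:
            par1_alleles += 2
        elif p2 >= threshold:
            pass  # homozygous parent 2 contributes no parent 1 alleles
        elif p_het >= threshold:
            par1_alleles += 1
            het_sites += 1
        else:
            continue  # ambiguous site, not counted
        called += 1
    if called == 0:
        return None, None
    return par1_alleles / (2 * called), het_sites / called


# Four hypothetical sites: par1, par2, heterozygous, ambiguous.
hybrid_index, heterozygosity = summarize_individual(
    [0.99, 0.01, 0.02, 0.5], [0.0, 0.98, 0.03, 0.3]
)
print(hybrid_index, heterozygosity)
```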

Miscellaneous