Commonly used scripts
Usage of commonly used scripts
Scripts that convert between file formats
Convert fasta to phylip format:
perl Fasta2Phylip.pl infile.fa > outfile.phy
Convert phylip to fasta:
perl Phylip2Fasta.pl infile.phy outfile.fa
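For orientation, the relationship between the two formats can be sketched in Python. This is not the code in Fasta2Phylip.pl. It is a minimal illustration that assumes an aligned fasta input and writes a relaxed sequential phylip layout: a header with the number of sequences and the alignment length, then one "name sequence" line per record. The function names are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from fasta-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = (line[1:].split() or [""])[0]
            records.append([name, ""])
        elif records:
            # Sequence lines may be wrapped; join them back together.
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def fasta_to_phylip(text):
    """Convert aligned fasta text to relaxed sequential phylip text."""
    records = parse_fasta(text)
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("phylip needs an alignment: sequences must share one length")
    header = f"{len(records)} {lengths.pop()}"
    body = [f"{name} {seq}" for name, seq in records]
    return "\n".join([header] + body) + "\n"


if __name__ == "__main__":
    print(fasta_to_phylip(">a\nACGT\n>b\nAC\nGA\n"), end="")
```

Strict phylip pads names to ten characters; the sketch uses the relaxed form, so check which form downstream tools expect.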
Convert MSG genotypes files to plink file format:
perl convert_msg_genotypes_to_plink.pl genotypes_file_name
Writes two output files: genotypes_file_name.ped and genotypes_file_name.map
Convert a fasta file to a fastq file:
perl fasta_to_fastq.pl infile.fasta > outfile.fq
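A fasta record has no base qualities, so any fasta to fastq conversion has to supply placeholder quality values. The Python sketch below illustrates the idea only. It is not the code in fasta_to_fastq.pl, and the placeholder character "I" (Phred 40 in the Phred+33 encoding) is an assumption, since the value the script uses is not documented here.

```python
def fasta_to_fastq(text, qual_char="I"):
    """Turn fasta text into fastq text with a constant placeholder quality.

    qual_char is a hypothetical default, not taken from the Perl script.
    """
    records = []  # each entry is [name, sequence]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # join wrapped sequence lines

    out = []
    for name, seq in records:
        # Four lines per fastq record: @name, sequence, +, qualities.
        out.extend(["@" + name, seq, "+", qual_char * len(seq)])
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    print(fasta_to_fastq(">read1\nACGT\n"), end="")
```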
Convert the ancestry tsv output from AncestryHMM to a hard-calls genotypes file:
perl parsetsv_to_genotypes_Dec2017_v2.pl ancestry-probs-par1.tsv ancestry-probs-par2.tsv genotypes_output_file_name
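The idea of a hard call can be sketched as follows. This is an illustration under stated assumptions, not the logic of the Perl script: it assumes the par1 file holds the posterior probability of being homozygous for parent 1, the par2 file holds the same for parent 2, the remainder is the heterozygous probability, calls are coded 0, 1, 2, and a site is called only when its best state reaches a threshold. The threshold of 0.9 and the function name are hypothetical.

```python
def call_genotype(p_par1, p_par2, threshold=0.9):
    """Return 0 (hom. parent 1), 1 (het), 2 (hom. parent 2) or None (no call).

    Assumes the three states are exhaustive, so the heterozygous
    probability is whatever the two homozygous states leave over.
    """
    p_het = 1.0 - p_par1 - p_par2
    # Pick the state with the highest posterior probability.
    best_prob, best_state = max((p_par1, 0), (p_het, 1), (p_par2, 2))
    return best_state if best_prob >= threshold else None


if __name__ == "__main__":
    for probs in [(0.98, 0.01), (0.02, 0.03), (0.5, 0.1)]:
        print(probs, call_genotype(*probs))
```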
Convert the MSG genotypes file to a format compatible with rQTL (phenotypes must still be added; see the workflow documentation):
perl genotypes_to_rqtl_Feb2018_v3.pl genotypes_file_name
Writes an output file: genotypes_file_name.rqtl.csv
Convert a phylip file to an input file for treemix:
perl phy_to_treemix.pl input.phy population_keys_list > outfile
The population keys list contains the number of individuals per population, e.g.: 2\n1\n3\n
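A population keys list like the one in the example can be written with a few lines of Python. The helper names below are hypothetical, and the consistency check rests on an assumption not stated above: that the counts should add up to the number of sequences given in the phylip header.

```python
def population_keys_text(counts):
    """Format per-population individual counts, one count per line."""
    if any(n < 1 for n in counts):
        raise ValueError("each population needs at least one individual")
    return "".join(f"{n}\n" for n in counts)


def counts_match_phylip(counts, phylip_text):
    """Assumed sanity check: counts sum to the sequence count in the header."""
    n_seqs = int(phylip_text.split()[0])
    return sum(counts) == n_seqs


if __name__ == "__main__":
    # Three populations with 2, 1 and 3 individuals.
    with open("population_keys_list", "w") as handle:
        handle.write(population_keys_text([2, 1, 3]))
```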
Convert a samtools vcf file to input counts for the ASE pipeline:
perl samtools_vcf_to_ASE_counts.pl infile.vcf
Writes an output file: infile.vcf_ASE_counts
Convert a two-sequence clustal alignment to a fasta alignment:
perl convert_clustal_alignment_to_fasta_alignment.pl clustal_alignment
Scripts that merge files, filter files, or match file contents to lists
Combine read files from two different sequencing runs of the same individual. The two lists of files to be combined *must* be in the same order:
perl combine_reads_two_lists.pl list_full_path_to_file1_set list_full_path_to_file2_set
Writes a new file for each individual, named after the file in list 1 with _combined appended
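The pairing that the ordering requirement implies can be sketched in Python. This is an illustration, not the Perl script: it assumes the files are paired line by line across the two lists and concatenated as they are, and the function names are hypothetical. Code can confirm that the lists have the same length, but not that each pair belongs to the same individual, so the order still has to be checked by eye.

```python
import shutil


def plan_combined(list1, list2):
    """Pair entries of two path lists and name each output after list 1."""
    if len(list1) != len(list2):
        raise ValueError("the two lists must have the same number of files")
    return [(a, b, a + "_combined") for a, b in zip(list1, list2)]


def combine(plan):
    """Concatenate each pair of files, first then second, into the output."""
    for first, second, out in plan:
        with open(out, "wb") as dest:
            for src in (first, second):
                with open(src, "rb") as handle:
                    shutil.copyfileobj(handle, dest)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(plan_combined(["run1/ind1.fq"], ["run2/ind1.fq"]))
```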
Filter redundant columns from a genotypes file. The number on the command line is the number of markers that can differ between adjacent columns for the column to be retained:
perl filter_identical_columns_threshold.pl genotypes_file num_markers_differentiation path_to:transpose_nameout.pl
Writes an output file: genotypes_file.identicalfilter.txt
Grep each entry in a list from a separate file, with an option to use grep -w or regular grep:
perl grep_list.pl select_list file_to_grep_from outfile_name grep_w_1_or_0
Similar script but automatically retains header line:
perl grep_list_keep_header.pl select_list file_to_grep_from outfile_name grep_w_1_or_0
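The choice between grep -w and regular grep matters when one name is a prefix of another, for example ind1 and ind10. The Python sketch below mimics the two modes to show the difference. It is not the Perl scripts themselves: it treats list entries as literal strings rather than regular expressions, which is an assumption, and the function and parameter names are hypothetical.

```python
import re


def grep_list(patterns, lines, whole_word=False, keep_header=False):
    """Return the lines that contain any pattern.

    whole_word=True approximates grep -w: the match may not be adjacent
    to a letter, digit or underscore. keep_header=True always keeps the
    first line.
    """
    patterns = [p for p in patterns if p]  # an empty entry would match everything
    if whole_word:
        regexes = [re.compile(r"(?<!\w)" + re.escape(p) + r"(?!\w)") for p in patterns]
    else:
        regexes = [re.compile(re.escape(p)) for p in patterns]

    kept = []
    for i, line in enumerate(lines):
        if (keep_header and i == 0) or any(r.search(line) for r in regexes):
            kept.append(line)
    return kept


if __name__ == "__main__":
    rows = ["id\tvalue", "ind1\t0", "ind10\t1", "ind2\t2"]
    print(grep_list(["ind1"], rows))                   # substring match
    print(grep_list(["ind1"], rows, whole_word=True))  # whole-word match
```

With names such as ind1 and ind10 in the same file, the whole-word mode is usually the safer choice.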
Match a phenotypes file with a genotypes and hybrid index file:
perl match_phenotypes_names_with_genotypes_and_index_file.pl phenotypes_file genotypes_file hybrid_index
Writes output files with modified names indicating matching
Scripts [or commands] that extract sequences from files
Scripts for managing fastq files
Scripts that summarize data or run analyses
Write average ancestry per site for every site in an MSG genotypes file:
perl parse_genotypes_ancestry_bysite.pl genotypes_file path_to:transpose_nameout.pl
Writes average ancestry per site to an output file named after the input file and appended with _ancestry_by_site
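The per-site summary can be illustrated in Python. The sketch is not the Perl script and rests on assumptions: rows are individuals, columns are sites, calls are coded 0, 1, 2 for the number of parent 2 alleles, missing calls are None, and average ancestry is the proportion of called alleles that come from parent 2. The function name is hypothetical. Transposing the matrix so that each site becomes a row is what makes the per-site average simple.

```python
def ancestry_by_site(rows):
    """Return the parent 2 allele proportion for each site.

    rows: one list of calls per individual (0, 1, 2 or None for missing).
    Sites with no called individuals yield None.
    """
    result = []
    for site_calls in zip(*rows):  # transpose: one tuple of calls per site
        called = [g for g in site_calls if g is not None]
        if called:
            result.append(sum(called) / (2 * len(called)))
        else:
            result.append(None)
    return result


if __name__ == "__main__":
    genotypes = [
        [0, 2, None],
        [2, 2, 1],
        [1, 2, None],
    ]
    print(ancestry_by_site(genotypes))
```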
Summarize hybrid index and heterozygosity per individual from the ancestry tsv files:
perl parsetsv_ancestry_v2.pl ancestry-probs-par1.tsv ancestry-probs-par2.tsv
Writes hybrid index and heterozygosity per individual
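The two per-individual summaries can be sketched in Python. The Perl script works from the ancestry probability files; for simplicity this illustration starts from calls that have already been made, coded 0, 1, 2 for the number of parent 2 alleles with None for missing. Under that assumed coding, hybrid index is the proportion of called alleles from parent 2 and heterozygosity is the proportion of called sites that are heterozygous. The function name is hypothetical, and the parent the script measures the index toward is not stated here.

```python
def summarize_individual(calls):
    """Return (hybrid_index, heterozygosity) for one individual's calls.

    calls: list of 0, 1, 2 or None. Returns (None, None) if nothing is called.
    """
    called = [g for g in calls if g is not None]
    if not called:
        return None, None
    hybrid_index = sum(called) / (2 * len(called))
    heterozygosity = sum(1 for g in called if g == 1) / len(called)
    return hybrid_index, heterozygosity


if __name__ == "__main__":
    print(summarize_individual([0, 1, 2, 2, None]))
```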