User:James Estevez/Notebook/Spring 2011: Bdellovibrio Independent Study/2011/01/22

Friday night.

Proposal draft is up the food chain for review. Lab notebook is set up. Need to prep the analysis pipeline:

CloVR

  1. Getting familiar with the interface and navigating Hadoop. VirtualBox doesn't seem to want to work, so need to switch to VMware for the project. What do we want to know?
    1. AMI prep from inside the VM?
    2. Which version?
  2. Sample datasets from CloVR and MM (quick sanity check sketched after this list).
  3. Documentation's pretty thin.
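
Before anything goes into the pipeline, a quick look at the sample input can't hurt. The sketch below just counts sequences and bases in a FASTA file; the path is the one used in clovr_16s.config further down, and the parsing is a generic FASTA walk, nothing CloVR-specific:

<source lang="python">
# Rough FASTA sanity check: sequence count, total bases, shortest/longest record.
# Path taken from clovr_16s.config below; swap in any sample dataset.
FASTA = "/mnt/AMP_Lung.small.fasta"

lengths = []
with open(FASTA) as handle:
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)

if lengths:
    print("sequences:", len(lengths))
    print("total bases:", sum(lengths))
    print("min/max length: %d / %d" % (min(lengths), max(lengths)))
else:
    print("no sequences found, check the path or file format")
</source>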

Running the first 16S pipeline

  • Editing clovr_16s.config:

<source lang="text">
## Template configuration file for
## Input information.
## Configuration options for the pipeline.

[input]
GROUP_COUNT=1
FASTA_FILES=/mnt/AMP_Lung.small.fasta
MAPPING_FILE=/mnt/IGS.qmap

PIPELINE_NAME=clovr_16S_pipeline
FASTA_TAG=16S_FASTA
MAPPING_TAG=MAPPING
DB_TAG=clovr-core-set-aligned-imputed-fasta

## Cluster info.
## If the cluster_tag is present, the script will first
## check for the presence of this cluster and if it's not
## running will start a cluster with the default settings

[cluster]
CLUSTER_NAME=local
EXEC_NODES=1
CLOVR_CONF=clovr.conf
CLUSTER_CREDENTIAL=local
#key=/mnt/devel1.pem
#host=localhost
## Output info.
## Specifies where locally the data will end up and also
## logging information

[output]
OUTPUT_DIRECTORY=/mnt/output
log_file=/mnt/clovr_16S_run.log

## the higher, the more output (3 = most verbose)
debug_level=3

[pipeline]
PIPELINE_TEMPLATE=clovr_16S
PIPELINE_ARGS=--FASTA_FILES=${input.FASTA_TAG} --MAPPING_FILE=${input.MAPPING_TAG} --DB_PATH=${input.DB_TAG} --GROUP_COUNT=${input.GROUP_COUNT}

# prestart,prerun,postrun are all run locally. Use noop.xml for no operation
# Prestart is run before cluster start
# Possible actions: tag input data and do QC metrics
PRESTART_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.prestart.xml

# Prerun is run after cluster start but before pipeline start
# Possible actions: tag and upload data sets to the cluster
PRERUN_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.prerun.xml

# Postrun is run after pipeline completion and after data download
# Possible actions: load a local database, view results in a web browser, reorganize data for local ergatis
POSTRUN_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.postrun.xml
</source>
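
For my own reference: the ${input.KEY} placeholders in PIPELINE_ARGS point back at the [input] section of the same file. A minimal Python 3 sketch of that kind of dotted-reference substitution, just to keep the mechanics straight; it is an illustration only, not CloVR's actual resolver, and the local file name is assumed:

<source lang="python">
import configparser
import re

# Parse the edited config; interpolation is disabled because the file uses
# a dotted ${section.KEY} syntax rather than configparser's own scheme.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep option names case-sensitive (FASTA_TAG, not fasta_tag)
parser.read("clovr_16s.config")  # assumed local copy of the file above

def resolve(value, config):
    """Replace ${section.KEY} placeholders with the matching config values."""
    return re.sub(
        r"\$\{(\w+)\.(\w+)\}",
        lambda m: config.get(m.group(1), m.group(2)),
        value,
    )

print(resolve(parser.get("pipeline", "PIPELINE_ARGS"), parser))
# Expected, given the values above (printed as one line):
#   --FASTA_FILES=16S_FASTA --MAPPING_FILE=MAPPING
#   --DB_PATH=clovr-core-set-aligned-imputed-fasta --GROUP_COUNT=1
</source>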

  • Editing clovr_ec2.conf:

<source lang="text">
[cluster]
ami=ami-0c7b8d65
key=vappio_00
master_type=c1.xlarge
master_groups=vappio,web
# Uncomment to use spot pricing
master_bid_price=0.68
exec_type=c1.xlarge
exec_groups=vappio,web
# Uncomment to use spot pricing
exec_bid_price=0.68
# Maximum number of exec instances a cluster is allowed to have.
exec_max_instances=5
#availability_zone=us-east-1b

# Include the base clovr configuration
[]
-include /mnt/vappio-conf/clovr_base.conf
</source>
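
Since the bid-price lines are uncommented, spot pricing is in play, and the config bounds the worst case: one c1.xlarge master plus up to five exec nodes, each bid at $0.68/hour. Quick arithmetic on the ceiling (assuming the bid is a cap on the hourly charge, which is how spot bids are supposed to behave; the realized price is usually lower):

<source lang="python">
# Hourly spend ceiling implied by clovr_ec2.conf when spot pricing is on.
master_bid_price = 0.68    # USD/hour ceiling, single c1.xlarge master
exec_bid_price = 0.68      # USD/hour ceiling per c1.xlarge exec node
exec_max_instances = 5     # most exec nodes the cluster is allowed to reach

ceiling = master_bid_price + exec_bid_price * exec_max_instances
print("worst-case spot spend: $%.2f/hour" % ceiling)  # $4.08/hour
</source>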

The first pass errored out. The mailing list says there needs to be a Vappio script to handle the cluster config. This set-up only started the local cluster. There'll be upstream problems modifying the pipeline, too, I expect.
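
Next step is to dig through /mnt/clovr_16S_run.log (the log_file set in the [output] section above). A throwaway sketch for pulling out the suspicious lines, assuming that path and a plain-text log:

<source lang="python">
# Pull error-ish lines out of the pipeline log named in the [output] section.
LOG = "/mnt/clovr_16S_run.log"
KEYWORDS = ("error", "fail", "traceback")

with open(LOG) as handle:
    for number, line in enumerate(handle, start=1):
        if any(word in line.lower() for word in KEYWORDS):
            print("%6d  %s" % (number, line.rstrip()))
</source>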