User:James Estevez/Notebook/Spring 2011: Bdellovibrio Independent Study/2011/01/22

Friday night.

Proposal draft is up the food chain for review. Lab notebook is set up. Need to prep the analysis pipeline:

CloVR

  1. Getting familiar with the interface and navigating Hadoop. VirtualBox doesn't seem to want to work, so need to switch to VMware for the project. What do we want to know?
    1. AMI prep from inside the VM?
    2. Which version?
  2. Sample datasets from CloVR and MM (quick sanity check sketched after this list).
  3. Documentation's pretty thin.
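
Before anything goes into the pipeline, a quick look at the sample input can't hurt. The sketch below just counts sequences and bases in a FASTA file; the path is the one used in clovr_16s.config further down, and the parsing is a generic FASTA walk, nothing CloVR-specific:

<source lang="python">
# Rough FASTA sanity check: sequence count, total bases, shortest/longest record.
# Path taken from clovr_16s.config below; swap in any sample dataset.
FASTA = "/mnt/AMP_Lung.small.fasta"

lengths = []
with open(FASTA) as handle:
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)

if lengths:
    print("sequences:", len(lengths))
    print("total bases:", sum(lengths))
    print("min/max length: %d / %d" % (min(lengths), max(lengths)))
else:
    print("no sequences found, check the path or file format")
</source>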

Running the first 16S pipeline

  • Editing clovr_16s.config:

<source lang="text">
## Template configuration file for
## Input information.
## Configuration options for the pipeline.

[input]
GROUP_COUNT=1
FASTA_FILES=/mnt/AMP_Lung.small.fasta
MAPPING_FILE=/mnt/IGS.qmap

PIPELINE_NAME=clovr_16S_pipeline
FASTA_TAG=16S_FASTA
MAPPING_TAG=MAPPING
DB_TAG=clovr-core-set-aligned-imputed-fasta

## Cluster info.
## If the cluster_tag is present, the script will first
## check for the presence of this cluster and if it's not
## running will start a cluster with the default settings

[cluster]
CLUSTER_NAME=local
EXEC_NODES=1
CLOVR_CONF=clovr.conf
CLUSTER_CREDENTIAL=local
#key=/mnt/devel1.pem
#host=localhost
## Output info.
## Specifies where locally the data will end up and also
## logging information

[output]
OUTPUT_DIRECTORY=/mnt/output
log_file=/mnt/clovr_16S_run.log

## the higher, the more output (3 = most verbose)
debug_level=3

[pipeline]
PIPELINE_TEMPLATE=clovr_16S
PIPELINE_ARGS=--FASTA_FILES=${input.FASTA_TAG} --MAPPING_FILE=${input.MAPPING_TAG} --DB_PATH=${input.DB_TAG} --GROUP_COUNT=${input.GROUP_COUNT}

# prestart,prerun,postrun are all run locally. Use noop.xml for no operation
# Prestart is run before cluster start
# Possible actions: tag input data and do QC metrics
PRESTART_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.prestart.xml

# Prerun is run after cluster start but before pipeline start
# Possible actions: tag and upload data sets to the cluster
PRERUN_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.prerun.xml

# Postrun is run after pipeline completion and after data download
# Possible actions: load a local database, view results in a web browser, reorganize data for local ergatis
POSTRUN_TEMPLATE_XML=/opt/clovr_pipelines/workflow/project_saved_templates/clovr_16S/clovr_16S.postrun.xml
</source>
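
For my own reference: the ${input.KEY} placeholders in PIPELINE_ARGS point back at the [input] section of the same file. A minimal Python 3 sketch of that kind of dotted-reference substitution, just to keep the mechanics straight; it is an illustration only, not CloVR's actual resolver, and the local file name is assumed:

<source lang="python">
import configparser
import re

# Parse the edited config; interpolation is disabled because the file uses
# a dotted ${section.KEY} syntax rather than configparser's own scheme.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep option names case-sensitive (FASTA_TAG, not fasta_tag)
parser.read("clovr_16s.config")  # assumed local copy of the file above

def resolve(value, config):
    """Replace ${section.KEY} placeholders with the matching config values."""
    return re.sub(
        r"\$\{(\w+)\.(\w+)\}",
        lambda m: config.get(m.group(1), m.group(2)),
        value,
    )

print(resolve(parser.get("pipeline", "PIPELINE_ARGS"), parser))
# Expected, given the values above (printed as one line):
#   --FASTA_FILES=16S_FASTA --MAPPING_FILE=MAPPING
#   --DB_PATH=clovr-core-set-aligned-imputed-fasta --GROUP_COUNT=1
</source>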

  • Editing clovr_ec2.conf:

<source lang="text">
[cluster]
ami=ami-0c7b8d65
key=vappio_00
master_type=c1.xlarge
master_groups=vappio,web
# Uncomment to use spot pricing
master_bid_price=0.68
exec_type=c1.xlarge
exec_groups=vappio,web
# Uncomment to use spot pricing
exec_bid_price=0.68
# Maximum number of exec instances a cluster is allowed to have.
exec_max_instances=5
#availability_zone=us-east-1b

# Include the base clovr configuration
[]
-include /mnt/vappio-conf/clovr_base.conf
</source>
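
Since the bid-price lines are uncommented, spot pricing is in play, and the config bounds the worst case: one c1.xlarge master plus up to five exec nodes, each bid at $0.68/hour. Quick arithmetic on the ceiling (assuming the bid is a cap on the hourly charge, which is how spot bids are supposed to behave; the realized price is usually lower):

<source lang="python">
# Hourly spend ceiling implied by clovr_ec2.conf when spot pricing is on.
master_bid_price = 0.68    # USD/hour ceiling, single c1.xlarge master
exec_bid_price = 0.68      # USD/hour ceiling per c1.xlarge exec node
exec_max_instances = 5     # most exec nodes the cluster is allowed to reach

ceiling = master_bid_price + exec_bid_price * exec_max_instances
print("worst-case spot spend: $%.2f/hour" % ceiling)  # $4.08/hour
</source>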

The first pass errored out. The mailing list says there needs to be a Vappio script to handle the cluster config. This set-up only started the local cluster. There'll be upstream problems modifying the pipeline, too, I expect.
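
Next step is to dig through /mnt/clovr_16S_run.log (the log_file set in the [output] section above). A throwaway sketch for pulling out the suspicious lines, assuming that path and a plain-text log:

<source lang="python">
# Pull error-ish lines out of the pipeline log named in the [output] section.
LOG = "/mnt/clovr_16S_run.log"
KEYWORDS = ("error", "fail", "traceback")

with open(LOG) as handle:
    for number, line in enumerate(handle, start=1):
        if any(word in line.lower() for word in KEYWORDS):
            print("%6d  %s" % (number, line.rstrip()))
</source>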