Question:
You should create the following files as you work through Practice: Genome Assembly. You will run them sequentially on Discovery using the batch script provided below; learn more in Practice: Running Batched Jobs on Discovery.
getNGS.sh
trim.sh
runSpades.sh
runQuast.sh
Note: If you write these scripts so that they can use the $SRR_ID and $ORGANISM variables defined in the batch script below, you will be well on your way to finishing the first section of the final project. If you cannot do so, that is ok. We will explore how to send all our variables to our pipeline code in a future module.
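As one possible way to approach the note above, the sketch below shows a script reading $SRR_ID and $ORGANISM from its environment and falling back to defaults when they are not set. Keep in mind that a script started with `bash getNGS.sh` only sees variables that the calling script has exported (for example with `export SRR_ID ORGANISM`) or passed in some other way; a plain assignment in the parent is not inherited. The `describe_run` function is a hypothetical helper used only for illustration, and the fallback values are simply copied from the batch script.

```shell
#!/bin/bash
# Sketch only: pick up pipeline settings from the environment.
# If the caller did not export them, fall back to the batch script's values.
SRR_ID="${SRR_ID:-SRR522244}"
ORGANISM="${ORGANISM:-Rhodo}"

# Hypothetical helper: report which reads this script would work on.
describe_run() {
    printf '%s SRR reads to process: %s\n' "$ORGANISM" "$SRR_ID"
}

describe_run
```

The same two assignment lines could sit at the top of each of the four scripts, so every step agrees on which accession it is processing whether or not the variables were inherited.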
sbatch_assembleGenome.sh
#!/bin/bash
#SBATCH --partition=short               # choose from debug, express, or short
#SBATCH --job-name=assembleGenome
#SBATCH --time=04:00:00                 # the code pieces should run in far less than 4 hours
#SBATCH -N 1                            # nodes requested
#SBATCH -n 1                            # task per node requested
#SBATCH --output="batch-%x-%j.output"   # where to direct standard output; will be batch-jobname-jobID.output

echo "Starting our analysis $(date)"

ORGANISM="Rhodo"  # in future, we will define this as part of a config file
SRR_ID=SRR522244  # in future, we will define this as part of a config file

echo "$ORGANISM SRR reads to process: $SRR_ID"

echo "Loading our BINF6308 Anaconda environment."
module load anaconda3/2021.11
source activate BINF-12-2021

echo "Downloading $SRR_ID reads $(date)"
bash getNGS.sh

echo "Trimming $SRR_ID reads $(date)"
bash trim.sh

echo "Assembling genome from trimmed $SRR_ID reads $(date)"
bash runSpades.sh

echo "Analyzing genome assembly $(date)"
bash runQuast.sh

echo "Assembly and analysis complete $(date)"
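The batch script runs its four steps one after another and keeps going even if an earlier step fails, so a failed download would still be followed by trimming and assembly attempts. A minimal sketch of one way to stop early is shown here. The `run_step` function is a hypothetical wrapper, not part of the provided script: it prints the same kind of timestamped message and returns a non-zero status when the wrapped command fails.

```shell
#!/bin/bash
# Sketch only: hypothetical wrapper that logs a step and reports failure.
run_step() {
    # $1 = message to log; remaining arguments = the command to run
    local message="$1"
    shift
    echo "$message $(date)"
    if ! "$@"; then
        echo "Step failed: $*" >&2
        return 1
    fi
}

# Harmless demonstration using the shell built-in "true" as the command.
run_step "Checking the wrapper" true
```

Inside the batch script, a step could then be written as `run_step "Downloading $SRR_ID reads" bash getNGS.sh || exit 1`, which ends the job as soon as that step fails.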
Coding Spec - documentation
Update the README.md to include a Methods section with subsections for each script listed above. Be sure to explain what each flag/option used with the underlying bioinformatics tools (fasterq-dump, trimmomatic, spades, and quast) does and how it affects the output.
Also include a Conclusions from Analysis section where you briefly (2-3 sentences) interpret the genome assembly results. What were the key metrics? How "good" was the assembly?
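When interpreting the results, it can help to know how a contiguity metric is derived. QUAST's report includes values such as the number of contigs, the total length, and N50. As an illustration only, the sketch below computes N50 from a list of contig lengths: the lengths are sorted from largest to smallest and added up until the running total reaches at least half of the assembly length, and the contig length at that point is the N50. The `n50` function name and the example lengths are assumptions made for this sketch; QUAST computes the value for you.

```shell
#!/bin/bash
# Sketch only: hypothetical helper that reads one contig length per line
# on standard input and prints the N50 of those lengths.
n50() {
    sort -nr | awk '
        { len[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # Stop at the first contig where the running sum covers half the total.
                if (running * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Example with made-up lengths: total is 290, and 80 + 70 = 150 covers half.
printf '%s\n' 80 70 50 40 30 20 | n50   # prints 70
```

A larger N50 for a similar total length generally indicates a more contiguous assembly, which is one way to frame how "good" the result is.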
Refresh your understanding of Markdown syntax to provide organizational structure to your README.md file.