Question: Help me create a pipeline using a Python wrapper script. Make sure to comment each section, each function, and the important areas of the code. We are comparing HCMV transcriptomes at different days post infection; the sequencing data have already been converted into FASTQ files.
The pipeline should automate the steps written below by running one wrapper Python script that can take either the full data or your sample test data as input. Your wrapper script can call other scripts, but the user should be able to run all steps with one command, as directed in your README. I will be creating the README, but it would be great if you could write out how to run the code. Be explicit about how the user should run the test data.
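One way to let a single command cover both the full data and the sample test data is a command-line flag. The sketch below is only an illustration: the flag names (--test, --full-dir, --test-dir) and the default directory names are hypothetical and do not come from the assignment.

```python
import argparse


def parse_args(argv=None):
    """Parse the wrapper's command-line options (all names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="HCMV transcriptome pipeline wrapper (sketch)"
    )
    # --test switches the input directory to the small sample data set.
    parser.add_argument("--test", action="store_true",
                        help="run on the sample test data instead of the full data")
    parser.add_argument("--full-dir", default="full_data",
                        help="directory holding the full FASTQ files")
    parser.add_argument("--test-dir", default="test_data",
                        help="directory holding the sample FASTQ files")
    return parser.parse_args(argv)


def choose_input_dir(args):
    """Return the FASTQ directory that matches the requested mode."""
    return args.test_dir if args.test else args.full_dir
```

With a wrapper saved as, say, wrapper.py (a placeholder name), the README could then tell the user to run `python wrapper.py --test` for the sample data and `python wrapper.py` for the full data.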
Write a Python script to automate the steps and produce the requested log file, named PipelineProject.log, along with the other output files, in a directory named PipelineProject (where you've used your first and last name). ALL results generated by you or the programs called should be written to this directory. The easiest way to guarantee this is to create the directory using an os.system call and then move into it via an os.chdir call.
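A minimal sketch of the directory and log setup follows. It swaps the suggested os.system call for os.makedirs, which does the same job without a shell; the function names and the directory and log names passed in are placeholders chosen by the caller, not names fixed by the assignment.

```python
import os


def setup_output_dir(dir_name, log_name):
    """Create the results directory, move into it, and start an empty log."""
    # os.makedirs is used here instead of an os.system("mkdir ...") call;
    # exist_ok=True lets the pipeline be re-run without failing.
    os.makedirs(dir_name, exist_ok=True)
    os.chdir(dir_name)
    # Opening in "w" mode truncates any log left over from a previous run.
    with open(log_name, "w"):
        pass
    return os.path.abspath(log_name)


def write_log(log_path, message):
    """Append one line to the log file."""
    with open(log_path, "a") as log:
        log.write(message + "\n")
```

Because the script changes into the results directory first, every relative output path used by later steps lands inside it. Input paths should be made absolute before the os.chdir call.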
Which strains are most similar to these patient samples? To compare to other strains, you will assemble these transcriptome reads. We don't expect assembly to produce the entire genome from transcripts, but enough to be useful in BLAST. Virus sequencing experiments often include host DNA, because it is difficult to isolate the RNA of just the virus: it is transcribed only during infection of the host cell. Before assembly, let's make sure our reads map to the HCMV genome. Using Bowtie, create an index for HCMV (NCBI accession NC). Next, save only the reads that map to the HCMV index for use in assembly. Write to your log file the number of reads in each transcriptome before and after the Bowtie mapping. For instance, if I was looking at the Donor dpi sample, I would write to the log (numbers here are arbitrary):
Donor dpi had read pairs before Bowtie filtering and read pairs after.
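The filtering step could be sketched as below, under two assumptions that are not stated in the question: that the tool is Bowtie2 (commands bowtie2-build and bowtie2), and that the reads are paired-end FASTQ files, so the number of records in one mate file equals the number of read pairs. File names and sample labels are hypothetical. The functions only build the commands; running them is left to subprocess.run.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def bowtie2_commands(ref_fasta, index_prefix, r1, r2, sample):
    """Build (not run) the index and mapping commands for one sample."""
    build = ["bowtie2-build", ref_fasta, index_prefix]
    # --al-conc writes the pairs that aligned concordantly; Bowtie2 replaces
    # "%" with 1 and 2 to name the two mate files.
    align = ["bowtie2", "--quiet", "-x", index_prefix, "-1", r1, "-2", r2,
             "-S", f"{sample}.sam", "--al-conc", f"{sample}_mapped_%.fq"]
    return build, align


def filtering_log_line(label, before, after):
    """Format the before/after read-pair counts for the log."""
    return (f"{label} had {before} read pairs before Bowtie filtering "
            f"and {after} read pairs after.")
```

Counting the first mate file before mapping and the first --al-conc output file after mapping gives the two numbers for each sample's log line.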
Using the Bowtie output reads, assemble all four transcriptomes together to produce an assembly via SPAdes. Write the SPAdes command you used to the log file.
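A joint assembly of several paired-end libraries can be requested from SPAdes with its numbered library options (--pe1-1, --pe1-2, --pe2-1, and so on). The sketch below only builds the command and the string to log; the thread count, the use of --only-assembler, and all file names are assumptions made for illustration.

```python
import shlex


def spades_command(paired_libs, out_dir, threads=2):
    """Build a spades.py command that assembles all libraries together.

    paired_libs is a list of (mate1_path, mate2_path) tuples, one per sample.
    """
    cmd = ["spades.py", "-t", str(threads), "--only-assembler"]
    # SPAdes numbers paired-end libraries starting at 1: --pe1-1/--pe1-2, ...
    for i, (r1, r2) in enumerate(paired_libs, start=1):
        cmd += [f"--pe{i}-1", r1, f"--pe{i}-2", r2]
    cmd += ["-o", out_dir]
    return cmd


def command_for_log(cmd):
    """Return the command as one shell-quoted string for the log file."""
    return shlex.join(cmd)
```

The same list can be passed to subprocess.run(cmd, check=True) to run the assembly, so the logged string always matches what was executed.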
Write Python code to calculate the number of contigs with a length and write the # to the log file as follows (replace # with the correct integer):
There are # contigs bp in the assembly.
Write Python code to calculate the length of the assembly (the total number of bp in all of the contigs bp in length) and write this # to the log file as follows (replace # with the correct integer):
There are # bp in the assembly.
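Both numbers can be computed in one pass over the contigs FASTA file. In this sketch the length cutoff is passed in as min_len, because the exact threshold is not legible in the text above; the strict "greater than" comparison is likewise an assumption. The parser is a plain-Python stand-in and needs no external library.

```python
def read_fasta(path):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def contig_stats(path, min_len):
    """Count contigs longer than min_len and sum the lengths of those contigs."""
    lengths = [len(seq) for seq in read_fasta(path).values()
               if len(seq) > min_len]
    return len(lengths), sum(lengths)
```

The returned pair supplies the two integers that replace # in the log lines.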
Does your assembly align with other virus strains? Write Python code to retrieve the longest contig from your SPAdes assembly. Use the longest contig as BLAST input to query the nr nucleotide database, limited to members of the Betaherpesvirinae subfamily. Think: which BLAST program should you use? You will need to make a local database of just the sequences from the Betaherpesvirinae subfamily. Your BLAST run should keep only the best alignment (HSP) for any single query-subject pair of sequences. For the top hits, write the following to your log file: subject accession, percent identity, alignment length, start of alignment in query, end of alignment in query, start of alignment in subject, end of alignment in subject, bit score, E-value, and subject title. Include the following header row in the log file, followed by the top hits, and tab-delimit each item:
sacc pident length qstart qend sstart send bitscore evalue stitle
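The BLAST step might look like the sketch below. It assumes a nucleotide-versus-nucleotide search with blastn from BLAST+, and a local database built with makeblastdb from a FASTA file of Betaherpesvirinae sequences that has already been downloaded. File names, the database name, and the number of hits (max_hits) are placeholders supplied by the caller. The functions build the commands without running them.

```python
LOG_FIELDS = "sacc pident length qstart qend sstart send bitscore evalue stitle"


def longest_record(records):
    """Return (name, sequence) for the longest entry of a name->sequence dict."""
    name = max(records, key=lambda key: len(records[key]))
    return name, records[name]


def makeblastdb_command(fasta, db_name):
    """Build the command that creates a local nucleotide database."""
    return ["makeblastdb", "-in", fasta, "-out", db_name,
            "-title", db_name, "-dbtype", "nucl"]


def blastn_command(query, db_name, out_tsv, max_hits):
    """Build a blastn command with tabular output in the requested columns."""
    return ["blastn", "-query", query, "-db", db_name, "-out", out_tsv,
            # outfmt 6 is tab-delimited; the field list fixes the column order.
            "-outfmt", "6 " + LOG_FIELDS,
            # -max_hsps 1 keeps only the best HSP per query-subject pair.
            "-max_hsps", "1",
            "-max_target_seqs", str(max_hits)]


def header_row():
    """Return the tab-delimited header row for the log file."""
    return "\t".join(LOG_FIELDS.split())
```

After blastn finishes, the header row followed by the lines of the output file can be appended to the log unchanged, since output format 6 is already tab-delimited.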
If you need any more information, let me know. I need this within the next hours!