You can use the writeInterleaved.py script from Practice working with FASTQ files as a starting point for your function.

#!/usr/bin/env python3
# writeInterleaved.py
"""Interleave mate-pair sequences into a single file.

Convert from FASTQ *.R1.fastq and *.R2.fastq files to one FASTA *.interleaved.fasta file.
"""
from Bio import SeqIO

leftReads = SeqIO.parse("data/top24_Aip02.R1.fastq", "fastq")
rightReads = SeqIO.parse("data/top24_Aip02.R2.fastq", "fastq")

interleaved = []  # initialize an empty list
for l, r in zip(leftReads, rightReads):
    interleaved.append(l)
    interleaved.append(r)

SeqIO.write(interleaved, "top24_Aip02.interleaved.fasta", "fasta")

logInterleave: log progress through the script's input prep, interleaving, and output.

Input: args object from get_args() function call

Output: None returned; should write the interleaved FASTA to file

Assumptions:

You should use BioPython to parse the FASTQ file, and you can assume that you will always have the same number of reads in the R1 and R2 FASTQ files.

Use BioPython to write the resulting list of interleaved SeqRecords to the FASTA file.

Use the log.write() command to wrap statements that will "print" (actually write) to your log file. Remember that the write() method does not automatically add a new line, so you will want to add "\n" to represent a line break at the beginning or end of each call. These logged lines should tell us what is happening and list any relevant variable values.
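As a minimal sketch of this logging pattern (the file name and values below are placeholders for illustration, not part of the assignment), each write() call carries its own "\n":

```python
import os
import tempfile

# Hypothetical values standing in for real arguments and results.
mate1 = "data/example.R1.fastq"
record_count = 48

# Write to a throwaway temp folder so the sketch runs anywhere.
log_path = os.path.join(tempfile.mkdtemp(), "example.log")
with open(log_path, "w") as log:
    # write() adds no newline on its own, so each call ends with "\n"
    log.write(f"Reading first mate file: {mate1}\n")
    log.write(f"Interleaved {record_count} records\n")

# Read the log back to confirm each message landed on its own line.
with open(log_path) as log:
    lines = log.read().splitlines()
```

Without the trailing "\n", both messages would run together on a single line of the log file.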

See the template below for additional ready-to-use functions to support logging.
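One detail worth checking when using the template's pathLogFile() helper: open(path, 'w') creates the file but not its parent folder, so the log folder has to exist first. The sketch below copies pathLogFile() from the template and creates a folder with os.makedirs; the temporary folder is an assumption made so the example runs anywhere, whereas the template's default is "results/logs/".

```python
import os
import tempfile
from datetime import datetime


def pathLogFile(logFolder, logBase):
    """Return a log file path and name using the current time and script name."""
    timestamp = datetime.now().strftime("%Y-%m-%d-%H%M")  # YYYY-MM-DD-HHMM
    return f"{logFolder}{timestamp}_{logBase}.log"


# Hypothetical folder for illustration; note the trailing separator,
# because pathLogFile() joins the pieces by plain string concatenation.
log_folder = os.path.join(tempfile.mkdtemp(), "results", "logs") + os.sep
os.makedirs(log_folder, exist_ok=True)  # open(..., 'w') fails if the folder is missing

log_file = pathLogFile(log_folder, "interleaved.py")
with open(log_file, "w") as log:
    log.write("log file created\n")
```

Whether to create the folder inside the script or by hand beforehand is a design choice the assignment leaves open.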

Manual Tests

Ensure you can run the script to interleave the two top24_Aip02.R*.fastq files in your data/ folder.

Automated Tests

You can find these in your repo for this week, but you can also see them here. There is no need to change the tests this week. However, you will want to ensure that the interleaved.py script you write passes both of these tests.

test_interleaved.py

#!/usr/bin/env python3
"""Test behavior of interleaved.py"""
from interleaved import interleave
from Bio import SeqIO


def test_interleaved_list():
    """Interleave two lists"""
    list1 = ["A", "B", "C"]
    list2 = ["1", "2", "3"]
    expected = ["A", "1", "B", "2", "C", "3"]
    assert interleave(list1, list2) == expected, "expect two lists to be interleaved"


def test_interleaved_SeqRecords():
    """Interleave two iterators of SeqRecords.

    Because SeqRecord comparisons are not supported, this test gets LONG.
    """
    file1 = SeqIO.parse("scripts/tests/first3reads_Aip02.R1.fastq", "fastq")
    file2 = SeqIO.parse("scripts/tests/first3reads_Aip02.R2.fastq", "fastq")
    expected = []
    for record in SeqIO.parse("scripts/tests/first3reads_Aip02.interleave_manual.fastq", "fastq"):
        expected.append(record)
    result = interleave(file1, file2)
    # lists are the same size
    assert len(result) == len(expected), "expect the two lists to have the same number of elements"
    assert result[1].id == expected[1].id, "expect the same indexed sequence to have the same ID"
    assert result[2].id == expected[2].id, "expect the next indexed sequence to also be the same"
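A minimal sketch of an interleave() function that would satisfy the first test, reusing the zip() pattern from writeInterleaved.py. It relies on the assignment's stated assumption that both inputs hold the same number of items; zip() silently stops at the shorter input otherwise. This is one possible approach, not necessarily the intended solution.

```python
def interleave(mate1, mate2):
    """Return a list alternating items from mate1 and mate2.

    Works on any pair of iterables, including lists and
    SeqIO.parse iterators, because zip() only needs iteration.
    """
    interleaved = []
    for left, right in zip(mate1, mate2):
        interleaved.append(left)   # item from the first input
        interleaved.append(right)  # matching item from the second input
    return interleaved


result = interleave(["A", "B", "C"], ["1", "2", "3"])
```

Because the function only iterates over its inputs, the same code handles the plain lists in the first test and the SeqRecord iterators in the second.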

Templates

You can find this template in your repo for this week, but you can also see it here.

As you copy the template, be sure to rename it to interleaved.py and for each "TODO" tag, replace the tag with the request following "TODO." For instance, replace the """TODO: Say what the script does""" DocString with a DocString that says what the script does.

assignment-5-template.py

#!/usr/bin/env python3
"""TODO: Say what the script does"""

import argparse  # for command-line argument parsing
from datetime import datetime  # for getting current timestamp
from Bio import SeqIO  # for reading/writing FASTQ/A files


def get_args():
    """Return parsed command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Interleave mate-pair FASTQ sequences into a single FASTA file.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # TODO add argument to get the first mate FASTQ file name (or path)

    # TODO add argument to get the second mate FASTQ file name

    # Get output FASTA file name
    parser.add_argument('-o', '--output',  # variable to access this data later: args.output
                        metavar='FASTA',  # shorthand to represent the input value
                        help='Provide the path for the output FASTA file.',  # message to the user, it goes into the help menu
                        type=str,
                        required=True)

    # extra arguments to help us format our log file output
    parser.add_argument('--logFolder',  # variable to access this data later: args.logFolder
                        help='Provide the folder for log files.',  # message to the user, it goes into the help menu
                        type=str,
                        default="results/logs/")
    parser.add_argument('--logBase',  # variable to access this data later: args.logBase
                        help='Provide the base for the log file name',
                        type=str,
                        default=parser.prog)  # get the name of the script

    return(parser.parse_args())


def pathLogFile(logFolder, logBase):
    """Return a log file path and name using the current time and script name."""
    timestamp = datetime.now().strftime("%Y-%m-%d-%H%M")  # get current time in YYYY-MM-DD-HHMM
    return(f"{logFolder}{timestamp}_{logBase}.log")


def interleave(mate1, mate2):
    """Return list of interleaved SeqRecords.

    Assumes mate1 and mate2 inputs are SeqIO.parse iterator objects.
    """
    interleaved = []
    # TODO: populate the interleaved list with interleaved SeqRecord objects
    return(interleaved)


def logInterleave(args):
    """Create log of Interleave progress."""
    logFile = pathLogFile(args.logFolder, args.logBase)

    with open(logFile, 'w') as log:
        log.write(f"Running interleaved.py on {datetime.now()}\n")

        log.write("\n**** Summary of arguments ****")
        # TODO log the two mate files and the output file
        log.write("\n")  # add some space between argument data and the rest of the log

        # TODO add log lines and commands to do the following steps.
        # Unsure what/how to log?
        # I've provided a sample of my log file in the results/logs/2022-10-13-1544_interleaved.py.log file in this repo
        # 1. Get the FASTQ sequences with SeqIO.parse
        # 2. Get the interleaved list of SeqRecord objects
        # 3. Write the interleaved list of SeqRecord objects to our FASTA file with SeqIO.write

        log.write(f"\nScript has finished at {datetime.now()}")


if __name__ == "__main__":
    logInterleave(get_args())  # pass arguments directly into the primary function
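The two argument TODOs in get_args() could be filled in along the following lines. This is only a sketch: the flag names --mate1 and --mate2, their help text, and the optional argv parameter (added here so the parser can be exercised without a real command line) are assumptions, not names given by the template.

```python
import argparse


def get_args(argv=None):
    """Return parsed command-line arguments (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Interleave mate-pair FASTQ sequences into a single FASTA file.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # Hypothetical flag names; the template leaves these two as TODOs.
    parser.add_argument('--mate1',  # accessed later as args.mate1
                        metavar='FASTQ',
                        help='Provide the path for the first mate FASTQ file.',
                        type=str,
                        required=True)
    parser.add_argument('--mate2',  # accessed later as args.mate2
                        metavar='FASTQ',
                        help='Provide the path for the second mate FASTQ file.',
                        type=str,
                        required=True)

    # Output argument exactly as in the template.
    parser.add_argument('-o', '--output',
                        metavar='FASTA',
                        help='Provide the path for the output FASTA file.',
                        type=str,
                        required=True)
    return parser.parse_args(argv)


args = get_args(["--mate1", "a.R1.fastq", "--mate2", "a.R2.fastq", "-o", "out.fasta"])
```

Inside logInterleave(), the parsed values would then be available as args.mate1, args.mate2, and args.output for logging and for the SeqIO.parse and SeqIO.write calls.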
