Question: Low-quality data can have tremendous downstream implications if it is not dealt with early. A common technique to mitigate the effects of low-quality data is to 'mask' it. To mask the data means to change the base calls to a predefined letter that is known to represent data to be ignored.
The FASTQ file format stores both the DNA base calls and the quality information for each sequence in the file. A sequence and its quality string have the same length, so with some simple programming the quality value for each base of a sequence can be found. Write a Python program that does the following:
1) Outputs to the screen the sequence reads in FASTA format, with low-quality bases changed to an 'N'
2) The quality cutoff to define an 'N' is entered by the user
3) The FASTQ file to mask is entered by the user
Step by Step Solution
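One possible approach is sketched below. It is not a definitive solution and rests on several assumptions that the question does not state: quality characters use the Phred+33 encoding (the offset would be 64 for older Illumina files), every record occupies exactly four lines, and a base is masked when its score is strictly below the cutoff. The function names (mask_read, read_fastq, fastq_to_masked_fasta, main) and the prompt texts are illustrative choices, not part of the question.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def mask_read(sequence, quality, cutoff, offset=PHRED_OFFSET):
    """Replace every base whose Phred score is below `cutoff` with 'N'."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality string differ in length")
    return "".join(
        base if ord(qual_char) - offset >= cutoff else "N"
        for base, qual_char in zip(sequence, quality)
    )


def read_fastq(lines):
    """Yield (read_id, sequence, quality) from four-line FASTQ records.

    `lines` can be an open file or any iterable of text lines.
    """
    lines = iter(lines)
    for header in lines:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got: {header!r}")
        sequence = next(lines, "").rstrip("\r\n")
        next(lines, "")  # '+' separator line, content ignored
        quality = next(lines, "").rstrip("\r\n")
        yield header[1:], sequence, quality


def fastq_to_masked_fasta(lines, cutoff):
    """Yield one FASTA-formatted string per read, with low-quality bases masked."""
    for read_id, sequence, quality in read_fastq(lines):
        yield f">{read_id}\n{mask_read(sequence, quality, cutoff)}"


def main():
    # Both the file name and the cutoff are entered by the user.
    path = input("FASTQ file to mask: ").strip()
    cutoff = int(input("Quality cutoff: "))
    with open(path) as handle:
        for record in fastq_to_masked_fasta(handle, cutoff):
            print(record)
```

Calling main() at the end of the script runs it interactively. Keeping the masking logic in mask_read, separate from file reading and user input, makes it easy to check on a single read: with a cutoff of 20, the sequence ACGT with quality string I!I# (scores 40, 0, 40, 2) becomes ANGN.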
