Question: Write a Perl script to mask the low-quality positions (nucleotides). Your script should work on any FASTQ file, independent of which encoding is used for quality scores.
Write a Perl script that will mask poor quality regions of sequence stored in FASTQ files. Masked nucleotides are to be changed to 'n'. The script is to accept two command-line arguments:
- The first argument is the base ASCII code used in the quality score encoding scheme; the second is a lower-bound quality threshold.
- The script will mask every position in the sequence reads whose quality score (the ASCII code of the quality character minus the base) is less than the threshold.
- The FASTQ file is provided to the script on standard input (i.e. the script reads from standard input), and it produces the resultant quality-filtered FASTQ file on standard output. Lines 1 and 3 of each FASTQ record are copied from standard input to standard output verbatim.
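One way these requirements could fit together is sketched below. It is an illustration, not a reference solution: the subroutine names mask_read and main are made up here, and the sketch assumes four-line FASTQ records (no wrapped sequence lines) with Unix line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mask one read (hypothetical helper): any base whose decoded quality
# (ASCII code of the quality character minus $base) is below $threshold
# becomes 'n'. Returns the masked sequence.
sub mask_read {
    my ($seq, $qual, $base, $threshold) = @_;
    my $n = length($seq) < length($qual) ? length($seq) : length($qual);
    for my $i (0 .. $n - 1) {
        my $score = ord(substr($qual, $i, 1)) - $base;
        substr($seq, $i, 1) = 'n' if $score < $threshold;
    }
    return $seq;
}

# Read four-line FASTQ records from STDIN, write masked records to STDOUT.
sub main {
    my ($base, $threshold) = @_;
    unless (defined $threshold) {
        warn "Usage: $0 BASE THRESHOLD < in.fastq > out.fastq\n";
        return;
    }
    while (defined(my $header = <STDIN>)) {
        my $seq  = <STDIN>;
        my $plus = <STDIN>;
        my $qual = <STDIN>;
        last unless defined $qual;   # assumption: drop an incomplete final record
        chomp(my $s = $seq);         # assumption: "\n" line endings
        chomp(my $q = $qual);
        # Lines 1 and 3 are copied verbatim; the quality line is unchanged too.
        print $header, mask_read($s, $q, $base, $threshold), "\n", $plus, $qual;
    }
    return;
}

main(@ARGV) unless caller;
```

Keeping the masking rule in its own subroutine makes it easy to check on a single read, and the base is only ever used as a number, so the same code covers base 33, base 64, or any other offset.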
For example, suppose a BINF 200 student has a FASTQ file named seq1_raw.fastq that is encoded using 33 ('!') as a base. He or she wants to mask all portions that have a quality score less than 24, and store the output in a file named seq1_masked.fastq. That student might use the following command:
./q1.pl 33 24 < seq1_raw.fastq > seq1_masked.fastq
For example, seq1_raw.fastq:
@KXKW7:00006:00042
GCTCGCGGTTACTTTTCTTGGGTTGGTTTGGACTACTGGGGTCAAGGAACCTGGTCACCGT
+
BCDBBBB=B=BBBBB4BB<@A+8-89>A=B>CDDFCBFFF398@<@<@<@<77-5<44-45
...
seq1_masked.fastq:
@KXKW7:00006:00042
GCTCGCGGTTACTTTnCTTGGnnnnGTTTGGACTACTGGGnTnAAGGAACCnnnnCnnnnn
+
BCDBBBB=B=BBBBB4BB<@A+8-89>A=B>CDDFCBFFF398@<@<@<@<77-5<44-45
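To see why particular positions in this record are masked, it may help to decode a few of its quality characters by hand. The short sketch below does that for base 33 and threshold 24; the subroutine name phred is a hypothetical helper chosen for this illustration.

```perl
use strict;
use warnings;

# Decode one quality character: its ASCII code minus the encoding base.
sub phred {
    my ($char, $base) = @_;
    return ord($char) - $base;
}

my ($base, $threshold) = (33, 24);   # values from the example command

# A few quality characters that occur in the example record.
for my $char ('B', '9', '8', '4', '+') {
    my $score = phred($char, $base);
    printf "%s -> %2d (%s)\n", $char, $score,
        $score < $threshold ? 'masked' : 'kept';
}
```

With these values '9' decodes to exactly 24 and is kept, while '8' decodes to 23 and is masked, which matches the strict "less than the threshold" rule in the output above.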
