Question:

Your task is to code up a simple NGS aligner in Python. The expectation is that you will use straightforward Python logic, but for an extra challenge, you are welcome to code it using more sophisticated implementations to speed up the run time!

You are given:

a sample genome sequence, genome.fsa

a FASTQ file of NGS reads, reads.fastq

Download both of these files from Canvas.
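The two inputs can be read with Biopython's SeqIO, as the assignment suggests. As a dependency-free illustration of what that parsing involves, here is a minimal sketch using only the standard library. The function names read_fasta and read_fastq are hypothetical, and the sketch assumes a single-record FASTA file and strict 4-line FASTQ records (no wrapped sequence lines).

```python
def read_fasta(handle):
    """Return (header, sequence) for the first record of a FASTA file.

    Assumption: the genome file holds one record; sequence lines are joined.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at a second record, if any
            header = line[1:]
        else:
            chunks.append(line.upper())
    return header, "".join(chunks)


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records.

    The leading "@" is dropped and the ID is cut at the first whitespace,
    which mirrors what the assignment says Biopython does with the ID line.
    """
    while True:
        id_line = handle.readline()
        if not id_line:
            return  # end of file
        id_line = id_line.strip()
        if not id_line:
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        handle.readline()  # the "+" separator line, ignored here
        quality = handle.readline().strip()
        yield id_line[1:].split()[0], sequence, quality
```

Typical use would be `with open("genome.fsa") as fh: header, genome = read_fasta(fh)` and a `for read_id, seq, qual in read_fastq(fh)` loop over reads.fastq.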

A) Write a program for an aligner that will output a file containing the alignment coordinates of each read in the FASTQ file within the genome, as prescribed below. Note that reads may align to either the given genomic sequence (the + strand) or its reverse complement (the - strand), and may have up to 2 mismatches. You'll probably want to use BioPython in your program to parse the inputs and for reverse complementing, but the actual logic for the alignments should be implemented using your own code. Each read has a unique identifier (see the NGS lecture notes on FASTQ files or research this online). The output of your program should be a text file containing 4 lines for each read, as follows:

The read identifier. Note that BioPython (if you use it) will remove the "@" demarcating character at the start of the ID line.

The coordinates in the genome that the read aligns to. Start counting from 1, like in Rosalind HW Q14!

Whether it aligns to the + or - strand.

The number of mismatches in the alignment.
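The four lines listed above could be produced by a small helper like the one below. This is only a sketch: the name format_alignment and its parameters are hypothetical, and it assumes the coordinates are written as start-end with both ends 1-based and inclusive.

```python
def format_alignment(read_id, start, end, strand, mismatches):
    """Return the 4-line output block for one aligned read.

    Assumption: start and end are 1-based, inclusive genome coordinates.
    """
    return f"{read_id}\n{start}-{end}\n{strand}\n{mismatches}\n"
```

Writing `format_alignment(...)` for every read to one open file handle gives the output file in the prescribed layout.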

For example, for the third read in the FASTQ file, your output file should have:

HWI-ST1216:132:c2pb5acxx:7:2115:18814:33730
101-150
-
0

Check this to make sure!
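A minimal brute-force sketch of the alignment logic from part A follows. It makes several assumptions the assignment does not spell out: a read on the - strand is found by sliding its reverse complement along the given sequence, coordinates are always reported in + strand numbering, and when several positions qualify the one with the fewest mismatches wins (the + strand and the leftmost position win ties). All function names are hypothetical. The reverse complement is done with str.translate so the block runs without Biopython; Biopython's Seq objects provide an equivalent reverse_complement method.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming_capped(a, b, cap):
    """Count mismatches between equal-length strings.

    Returns None as soon as the count exceeds cap, so hopeless
    windows are abandoned early.
    """
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > cap:
                return None
    return mismatches


def align_read(read, genome, max_mismatches=2):
    """Return (start, end, strand, mismatches) or None if nothing aligns.

    Coordinates are 1-based and inclusive, in + strand numbering
    (an assumption, see the note above).
    """
    n = len(read)
    best = None
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for i in range(len(genome) - n + 1):
            mm = hamming_capped(query, genome[i:i + n], max_mismatches)
            if mm is not None and (best is None or mm < best[3]):
                best = (i + 1, i + n, strand, mm)
                if mm == 0:
                    return best  # cannot do better than an exact match
    return best
```

This scans every window on both strands, so its cost grows with genome length times read length. That is fine for a small sample genome; an index of genome k-mers would be one way to take up the extra speed challenge.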

I have attached a sample of each of the files:

>tpg|BK006935.2| [organism=Saccharomyces cerevisiae S288c] [strain=S288c] [moltype=genomic] [chromosome=I] [note=R64-1-1]
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACA
CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTT
ACCCTCCATTACCCTGCCTCCACTCGTTACCCTGTCCCATTCAACCATACCACTCCGAAC
CACCATCCATCCCTCTACTTACTACCACTCACCCACCGTTACCCTCCAATTACCCATATC
CAACCCACTGCCACTTACCCTACCATTACCCTACCATCCACCATGACCTACTCACCATAC
TGTTCTTCTACCCACCATATTGAAACGCTAACAAATGATCGTAAATAACACACACGTGCT
TACCCTACCACTTTATACCACCACCACATGCCATACTCACCCTCACTTGTATACTGATTT
TACGTACGCACACGGATGCTACAGTATATACCATCTCAAACTTACCCTACTCTCAGATTC
CACTTCACTCCATGGCCCATCTCTCACTGAATCAGTACCAAATGCACTCACATCATTATG
CACGGCAC

@HWI-ST1216:132:c2pb5acxx:7:1210:6768:34415
ACACCCACACACCCACACACCACACCACACACCACACCACACCCACACAC
+
????DDDDDDDDDI@:CEBE)@@DDIIIDCDDCDIIII?DCDCDDCDDDD
@HWI-ST1216:132:c2pb5acxx:7:2315:2096:80538
ACACCCACACACCCACACACCACACCACACACCACACCACACCCACACAC
