Question: You are given the following five 14-long reads. Map them to the sequence of the gene responsible for the ABO blood type, keeping in mind that each read might include a single nucleotide error. Report their respective starting positions along the gene (each ANSWER should be an integer between 1 and 177).
a. ccggcctcgggaag
b. ttgcggacgctagc
c. tcgggctccccccg
d. ggggggaaggcgga
e. tctgtccccccccg
I know the answer for a. is 36; I am looking for help with the remaining four parts.
INFO for the expert: I believe I originally found the Scala guidance below on Chegg, but its result appears to be off by one from my assignment worksheet, so I must also be doing something wrong:
val s1 = "ccggcctcgggaag"
val s2 = "ttgcggacgctagc"
val s3 = "tcgggctccccccg"
val s4 = "ggggggaaggcgga"
val s5 = "tctgtccccccccg"
val g = "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"
// compares 2 strings and finds how many chars are different:
def similarity(source: String, dest: String): Int =
  source.zip(dest).foldLeft(0) { case (sum, (x, y)) => if (x == y) sum else sum + 1 }

// returns a closest match (Int, Int) with position (0 based) and how many chars were different:
def pos(read: String, gene: String): (Int, Int) = {
  gene.sliding(read.size, 1).zipWithIndex.map { case (snip, pos) => pos -> similarity(snip, read) }.minBy(_._2)
}

// find position:
scala> pos(s1, g)
res25: (Int, Int) = (35,1)

// verify:
scala> g.substring(35, 35 + s1.size)
res32: String = tcggcctcgggaag
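One possible explanation for the off-by-one: the transcript above reports a 0-based index (35), while the worksheet asks for positions counted from 1 (36). The following is a minimal sketch, not the original author's code, that applies the same sliding-window comparison to all five reads and adds 1 to each index. The names `ReadMapper`, `mismatches`, and `bestMatch` are hypothetical, introduced only for this illustration; the gene and reads are copied from the question.

```scala
object ReadMapper {
  // Gene sequence and reads copied verbatim from the question.
  val gene: String =
    "ggccgcctcccgcgcccctctgtcccctcccgtgttcggcctcgggaagtcggggcggcgggcggcgcgggccgggaggggtcgcctcgggctcaccccgccccagggccgccgggcggaaggcggaggccgagaccagacgcggagccatggccgaggtgttgcggacgctggccg"

  val reads: Seq[(String, String)] = Seq(
    "a" -> "ccggcctcgggaag",
    "b" -> "ttgcggacgctagc",
    "c" -> "tcgggctccccccg",
    "d" -> "ggggggaaggcgga",
    "e" -> "tctgtccccccccg"
  )

  // Number of positions at which two equal-length strings differ (Hamming distance).
  def mismatches(a: String, b: String): Int =
    a.zip(b).count { case (x, y) => x != y }

  // Best window for a read, as (1-based start position, mismatch count).
  // The "+ 1" converts Scala's 0-based window index to the worksheet's 1-based
  // convention (an assumption about the worksheet, based on the known answer 36 for a.).
  // On ties, minBy keeps the earliest window.
  def bestMatch(read: String, gene: String): (Int, Int) =
    gene.sliding(read.length).zipWithIndex
      .map { case (window, i) => (i + 1, mismatches(window, read)) }
      .minBy(_._2)

  def main(args: Array[String]): Unit =
    reads.foreach { case (label, read) =>
      val (start, errors) = bestMatch(read, gene)
      println(s"$label. start = $start, mismatches = $errors")
    }
}
```

Tracing this by hand against the gene string gives start positions 36, 162, 87, 114, and 19 for reads a. through e., each with exactly one mismatch, which agrees with the known answer for a. It is worth running the sketch to confirm these values before entering them on the worksheet.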
