Question: I need to update my script to include the percentage of overall amino acids compared to the total, then only display the top 5.

I need to update my script so that it shows each amino acid's percentage of the total amino acid count, and then displays only the top 5. The script is already set up to process a FASTA file; it's only the display of the results that I want to change. Please do not give me a totally different script. Others have tried this and given me bad scripts that don't even work.

My Python script:

#!/usr/bin/env python3

def FASTA(filename):
    try:
        f = open(filename)
    except IOError:
        print ("The file, %s, does not exist" % filename)
        return

    order = []
    sequences = {}
    counts = {}

    for line in f:
        if line.startswith('>'):
            name = line[1:].rstrip('\n')
            #name = name.replace('_', ' ')
            order.append(name)
            sequences[name] = ''
        else:
            sequences[name] += line.rstrip('\n').rstrip('*')
            for aa in sequences[name]:
                if aa in counts:
                    counts[aa] = counts[aa] + 1
                else:
                    counts[aa] = 1

    print ("%d sequences found" % len(order))
    print (counts)
    return (order, sequences)

x, y = FASTA("/home/jorvis1/e_coli_k12_dh10b.faa")

I need the output to look like the example below. Instead of showing only the counts, it should also show each amino acid's percentage of the total number of amino acids, and it should list only the 5 amino acids with the highest percentages:

L: 139002 (10.7%)
A: 123885 (9.6%)
G: 95475 (7.4%)
V: 91683 (7.1%)
I: 77836 (6.0%)
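One possible way to produce this display without changing how the file is read is sketched below. It assumes that `counts` is a dictionary mapping each amino acid letter to its total count, with every residue tallied exactly once. The helper names `top_amino_acids` and `print_top_amino_acids` are hypothetical and not part of the original script.

```python
def top_amino_acids(counts, n=5):
    """Return the n most frequent residues as (letter, count, percent) tuples.

    Hypothetical helper: `counts` is assumed to map each amino acid
    letter to the number of times it occurs across all sequences.
    """
    total = sum(counts.values())
    if total == 0:
        # Nothing was counted, so there are no percentages to report.
        return []
    # Sort (letter, count) pairs by count, largest first.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(aa, count, 100.0 * count / total) for aa, count in ranked[:n]]


def print_top_amino_acids(counts, n=5):
    """Print one line per residue in the form 'L: 139002 (10.7%)'."""
    for aa, count, percent in top_amino_acids(counts, n):
        # %% inside a printf-style format string prints a literal percent sign.
        print("%s: %d (%.1f%%)" % (aa, count, percent))
```

With this in place, the `print (counts)` line inside `FASTA` could be replaced by `print_top_amino_acids(counts)`, leaving the rest of the script as it is. The percentages are computed against the sum of all counts, not only the five residues shown.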
