Question: Write a Java program that meets the following specification.

In a FASTA-format DNA sequence file, each sequence record starts with a header line that begins with a ">" sign, followed by a sequence identifier (such as a GenBank accession number) and a description of the sequence. Develop a Java program to read in a sequence file. This is the sequence file:

>by21f03.y1|BF727444
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAATTCACCCC
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
CACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGA
CAGCGGCTGCTGGATGCTCTGGAATTCCAGCCCAACTACTCGGGCCTCCA
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
GATCAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCA
CTGAGGACTGCTCCTGGAATTCAGGACCGCT
>by05e12.y1|BF726365
CCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCTACGAGGACCG
GGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCACCCCAACCTGC
AGCCCTACTTGAGGAATTCGAACTCGGCGCGCGTGGACAGCGGCTGCTGG
ATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACTTCCTGCGCCG
CGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGCGACTCGGTCC
GCTCCTGCCGCCTC
>by09f05.y1|BF726635
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCC
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
CACCCCAACCTGCGGAATTCCTTGAGCCGCTGCAACTCGGCGCGCGTGGA
CAGCGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGT
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
GATCAGACTCTATGAGGAATTCCCCTACAGAGGCCAGATGATAGAGTTCA
CTGAGGACTGCTCCTGTCTTCAGGACCGCTTCCGCTTCAATGAAATCCAC
TCCCTCAACGTGCTGGAGGGCTCCTGGGTCCTCTACGAGCTGTCCAACTA
CCGAGGACGGCAGTACCTG
>by14f12.y1|BF726960
CAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCT
ACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCAC
CCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGACAG
CGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACT
TCCTGCGCCGCGGCGACTATGGAATTCGGCAGCAGTGGATGGGCCTCAGC
GACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAGGAT
CAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCACTG
AGGAC
>by20g06.y1|BF727389
CAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCT
ACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCAC
CCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGACAG
CGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACT
TCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGC
GACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAGGAT
CAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCACTG
AGGACTGCTCCTGTC
>by18g06.y1|BF727241
CGCGAGCCTCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCA
GCAGCGACCACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCG
CGCGTGGACAGCGGCTGC

The program should find out how many sequences are in the file by counting the header lines. It should prompt the user for the sequence file name and then print a message stating how many sequences the file contains, such as:

Enter the name of the sequence file: seq.fasta
File seq.fasta contains 6 sequences
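
The counting step described above can be sketched roughly as follows. This is a minimal sketch, not a definitive solution: the class name `FastaCount` and the method name `countSequences` are hypothetical, and it assumes that every header line starts with ">" in the first column and that the file is small enough to read into memory in one go.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; counts FASTA records by counting header lines.
class FastaCount {

    // Returns the number of lines that start with '>' (one per sequence record).
    static int countSequences(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.startsWith(">")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter the name of the sequence file: ");
        String fileName = in.nextLine().trim();
        try {
            // Assumption: the whole file fits comfortably in memory.
            List<String> lines = Files.readAllLines(Paths.get(fileName));
            System.out.println("File " + fileName + " contains "
                    + countSequences(lines) + " sequences");
        } catch (IOException e) {
            System.err.println("Could not read " + fileName + ": " + e.getMessage());
        }
    }
}
```

Keeping the counting logic in its own method, separate from the prompt and file handling, makes it easy to reuse when the program is extended to process each record.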

 

In the in-class exercise above, you read through the whole file to count the header lines, so you can also separate the actual sequence from the header line of each sequence record. Modify the program to search the sequence of each record for a restriction site, and underline each restriction site with "*" characters. See the sample output below:

Enter the name of the sequence file: seq.fasta
Enter the sequence of a restriction site: GAATTC

>by21f03.y1|BF727444
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAATTCACCCC
                                       ******  
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
                                                 
CACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGA
                                                 
CAGCGGCTGCTGGATGCTCTGGAATTCCAGCCCAACTACTCGGGCCTCCA
                     ******                      
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
                                                 
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
                                                 
GATCAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCA
                                                 
CTGAGGACTGCTCCTGGAATTCAGGACCGCT
                ******        
 

>by05e12.y1|BF726365
CCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCTACGAGGACCG
                                                 
GGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCACCCCAACCTGC
                                                 
AGCCCTACTTGAGGAATTCGAACTCGGCGCGCGTGGACAGCGGCTGCTGG
             ******                              
ATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACTTCCTGCGCCG
                                                 
CGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGCGACTCGGTCC
                                                 
GCTCCTGCCGCCTC
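
One possible way to produce output like the sample above is sketched below. The class name `RestrictionSiteFinder` and the methods `annotate` and `flush` are hypothetical names chosen for illustration. The sketch makes several assumptions that the exercise does not spell out: the sequence lines of a record are joined before searching, so a site that spans a line break is underlined on both lines; overlapping occurrences are all marked; matching ignores case; and each underline line is padded with spaces to the length of its sequence line, which may not reproduce the trailing whitespace of the sample exactly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; underlines restriction sites in each FASTA record.
class RestrictionSiteFinder {

    // Returns the input lines with an underline line added after every sequence line.
    static List<String> annotate(List<String> lines, String site) {
        List<String> out = new ArrayList<>();
        List<String> record = new ArrayList<>(); // sequence lines of the current record
        for (String line : lines) {
            if (line.startsWith(">")) {
                flush(record, site, out);        // finish the previous record first
                out.add(line);
            } else if (!line.trim().isEmpty()) {
                record.add(line.trim());
            }
        }
        flush(record, site, out);                // finish the last record
        return out;
    }

    // Marks every occurrence of the site in the joined record, then emits it line by line.
    private static void flush(List<String> record, String site, List<String> out) {
        if (record.isEmpty()) {
            return;
        }
        String seq = String.join("", record).toUpperCase();
        String target = site.toUpperCase();
        char[] marks = new char[seq.length()];
        Arrays.fill(marks, ' ');
        if (!target.isEmpty()) {                 // an empty pattern would loop forever
            for (int i = seq.indexOf(target); i >= 0; i = seq.indexOf(target, i + 1)) {
                Arrays.fill(marks, i, i + target.length(), '*');
            }
        }
        int offset = 0;
        for (String line : record) {
            out.add(line);
            out.add(new String(marks, offset, line.length()));
            offset += line.length();
        }
        out.add("");                             // blank line between records
        record.clear();
    }

    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter the name of the sequence file: ");
        String fileName = in.nextLine().trim();
        System.out.print("Enter the sequence of a restriction site: ");
        String site = in.nextLine().trim();
        System.out.println();
        for (String line : annotate(Files.readAllLines(Paths.get(fileName)), site)) {
            System.out.println(line);
        }
    }
}
```

Searching the joined sequence rather than each line separately is a design choice: searching line by line would miss any site that is split across two lines of the file.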

 
