Question:
Create a Java program from the following information.
In a FASTA format DNA sequence file, a sequence record starts with a header line that begins with a ">" sign, followed by a sequence identifier (such as a GenBank accession number) and a description of the sequence. Develop a Java program to read in a sequence file. This is the sequence file:
>by21f03.y1|BF727444
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAATTCACCCC
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
CACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGA
CAGCGGCTGCTGGATGCTCTGGAATTCCAGCCCAACTACTCGGGCCTCCA
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
GATCAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCA
CTGAGGACTGCTCCTGGAATTCAGGACCGCT
>by05e12.y1|BF726365
CCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCTACGAGGACCG
GGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCACCCCAACCTGC
AGCCCTACTTGAGGAATTCGAACTCGGCGCGCGTGGACAGCGGCTGCTGG
ATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACTTCCTGCGCCG
CGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGCGACTCGGTCC
GCTCCTGCCGCCTC
>by09f05.y1|BF726635
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCC
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
CACCCCAACCTGCGGAATTCCTTGAGCCGCTGCAACTCGGCGCGCGTGGA
CAGCGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGT
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
GATCAGACTCTATGAGGAATTCCCCTACAGAGGCCAGATGATAGAGTTCA
CTGAGGACTGCTCCTGTCTTCAGGACCGCTTCCGCTTCAATGAAATCCAC
TCCCTCAACGTGCTGGAGGGCTCCTGGGTCCTCTACGAGCTGTCCAACTA
CCGAGGACGGCAGTACCTG
>by14f12.y1|BF726960
CAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCT
ACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCAC
CCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGACAG
CGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACT
TCCTGCGCCGCGGCGACTATGGAATTCGGCAGCAGTGGATGGGCCTCAGC
GACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAGGAT
CAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCACTG
AGGAC
>by20g06.y1|BF727389
CAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCT
ACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCAC
CCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGACAG
CGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACT
TCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGC
GACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAGGAT
CAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCACTG
AGGACTGCTCCTGTC
>by18g06.y1|BF727241
CGCGAGCCTCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCA
GCAGCGACCACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCG
CGCGTGGACAGCGGCTGC
Find out how many sequences are in the file (count the number of header lines). The program should prompt the user for the sequence file name and then print a message stating how many sequences the file contains, such as:
Enter the name of the sequence file: seq.fasta
File seq.fasta contains 6 sequences
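One possible way to approach the counting step is sketched below. It is a minimal sketch, not a reference solution: the class name `FastaCount` and the helper `countHeaders` are hypothetical names chosen for illustration, and the code assumes the file is small enough to read into memory and that every header line has ">" as its very first character.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; not given in the exercise.
class FastaCount {

    // Counts header lines, i.e. lines whose first character is '>'.
    // Assumption: a header never has leading whitespace before the '>'.
    static int countHeaders(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.startsWith(">")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter the name of the sequence file: ");
        String fileName = in.nextLine().trim();

        // Assumption: the file fits in memory, so it is read in one call.
        List<String> lines = Files.readAllLines(Paths.get(fileName));

        System.out.println("File " + fileName + " contains "
                + countHeaders(lines) + " sequences");
    }
}
```

Keeping the counting logic in its own method, separate from the prompt and file reading, makes it easy to check against a few hand-written lines before running it on a real file.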
In the above in-class exercise, you read through the whole file to count the header lines, so you can separate the actual sequence from the header line of each sequence record. Modify the above program to search the sequence of each record for a given restriction site, and underline each occurrence of the site with "*"s. See the sample output below:
Enter the name of the sequence file: seq.fasta
Enter the sequence of a restriction site: GAATTC
>by21f03.y1|BF727444
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAATTCACCCC
                                       ******
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
CACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGA
CAGCGGCTGCTGGATGCTCTGGAATTCCAGCCCAACTACTCGGGCCTCCA
                     ******
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
GATCAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCA
CTGAGGACTGCTCCTGGAATTCAGGACCGCT
                ******
>by05e12.y1|BF726365
CCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCTACGAGGACCG
GGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCACCCCAACCTGC
AGCCCTACTTGAGGAATTCGAACTCGGCGCGCGTGGACAGCGGCTGCTGG
             ******
ATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACTTCCTGCGCCG
CGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGCGACTCGGTCC
GCTCCTGCCGCCTC
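The underlining step could be sketched roughly as follows. This is a sketch under stated assumptions rather than a definitive answer: the class name `RestrictionSites` and the helper `underline` are hypothetical, the search is done one line at a time (so a site split across two lines of the file would not be found, which appears consistent with the sample output but is not stated in the exercise), and the site entered by the user is converted to upper case on the assumption that the sequences are upper case.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Scanner;

// Hypothetical class name; not given in the exercise.
class RestrictionSites {

    // Builds the marker line for one sequence line: '*' under every base that
    // belongs to an occurrence of site, ' ' before and between occurrences.
    // Returns "" when the line contains no occurrence.
    static String underline(String line, String site) {
        if (site.isEmpty()) {
            return "";
        }
        char[] marks = new char[line.length()];
        Arrays.fill(marks, ' ');
        int end = -1;                       // end of the last match found
        int idx = line.indexOf(site);
        while (idx >= 0) {
            for (int i = idx; i < idx + site.length(); i++) {
                marks[i] = '*';
            }
            end = idx + site.length();
            idx = line.indexOf(site, idx + 1);   // also catches overlaps
        }
        return end < 0 ? "" : new String(marks, 0, end);
    }

    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter the name of the sequence file: ");
        String fileName = in.nextLine().trim();
        System.out.print("Enter the sequence of a restriction site: ");
        // Assumption: sequences are upper case, so the site is normalized.
        String site = in.nextLine().trim().toUpperCase();

        for (String line : Files.readAllLines(Paths.get(fileName))) {
            System.out.println(line);
            if (!line.startsWith(">")) {    // header lines are not searched
                String marks = underline(line, site);
                if (!marks.isEmpty()) {
                    System.out.println(marks);
                }
            }
        }
    }
}
```

If sites that straddle a line break also need to be reported, one option is to join the lines of each record into a single string before searching and then re-wrap the output; that variant is not shown here.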