Part 2
Introduction to the problem
For this assignment you will receive as input two text files, rebase210.txt and sequences.txt. After the header, each line of the database file rebase210.txt contains the name of a restriction enzyme and the possible DNA sites the enzyme may cut (the cut location is indicated by a ') in the following format:
enzyme_acronym/recognition_sequence/.../recognition_sequence//
For instance, the first few lines of rebase210.txt are:
AanI/TTA'TAA//
AarI/CACCTGCNNNN'NNNN/'NNNNNNNNGCAGGTG//
AasI/GACNNNN'NNGTC//
AatII/GACGT'C//
AbsI/CC'TCGAGG//
AccI/GT'MKAC//
AccII/CG'CG//
AccIII/T'CCGGA//
Acc16I/TGC'GCA//
Acc36I/ACCTGCNNNN'NNNN/'NNNNNNNNGCAGGT//
...
That means that each line contains one enzyme acronym associated with one or more recognition sequences. For example on line 2: The enzyme acronym AarI corresponds to the two recognition sequences CACCTGCNNNN'NNNN and 'NNNNNNNNGCAGGTG.
Part 2(a) (25 points)
You will create a parser to read in this database and construct an AVL tree. For each line of the database and for each recognition sequence in that line, you will create a new SequenceMap object that contains the recognition sequence as its recognition_sequence_ and the enzyme acronym as the only string of its enzyme_acronyms_, and you will insert this object into the tree. This is explained with the following pseudo code:
Tree<SequenceMap> a_tree;
string db_line;
// Read the file line-by-line:
while (GetNextLineFromDatabaseFile(db_line)) {
  // Get the first part of the line:
  string an_enz_acro = GetEnzymeAcronym(db_line);
  string a_reco_seq;
  while (GetNextRecognitionSequence(db_line, a_reco_seq)) {
    SequenceMap new_sequence_map(a_reco_seq, an_enz_acro);
    a_tree.insert(new_sequence_map);
  }  // End second while.
}  // End first while.
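The pseudo code above names two helpers without defining them. The following is a minimal sketch of how they might be written, assuming that each helper consumes the part of db_line it has already read (the assignment does not say whether db_line is modified, so that is a design assumption here). Header skipping and stray carriage returns are not handled.

```cpp
#include <string>

// Assumed behavior: returns the text before the first '/' and removes it,
// together with that '/', from db_line. Returns "" for a line without '/'.
std::string GetEnzymeAcronym(std::string &db_line) {
  const auto slash = db_line.find('/');
  if (slash == std::string::npos) return "";
  const std::string acronym = db_line.substr(0, slash);
  db_line.erase(0, slash + 1);
  return acronym;
}

// Assumed behavior: copies the next recognition sequence into a_reco_seq and
// removes it from db_line. Returns false once only the terminating '/' of
// the "//" end marker (or nothing at all) is left.
bool GetNextRecognitionSequence(std::string &db_line, std::string &a_reco_seq) {
  const auto slash = db_line.find('/');
  if (slash == std::string::npos || slash == 0) return false;
  a_reco_seq = db_line.substr(0, slash);
  db_line.erase(0, slash + 1);
  return true;
}
```

Applied to the line AarI/CACCTGCNNNN'NNNN/'NNNNNNNNGCAGGTG//, the first helper yields AarI, and two successive calls to the second helper yield CACCTGCNNNN'NNNN and 'NNNNNNNNGCAGGTG before a third call returns false.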
If new_sequence_map.recognition_sequence_ equals the recognition_sequence_ of a node X already in the tree, the search tree's insert() function will call the X.Merge(new_sequence_map) function of the existing element. This has the effect of updating the enzyme_acronyms_ of X. Note that this is part of the functionality of the insert() function: Merge() is called only in case of duplicates as described above. Otherwise, no Merge() is required and new_sequence_map is inserted into the tree.
To implement the above, write a test program named query_tree which will use your parser to create a search tree and then allow the user to query it using a recognition sequence. If that sequence exists in the tree, the program should print all the enzyme acronyms that correspond to that recognition sequence. Your program should run from the terminal as follows:
query_tree <database file name>
For example, you can write on the terminal:
./query_tree rebase210.txt
The user should enter THREE strings (supposed to be recognition sequences), for instance:
CC'TCGAGG
TTA'TAA
TC'C
Your program should print to the standard output their associated enzyme acronyms. In the above example the output will be:
AbsI
AanI PsiI
Not Found
We will test it with a file containing three strings and run your code like this:
./query_tree rebase210.txt < input_part2a.txt
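The duplicate handling described above can be sketched as follows. Only the names SequenceMap, recognition_sequence_, enzyme_acronyms_ and Merge() come from the assignment; the accessor acronyms(), the Node struct, and the free functions Insert() and Find() are hypothetical, and the tree here is a plain unbalanced binary search tree rather than an AVL tree (rotations and height bookkeeping are left out so that the merge step stays visible).

```cpp
#include <memory>
#include <string>
#include <vector>

class SequenceMap {
 public:
  SequenceMap(const std::string &a_reco_seq, const std::string &an_enz_acro)
      : recognition_sequence_(a_reco_seq), enzyme_acronyms_{an_enz_acro} {}

  // Objects are ordered by recognition sequence only.
  bool operator<(const SequenceMap &rhs) const {
    return recognition_sequence_ < rhs.recognition_sequence_;
  }

  // Appends the acronyms of other; assumes both hold the same sequence.
  void Merge(const SequenceMap &other) {
    enzyme_acronyms_.insert(enzyme_acronyms_.end(),
                            other.enzyme_acronyms_.begin(),
                            other.enzyme_acronyms_.end());
  }

  // Hypothetical accessor, added for this sketch.
  const std::vector<std::string> &acronyms() const { return enzyme_acronyms_; }

 private:
  std::string recognition_sequence_;
  std::vector<std::string> enzyme_acronyms_;
};

// Unbalanced stand-in for an AVL node (no height field, no rotations).
struct Node {
  SequenceMap element;
  std::unique_ptr<Node> left, right;
  explicit Node(const SequenceMap &x) : element(x) {}
};

// Inserts x, or merges it into the existing node with the same sequence.
void Insert(const SequenceMap &x, std::unique_ptr<Node> &t) {
  if (!t) {
    t.reset(new Node(x));
  } else if (x < t->element) {
    Insert(x, t->left);
  } else if (t->element < x) {
    Insert(x, t->right);
  } else {
    t->element.Merge(x);  // Duplicate recognition sequence.
  }
}

// Returns the stored element equal to x, or nullptr if there is none.
const SequenceMap *Find(const SequenceMap &x, const std::unique_ptr<Node> &t) {
  const Node *cur = t.get();
  while (cur) {
    if (x < cur->element) cur = cur->left.get();
    else if (cur->element < x) cur = cur->right.get();
    else return &cur->element;
  }
  return nullptr;
}
```

Inserting (TTA'TAA, AanI) and later (TTA'TAA, PsiI) leaves a single node whose acronym list holds both names, which matches the second output line of the example above. In an AVL tree the same three-way comparison would sit inside the recursive insert, with rebalancing applied on the way back up.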
________________________________________________________________________________________________________________________________________________________________________
//
// Main file for Part2(a) of Homework 2.
#include "avl_tree.h"
// You will have to add #include "sequence_map.h"
#include <iostream>
#include <string>
using namespace std;
namespace {
// @db_filename: an input filename.
// @a_tree: an input tree of the type TreeType. It is assumed to be
// empty.
template <typename TreeType>
void QueryTree(const string &db_filename, TreeType &a_tree) {
  // Code for running Part2(a)
  // You can use public functions of TreeType. For example:
  a_tree.insert(10);
  a_tree.printTree();
}
} // namespace
int
main(int argc, char **argv) {
  if (argc != 2) {
    cout << "Usage: " << argv[0] << " <databasefilename>" << endl;
    return 0;
  }
  const string db_filename(argv[1]);
  cout << "Input filename is " << db_filename << endl;
  // Note that you will replace AvlTree<int> with AvlTree<SequenceMap>
  AvlTree<int> a_tree;
  QueryTree(db_filename, a_tree);
  return 0;
}
____________________________________________________________________________________________
