Question (C++):
For this assignment you will receive as input two text files, rebase210.txt and sequences.txt. After the header, each line of the database file rebase210.txt contains the name of a restriction enzyme and the possible DNA sites the enzyme may cut (the cut location is indicated by a ') in the following format:
enzyme_acronym/recognition_sequence/.../recognition_sequence//
For instance, the first few lines of rebase210.txt are:
AanI/TTA'TAA//
AarI/CACCTGCNNNN'NNNN/'NNNNNNNNGCAGGTG//
AasI/GACNNNN'NNGTC//
AatII/GACGT'C//
AbsI/CC'TCGAGG//
AccI/GT'MKAC//
AccII/CG'CG//
AccIII/T'CCGGA//
Acc16I/TGC'GCA//
Acc36I/ACCTGCNNNN'NNNN/'NNNNNNNNGCAGGT//
Acc65I/G'GTACC//
PsiI/TTA'TAA//
That means that each line contains one enzyme acronym associated with one or more recognition sequences. For example, on line 2:
The enzyme acronym AarI corresponds to the two recognition sequences CACCTGCNNNN'NNNN and 'NNNNNNNNGCAGGTG.
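The line format above can be split with plain std::string searches. The following is a minimal sketch, not the assignment's required parser: the names ParsedLine and ParseDatabaseLine are hypothetical, and it assumes a data line always starts with the acronym, separates fields with a single '/', and ends with "//".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: the pieces of one database line.
struct ParsedLine {
  std::string acronym;
  std::vector<std::string> sequences;
};

// Splits "Acro/seq1/seq2//" into the acronym and its recognition sequences.
// Returns false if the line has no leading acronym or yields no sequences.
bool ParseDatabaseLine(const std::string &line, ParsedLine &out) {
  out.acronym.clear();
  out.sequences.clear();
  const std::size_t first_slash = line.find('/');
  if (first_slash == std::string::npos || first_slash == 0) return false;
  out.acronym = line.substr(0, first_slash);

  std::size_t start = first_slash + 1;
  while (start < line.size()) {
    const std::size_t end = line.find('/', start);
    if (end == std::string::npos) break;  // No terminating '/': stop.
    if (end == start) break;              // Empty field: the closing "//".
    out.sequences.push_back(line.substr(start, end - start));
    start = end + 1;
  }
  return !out.sequences.empty();
}
```

For the line AarI/CACCTGCNNNN'NNNN/'NNNNNNNNGCAGGTG// this yields the acronym AarI and two recognition sequences. How the header lines of rebase210.txt are skipped is left open here, since the header layout is not described in the question.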
Question:
You will create a parser to read in this database and construct an AVL tree. For each line of the database and for each recognition sequence in that line, you will create a new SequenceMap object that contains the recognition sequence as its recognition_sequence_ and the enzyme acronym as the only string of its enzyme_acronyms_, and you will insert this object into the tree. This is explained with the following pseudo code:
Tree<SequenceMap> a_tree;
string db_line;
while (GetNextLineFromDatabaseFile(db_line)) {
  // Get the first part of the line:
  string an_enz_acro = GetEnzymeAcronym(db_line);
  string a_reco_seq;
  while (GetNextRecognitionSequence(db_line, a_reco_seq)) {
    SequenceMap new_sequence_map(a_reco_seq, an_enz_acro);
    a_tree.insert(new_sequence_map);
  } // End second while.
} // End first while.
If new_sequence_map.recognition_sequence_ equals the recognition_sequence_ of a node X already in the tree, the search tree's insert() function will call X.Merge(new_sequence_map) on the existing element. This appends the new enzyme acronym to the enzyme_acronyms_ of X. Note that this is part of the functionality of insert(): Merge() is called only for duplicates as described above. Otherwise, no Merge() is required and new_sequence_map is inserted into the tree as a new node.
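The merge-on-duplicate branch can be illustrated with a simplified tree. The sketch below is an assumption-laden stand-in, not the provided avl_tree.h: it uses a plain unbalanced binary search tree (AVL rotations and height bookkeeping are deliberately omitted), and Entry, Node, Insert and Find are hypothetical names, with Entry playing the role of SequenceMap.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for SequenceMap, kept public for brevity.
struct Entry {
  std::string sequence;
  std::vector<std::string> acronyms;
  bool operator<(const Entry &rhs) const { return sequence < rhs.sequence; }
  void Merge(const Entry &other) {
    acronyms.insert(acronyms.end(), other.acronyms.begin(),
                    other.acronyms.end());
  }
};

struct Node {
  Entry element;
  std::unique_ptr<Node> left, right;
  explicit Node(const Entry &e) : element(e) {}
};

// Unbalanced BST insert. An AVL insert would rebalance after the recursive
// calls; only the duplicate handling is the point of this sketch.
void Insert(const Entry &x, std::unique_ptr<Node> &t) {
  if (!t) {
    t.reset(new Node(x));   // Empty spot: create a new node.
  } else if (x < t->element) {
    Insert(x, t->left);
  } else if (t->element < x) {
    Insert(x, t->right);
  } else {
    t->element.Merge(x);    // Same recognition sequence: merge acronyms.
  }
}

// Returns the stored entry for a recognition sequence, or nullptr.
const Entry *Find(const std::string &sequence,
                  const std::unique_ptr<Node> &t) {
  const Node *cur = t.get();
  while (cur != nullptr) {
    if (sequence < cur->element.sequence) {
      cur = cur->left.get();
    } else if (cur->element.sequence < sequence) {
      cur = cur->right.get();
    } else {
      return &cur->element;
    }
  }
  return nullptr;
}
```

Inserting TTA'TAA once for AanI and once for PsiI leaves a single node holding both acronyms. Only operator< is used for comparisons, which matches the one comparison operator that SequenceMap provides.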
To implement the above, write a test program named query_tree which uses your parser to create a search tree and then allows the user to query it with a recognition sequence. If that sequence exists in the tree, the program should print all the enzyme acronyms associated with that recognition sequence.
Your programs should run from the terminal as follows:
query_tree <db_filename>
For example, you can write on the terminal:
./query_tree rebase210.txt
The user should enter THREE strings (supposed to be recognition sequences), for instance:
CC'TCGAGG
TTA'TAA
TC'C
Your program should print their associated enzyme acronyms to standard output. In the above example the output will be:
AbsI
AanI PsiI
Not Found
I will test it with a file containing three strings and run your code like this:
./query_tree rebase210.txt < input_part2a.txt
Please make sure the program produces the expected output.
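The query step described above might look like the following sketch. It is only an illustration under stated assumptions: a std::map stands in for the AVL tree, the names Index, AnswerQueries and AnswerQueriesToString are hypothetical, and the exact whitespace the grader expects is not specified in the question, so acronyms are simply separated by single spaces.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the tree: recognition sequence -> enzyme acronyms.
using Index = std::map<std::string, std::vector<std::string>>;

// Reads up to query_count whitespace-separated strings from `in` and prints
// the matching acronyms, or "Not Found", one result per line.
void AnswerQueries(const Index &index, std::istream &in, std::ostream &out,
                   int query_count = 3) {
  std::string query;
  for (int i = 0; i < query_count && (in >> query); ++i) {
    const Index::const_iterator it = index.find(query);
    if (it == index.end()) {
      out << "Not Found\n";
      continue;
    }
    for (std::size_t j = 0; j < it->second.size(); ++j) {
      if (j > 0) out << ' ';
      out << it->second[j];
    }
    out << '\n';
  }
}

// Convenience wrapper so the loop can be exercised without a terminal.
std::string AnswerQueriesToString(const Index &index,
                                  const std::string &input) {
  std::istringstream in(input);
  std::ostringstream out;
  AnswerQueries(index, in, out);
  return out.str();
}
```

In the actual program the call would be AnswerQueries(index, std::cin, std::cout), so that input redirected from input_part2a.txt is read the same way as typed input.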
Here is sequence_map.h:
#ifndef SEQUENCEMAP_H
#define SEQUENCEMAP_H

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

class SequenceMap {
 public:
  /* // Zero-parameter constructor.
  SequenceMap() = default; */
  // Copy-constructor.
  SequenceMap(const SequenceMap &rhs) = default;
  // Copy-assignment.
  SequenceMap& operator=(const SequenceMap &rhs) = default;
  // Move-constructor.
  SequenceMap(SequenceMap &&rhs) = default;
  // Move-assignment.
  SequenceMap& operator=(SequenceMap &&rhs) = default;
  // Destructor.
  ~SequenceMap() = default;

  // Start of Part 1

  // Constructor for recognition sequence and enzyme acronym.
  SequenceMap(const std::string &a_rec_seq, const std::string &an_enz_acro) {
    recognition_sequence_ = a_rec_seq;
    enzyme_acronyms_.push_back(an_enz_acro);
  }

  /* // Constructor for recognition sequence only.
  SequenceMap(const std::string &a_rec_seq) {
    recognition_sequence_ = a_rec_seq;
    enzyme_acronyms_.push_back("");
  } */

  // Overload the < operator: order by recognition sequence.
  bool operator<(const SequenceMap &rhs) const {
    return (recognition_sequence_ < rhs.recognition_sequence_);
  }

  // Overload the << operator to print the recognition sequence with enzyme acronyms.
  friend std::ostream &operator<<(std::ostream &out, const SequenceMap &a_SequenceMap) {
    out << a_SequenceMap.recognition_sequence_ << " ";
    for (std::size_t i = 0; i < a_SequenceMap.enzyme_acronyms_.size(); ++i) {
      out << a_SequenceMap.enzyme_acronyms_[i] << " ";
    }
    return out;
  }

  // Merge: append the other object's enzyme acronyms to this object's list.
  void Merge(const SequenceMap &other_sequence) {
    for (std::size_t i = 0; i < other_sequence.enzyme_acronyms_.size(); ++i) {
      enzyme_acronyms_.push_back(other_sequence.enzyme_acronyms_[i]);
    }
  }

  /* // Return the recognition sequence.
  std::string getRecognitionSequence() const { return recognition_sequence_; }

  // Print enzyme acronyms.
  void printAllEnzAcroOfRecSeq() const {
    for (std::size_t i = 0; i < enzyme_acronyms_.size(); ++i) {
      std::cout << enzyme_acronyms_[i] << " ";
    }
    std::cout << std::endl;
  } */

 private:
  std::string recognition_sequence_;
  std::vector<std::string> enzyme_acronyms_;
};

#endif // end of SequenceMap
//test program code started
// Main file for Part2(a) of Homework 2.
#include "avl_tree.h" //just need to assume this. info is below
#include "sequence_map.h"
#include <iostream>
#include <string>
using namespace std;
namespace {

// @db_filename: an input filename.
// @a_tree: an input tree of the type TreeType. It is assumed to be
// empty.
template
//already provided in avl_tree.h
a_tree.insert(10);
a_tree.printTree();
}

} // namespace

int main(int argc, char **argv) {
  if (argc != 2) {
    cout << "Usage: " << argv[0] << "
Please fill out the query_tree.cc program to parse the file and insert it into the tree.
Thank you.