Question: Applying A Data Structure To A Big Data Application [ DvcSchedule4.cpp and DynamicArray.h] In this lab you will have your first experience with manipulating big
Applying A Data Structure To A Big Data Application [ DvcSchedule4.cpp and DynamicArray.h] In this lab you will have your first experience with manipulating big data. The data is extracted from the DVC database of all class sections offered at DVC since the Fall 2001 semester. Your program is to list all of the subject codes (like COMSC, MATH, PHYS, etc), and include for each subject code the count of classes (e.g., MATH, 4514 classes).
Requirements. Write DvcSchedule4.cpp to read and parse the 70,000 line dvc-schedule.txt (Links to an external site: https://drive.google.com/file/d/0B14YTZL55whCSGE0VjcyajZxSmc/view ). text file, and find each subject code in the file. Output each code to the console screen, in alphabetical order, with the number of classes offered under that code. Use your own DynamicArray template from Lab Assignment 3. Do NOT use any STL containers, and do NOT modify your H except to make corrections. Submit the H file, even if there are no corrections since lab 3. Canvas will add it's version tracking to the file's name -- that's okay.
NOTE: The dvc-schedule.txt file is expected to be in the working folder. In command line compiling, that's the same folder as the CPP and H files. In IDEs, you'll have to figure out for your IDE and project where is the working folder. Do not submit work that has a path designation in the file-open statement.
Note -- the dvc-schedule.txt file may contain duplicate entries. The combination of a term and section number is supposed to be unique. A duplicate entry is when the same term and section number pair appear in more than one record. Do NOT count duplicates -- skip them. That means to count a duplicate entry only once, ignoring all others. You'll need some way to track what's been counted so that you don't count the same section for the same semester more than once. When you are done processing the input file, output HOW MANY DUPLICATES you found and skipped in the input file. Check that number with your classmates, because you should all come up with the same number. You may use the Q&A section of this module for that.
You can expect the runtime to be several minutes. So that you don't stare at a blinking cursor while you wait for results, add a "progress bar". To do so, count the number of lines read from the file. For every 1000 lines read, output a dot -- like this:
cout << '.'; cout.flush( );
No endl. You need cout.flush( ); to force output out of the output buffer and onto the console. After the EOF loop ends, output an endl, so that your output starts on a line after the line of dots. Or use some other method of indicating progress, as you prefer, but whatever you do, do not forget to flush! Don't get this sent back for redo simply for forgetting this!
Follow the algorithm developed in the lecture to solve this.
Be careful! Don't just accept whatever counts that your program gives you. Make sure that your program gives the right answers for the input file used. Try using a much shortened version of the TXT file, for which you know exactly what to expect. Also try loading the TXT file into Excel -- sort the data in column A, and count for yourself to verify the results of your app.
Example. (based on a previous year's version of the TXT file)
ADJUS, 557 sections ADS, 206 sections AET, 62 sections ANTHR, 596 sections ARABC, 13 sections ...
SPTUT, 12 sections TAGLG, 8 sections
Hint
Parse. Start with the simple parsing code and see if you can read and parse the TXT file.
Count. Then add your code to count how many subject codes and how many sections of each.
Duplicate Check. Then add your code to track and exclude duplicates from counting.
If you do this right there should be a loop with a code block (or series of code blocks) that read and parse a line of input, then a code block (or series of code blocks) that check and reject duplicates, then a code block (or series of code blocks) that count. These three separate groups of code blocks should NOT overlap. They should each do their own job.
Here is the code for DynamicArray.h:
#ifndef DYNAMICARRAY_H #define DYNAMICARRAY_H #include using namespace std; template class DynamicArray { private: int size; dataT dummy; dataT *data; bool *in_use; static const int SIZE = 100; unsigned int capacity; public: DynamicArray(int = SIZE); DynamicArray(const DynamicArray &); virtual ~DynamicArray(); DynamicArray & operator=(const DynamicArray &); dataT operator[](unsigned int )const; dataT& operator[](unsigned int); unsigned int Size()const; unsigned int Capacity()const; bool ContainsKey(unsigned int)const; void DeleteKey(unsigned int); vector Keys()const; void Clear(); private: void Copy(const DynamicArray &); void Delete(); void SetCapacity(unsigned int SIZE = 10); }; template DynamicArray::DynamicArray(int SIZE) { capacity = SIZE; data = new dataT[capacity]; in_use = new bool[capacity]; size = 0; for (int i = 0; i < capacity; i++) in_use[i] = false; } template DynamicArray::DynamicArray(const DynamicArray & d_arr) :data(NULL), in_use(NULL) { Copy(d_arr); } template DynamicArray:: ~DynamicArray() { Delete(); } template DynamicArray & DynamicArray::operator=(const DynamicArray & d_arr) { if (this != &d_arr) { Delete(); Copy(d_arr); } return *this; } template dataT DynamicArray::operator[](unsigned int index)const { if (index >= capacity) return dummy; return data[index]; } template dataT& DynamicArray::operator[](unsigned int idx) { if (idx + 1 > capacity) { SetCapacity(idx + 1); } if (!in_use[idx]) size++; in_use[idx] = true; return data[idx]; } template unsigned int DynamicArray::Size()const { return size; } template unsigned int DynamicArray::Capacity()const { return capacity; } template bool DynamicArray::ContainsKey(unsigned int idx)const { if (idx < capacity) return in_use[idx]; return false; } template void DynamicArray::DeleteKey(unsigned int idx) { if (idx < capacity) { if (in_use[idx]) { size--; in_use[idx] = false; } } } template vector DynamicArray::Keys()const { vector a; for (unsigned int i = 0; i < capacity; ++i) { if (in_use[i]) a.push_back(i); } return a; } template void DynamicArray::Clear() { for (unsigned int i = 0; i < capacity; ++i) in_use[i] = false; } template void DynamicArray::Copy(const DynamicArray & D_array) { size = D_array.size; capacity = D_array.capacity; data = new dataT[capacity]; in_use = new bool[capacity]; for (unsigned int i = 0; i < capacity; ++i) { in_use[i] = false; if (D_array.in_use[i]) { in_use[i] = true; data[i] = D_array.data[i]; } } } template void DynamicArray::Delete() { if (data) { delete[] data; data = NULL; } if (in_use) { delete in_use; in_use = NULL; } size = capacity = 0; } template void DynamicArray::SetCapacity(unsigned int new_size) { if (new_size == 0) { Delete(); return; } dataT * newvalues = new dataT[new_size]; bool * newInUse = new bool[new_size]; size = 0; unsigned int upperLimit; if (capacity < new_size) upperLimit = capacity; else upperLimit = new_size; for (unsigned int i = 0; i < upperLimit; ++i) { newInUse[i] = in_use[i]; if (in_use[i]) { size++; newvalues[i] = data[i]; } } if (upperLimit < new_size) { for (unsigned int i = upperLimit; i < new_size; ++i) { newInUse[i] = false; } } capacity = new_size; if (data) delete[] data; if (in_use) delete[] in_use; data = newvalues; in_use = newInUse; } template ostream & operator<<(ostream & output, const DynamicArray & D_array) { vector a = D_array.Keys(); for (unsigned int i = 0; i < a.size(); ++i) { cout << "[" << a[i] << ": " << D_array[a[i]] << "]"; if (i < a.size() - 1) cout << ", "; } return output; } #endif
And the code for DynamicArrayDriver.cpp:
#include #include #include #include "DynamicArray.h" using namespace std; void MyDynamicDriver() { DynamicArray string_array; DynamicArray int_array; cout << endl << endl; cout << "-------------------------------------------------- ostream << Test:" << endl; cout << "Before insert value into string_array: " << endl; cout << "string_array: " << string_array; cout << "int_array: " << int_array; cout << endl; cout << "------------------------------------------------- [] Test: " << endl; cout << "Inserting some junk values " << endl; string_array[5] = "fGhslawpoi"; string_array[46] = "gsljdofua"; string_array[3] = "rsdkjbh"; string_array[51] = "iuewn"; string_array[7] = "wiyur"; string_array[10] = "nviy"; string_array[18] = "rsdkjbh"; string_array[75] = "fslkhaf"; string_array[63] = "wsuur"; string_array[15] = "sdlkhw"; int_array[32] = 723; int_array[3] = 854; int_array[43] = 66; int_array[3] = 257; int_array[7] = 101; int_array[9] = 127; int_array[19] = 456; int_array[98] = 36; int_array[63] = 13; int_array[8] = 491; cout << "After: " << endl; cout << "string_array: " << string_array << endl; cout << "int_array: " << int_array << endl; cout << endl; cout << "------------------------------------------------- Copy Test: " << endl; const DynamicArray s_arr(string_array); const DynamicArrayint_arr(int_array); cout << "s_arr: " << s_arr << endl; cout << "int_arr: " << int_arr << endl; cout << endl; cout << "------------------------------------------------- = Test: " << endl; DynamicArray s_arr2; DynamicArray int_arr2; cout << "New array s_arr2 instantiated: " << s_arr2 << endl; cout << "Copy s_arr2 = string_array" << endl; s_arr2 = s_arr; int_arr2 = int_arr; cout << "New array int_arr2 instantiated: " << int_arr2 << endl; cout << "Copy int_arr2 = int_arr" << endl; cout << "s_arr2: " << s_arr2 << endl; cout << "int_arr2: " << int_arr2 << endl; cout << endl; cout << "-------------------------------------------------- [] const Test" << endl; cout << "Print first 10 values : " << endl; cout << "s_arr: "; bool needComma; needComma = false; for (unsigned int i = 0; i < 10; ++i) { if (s_arr.ContainsKey(i)) { if (needComma) cout << ", "; needComma = true; cout << "[" << i << " : "; cout << s_arr[i] << "]"; } } cout << endl; cout << "int_arr: "; needComma = false; for (unsigned int i = 0; i < 10; ++i) { if (int_arr.ContainsKey(i)) { if (needComma) cout << ", "; needComma = true; cout << "[" << i << " : "; cout << int_arr[i] << "]"; } } cout << endl; cout << "-------------------------------------------------- Size() Test: " << endl; cout << "string_array.Size(): " << string_array.Size() << endl; cout << "s_arr.Size(): " << s_arr.Size() << endl; cout << "s_arr2.Size(): " << s_arr2.Size() << endl; cout << "int_arr.Size(): " << int_arr.Size() << endl; cout << "int_arr2.Size(): " << int_arr2.Size() << endl; cout << endl; cout << "-------------------------------------------------- Capacity() Test: " << endl; cout << "string_array.Capacity(): " << string_array.Capacity() << endl; cout << "s_arr.Capacity(): " << s_arr.Capacity() << endl; cout << "s_arr2.Capacity(): " << s_arr2.Capacity() << endl; cout << "int_arr.Capacity(): " << int_arr.Capacity() << endl; cout << "int_arr2.Capacity(): " << int_arr2.Capacity() << endl; cout << endl; cout << "-------------------------------------------------- ContainsKey test: " << endl; cout << "Display first 10 data of s_arr2 : "; needComma = false; for (unsigned int i = 0; i < 10; ++i) { if (s_arr2.ContainsKey(i)) { if (needComma) cout << ", "; needComma = true; cout << "[ " << i << ": "; cout << s_arr2[i] << "]"; } } cout << endl; needComma = false; cout << "Display first 10 int_arr2: "; for (unsigned int i = 0; i < 10; ++i) { if (int_arr2.ContainsKey(i)) { if (needComma) cout << ", "; needComma = true; cout << "[ " << i << ": "; cout << int_arr2[i] << "]"; } } cout << endl << endl; cout << "-------------------------------------------------- DeleteKey Test: " << endl; cout << "s_arr2: "; for (unsigned int i = 0; i < s_arr2.Size(); ++i) { if (s_arr2.ContainsKey(i)) { s_arr2.DeleteKey(i); cout << "index[" << i << "]" << " deleted " << endl; break; } } cout << "int_arr2: "; for (unsigned int i = 0; i < int_arr2.Size(); ++i) { if (int_arr2.ContainsKey(i)) { int_arr2.DeleteKey(i); cout << "index[" << i << "]" << " deleted " << endl; break; } } cout << "then display first 10 values: " << endl; cout << "s_arr2: " ; needComma = false; for (unsigned int i = 0; i < 10; ++i) { if (s_arr2.ContainsKey(i)) { if (needComma) cout << ", "; needComma = true; cout << "[" << i << ": "; cout << s_arr2[i] << "]"; } } cout << endl; cout << "int_arr2: "; needComma = false; for (unsigned int i = 0; i < 10; ++i) { if (int_arr2.ContainsKey(i)) { if (needComma) cout << ", "; needComma = true; cout << "[" << i << ": "; cout << int_arr2[i] << "]"; } } cout << endl << endl; cout << "-------------------------------------------------- Keys Test: " << endl; vector s_indices; s_indices = s_arr2.Keys(); cout << "s_arr2 in_use index: "; needComma = false; for (unsigned int i = 0; i < s_indices.size(); ++i) { if (needComma) cout << ", "; needComma = true; cout << s_indices[i]; } cout << endl; vector i_indices; i_indices = int_arr2.Keys(); cout << "int_arr2 in_use index: "; needComma = false; for (unsigned int i = 0; i < i_indices.size(); ++i) { if (needComma) cout << ", "; needComma = true; cout << i_indices[i]; } cout << endl << endl; cout << "-------------------------------------------------- Clear Test: " << endl; s_arr2.Clear(); int_arr2.Clear(); cout << "s_arr2 : " << s_arr2 << endl; cout << "int_arr2: " << int_arr2 << endl; } int main() { MyDynamicDriver(); cin.get(); return 0; }