Question: This is all one assignment COM 301 Lab 2: Implementing Indexing Using B-Trees An index is a file that facilitates insert, delete, and search operations

 This is all one assignment COM 301 Lab 2: Implementing Indexing
Using B-Trees An index is a file that facilitates insert, delete, and
search operations on a database. The efficiency of these operations depends on
the underlying structure of the index. An example of an index that
ensures operations of (log n) time complexity is the B-tree. We've already
established that a B-tree of order m is always height-balanced-leaves are always
This is all one assignment

COM 301 Lab 2: Implementing Indexing Using B-Trees An index is a file that facilitates insert, delete, and search operations on a database. The efficiency of these operations depends on the underlying structure of the index. An example of an index that ensures operations of (log n) time complexity is the B-tree. We've already established that a B-tree of order m is always height-balanced-leaves are always at the same level. In addition, its internal nodes (except perhaps the root) have between (m/2] and m children, to keep the space overhead of the index reasonable. Assignment: For this assignment, you will index a database of student records using B-trees. There will be two index files as the database will be indexed both by student ID and student last name. A separate database file will contain the actual student records. The program will take two input files: a data file and a command file. The data file is a text file containing student records from which the database and its indexes (binary files) will be initially built. The command file is a list of insert and search operations to be carried out after the files have been built. The results of these operations will be sent to an output file. A name will be provided (through the command line) that is used to obtain the names of the database and index files. For example, if the name provided is "student," the database will be named "student dat while the two index files will be named "student.ix1" (for the ID index) and "student.ix2" (for the name index). The order (m) of the B-trees will also be specified (the same for both indexes) in the command line Program Invocation: The program is invoked as follows: btree If the command line contains an insufficient number of arguments or if any of the specified input files do not exist, the program should print an appropriate error message and exit. Input and Output Formats: The data file will consist of some number of text lines, one student record per line. Each line will have 6 space-separated fields: ID (9 digits), last name (at most 15 letters), first name (at most 15 letters), year (1 digit), major (at most 4 letters), and email address (at most 20 characters). This means a record will require at most 64 bytes if stored as text fields. The command file will contain lines specifying one of five kinds of commands: find ID find name add dump by ID dump by name The first two commands will output the record(s) found, one line per record. Note that the first command returns at most one record (output "NOT FOUND" if no such record exists), while the second command may return more than one record. (You may assume that IDs are unique and that last names may duplicate). For the third command, the program should output STUDENT ADDED. (; ; ), where is the ID associated with the record added, is the byte position in the database file where the record was added, and and are counts of splits performed during the insert operation. The count for the ID index is , while the count for the name index is. The last two commands list ALL records in their indicated order. For all the five commands, echo the command line itself before you echo the output results described above. For all input files, the fields are separated by one or more spaces and there may be leading or trailing spaces in a line. You may assume, of course, that the fields themselves may not contain spaces (e.g., is one word with no spaces in between). When printing out a record, print the byte position of that record in the database file, followed by a colon and 69 (ASCII) characters describing the record: each field is printed with its maximum number of characters padded with trailing spaces if necessary. Add a single space between each field-64 characters for the fields plus 5 separator spaces makes 69 characters total. Finally, output the number of records in the database file and the number of B-tree nodes in the index files before and after processing the command file, using the following format: NUMBER OF RECORDS: , ID INDEX: nodes, NAME INDEX: nodes. Use of the C++ Standard Template Library (STL) is limited; in general, you should avoid using such classes for this project, but ask your instructor during class, in case you find it reasonable to use a particular STL class, while staying within the objectives of this assignment. The point of this project is to implement the B-Tree data structure from scratch. Sample Input and Output Files: datafile 1.txt 000000088 Platt John 0 ML jplatt@vt.edu 000000097 Brill Eric 1 NLP brill@vt.edu 000000056 Dumais Susan 2 IR dumais@aol.com 000000003 Chen Zheng 1 CN zhengc@msn.com cmdfile1.txt find ID 000000097 findnamePlatt findnameChen add 000000060 Shaffer Clifford 1 ALG shaffer@vt.edu add 000000066 Platt Jenny 1 UNK platt@vt.edu add 000000021 Chen Jian 0 SE jchen@vt.edu dump by ID dump by name add 000000098 Chen Xiao 2 XX cx@yahoo.com find name Vergara When the following command is invoked: btree datafile1.txt student1 3 cmdfile1.txt outfile1.txt ... the program will create one database file named "student1.dat" and two index files named "student1.ix1" and "student1.ix2". It will also produce the following output file: outfile1.txt NUMBER OF RECORDS: 5, ID INDEX: 4 nodes, NAME INDEX: 4 nodes. COMMAND: find ID 000000097 64:000000097Brill Eric 1NLP brill@vt.edu COMMAND: find name Platt 0:000000088Platt John OML jplatt@vt.edu Use of the C++ Standard Template Library (STL) is limited; in general, you should avoid using such classes for this project, but ask your instructor during class, in case you find it reasonable to use a particular STL class, while staying within the objectives of this assignment. The point of this project is to implement the B-Tree data structure from scratch. Sample Input and Output Files: datafile 1.txt 000000088 Platt John 0 ML jplatt@vt.edu 000000097 Brill Eric 1 NLP brill@vt.edu 000000056 Dumais Susan 2 IR dumais@aol.com 000000003 Chen Zheng 1 CN zhengc@msn.com cmdfile1.txt find ID 000000097 findnamePlatt findnameChen add 000000060 Shaffer Clifford 1 ALG shaffer@vt.edu add 000000066 Platt Jenny 1 UNK platt@vt.edu add 000000021 Chen Jian OSE jchen@vt.edu dump by ID dump by name add 000000098 Chen Xiao 2 XX cx@yahoo.com find name Vergara When the following command is invoked: btree datafile1.txt student1 3 cmdfile 1.txt outfile1.txt ..the program will create one database file named "student 1.dat' and two index files named "student 1.ix1" and "student1.ix2". It will also produce the following output file: outfile1.txt NUMBER OF RECORDS: 5, ID INDEX: 4 nodes, NAME INDEX: 4 nodes. COMMAND: find ID 000000097 64:000000097Brill Eric 1NLP brill@vt.edu COMMAND: find name Platt 0:000000088Platt John OML jplatt@vt.edu COMMAND: add 000000021 Chen Jian 0 CSE jchen@vt.edu STUDENT 000000021 ADDED (448;0;0) COMMAND: dump by ID 192:000000003Chen Zheng 1CN zhengc@msn.com 256:000000010Chen Ming OISE mcheng@vt.edu 448:000000021 Chen Jian OSE jchen@vt.edu 128:000000056Dumais Susan 2IR dumais@aol.com 320:000000060Shaffer Clifford 1ALG shaffer@vt.edu 384.000000066Platt Jenny 1UNK platt@vt.edu 0:000000088Platt John OML jplatt@vt.edu 64:000000097 Brill Eric 1NLP brill@vt.edu COMMAND: dump by name 564:000000097Brill Eric 1NLP brill@vt.edu 192:000000003Chen Zheng 1CN zhengc@msn.com 256:000000010Chen Ming OISE mcheng@vt.edu 448:000000021 Chen Jian OcSE jchen@vt.edu 128:000000056Dumais Susan 21R dumais@aol.com 0:000000088Platt John OML jplatt@vt.edu 384:000000066Platt Jenny 1UNK platt@vt.edu 320:000000060Shaffer Clifford 1ALG shaffer@vt.edu COMMAND: add 000000098 Chen Xiao 2 XX cx@yahoo.com STUDENT 000000098 ADDED (512;0:1) COMMAND: find name Vergara NOT FOUND NUMBER OF RECORDS: 9, ID INDEX: 7 nodes, NAME INDEX: 8 nodes. Program Submission and Other Deliverables: You are required to submit the output file from your program. Submit Lab 2 to the Assignment box no later than Sunday 11:59 PM EST/EDT. COMMAND: add 000000021 Chen Jian 0 CSE jchen@vt.edu STUDENT 000000021 ADDED (448;0;0) COMMAND: dump by ID 192:000000003Chen 256:000000010Chen 448:000000021 Chen 128:000000056Dumais 320:000000060Shaffer 384.000000066Platt 0:000000088Platt 64:000000097Brill Zheng Ming Jian Susan Clifford Jenny John Eric 1CN OISE OSE 2IR 1ALG 1UNK zhengc@msn.com mcheng@vt.edu jchen@vt.edu dumais@aol.com shaffer@vt.edu platt@vt.edu jplatt@vt.edu brill@vt.edu OML 1NLP COMMAND: dump by name 564:000000097 Brill Eric 192:000000003Chen Zheng 256:000000010Chen Ming 448:000000021 Chen Jian 128.000000056Dumais Susan 0:000000088Platt John 384:000000066Platt Jenny 320:000000060Shaffer Clifford 1NLP 1CN OISE OcSE 2IR OML 1UNK 1ALG brill@vt.edu zhengc@msn.com mcheng@vt.edu jchen@vt.edu dumais@aol.com jplatt@vt.edu platt@vt.edu shaffer@vt.edu COMMAND: add 000000098 Chen Xiao 2 XX cx@yahoo.com STUDENT 000000098 ADDED 2) (512;0:1) COMMAND: find name Vergara NOT FOUND NUMBER OF RECORDS: 9, ID INDEX: 7 nodes, NAME INDEX: 8 nodes. Program Submission and Other Deliverables: You are required to submit the output file from your program. Submit Lab 2 to the Assignment box no later than Sunday 11:59 PM EST/EDT

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!