Question: 3 Exercise Three (40 points)

The input for this exercise is the file review.txt provided with the instructions. In one single MATLAB script file (exercise02.m), implement the following:
1. Read in the lines of the text file using the fgetl function. Store the lines of this file in a cell array. Useful functions: for, fopen, fgetl.
2. Preprocess the text: remove punctuation, convert to lower case, and remove stop words. [You can do this before or after tokenizing the string; see the lower function for lower-case conversion.]
3. Parse the stored lines of text into their constituent words using the strtok function. Store all of the words in the entire document in a cell array with one word per index.
4. Create a lexicon consisting of all of the unique words in the document. Useful function: unique.
5. Create a column vector representing how many times each lexicon word occurs in the document. This is a word vector representation of the document. Useful function: zeros.
6. Prune the column vector, leaving out words that occur only a few times; the threshold can be 1, 2, ..., 5. Choose the value that seems reasonable based on the amount of pruning each choice affords.
7. Capture the output with and without pruning in the results.doc file.
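One possible exercise02.m is sketched below, following the seven steps in order. Several details are assumptions rather than part of the exercise: the short stop-word list, the pruning rule (a word is kept only if it occurs more than `threshold` times), the default `threshold = 2`, and writing plain text to `results.txt` instead of producing a Word file directly. It wraps `fgetl` in a `while ischar(...)` loop rather than a `for` loop because the number of lines is not known in advance.

```matlab
% exercise02.m (sketch). Assumptions: stop-word list, pruning rule,
% threshold value, and the output file name results.txt.

% ---- 1. Read the lines of review.txt into a cell array with fgetl ----
inFile = 'review.txt';
fid = fopen(inFile, 'r');
if fid == -1
    error('Could not open %s', inFile);
end
lines = {};
tline = fgetl(fid);
while ischar(tline)              % fgetl returns -1 (not a char) at end of file
    lines{end+1} = tline; %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);

% ---- 2 & 3. Preprocess each line, then tokenize it with strtok ----
% Hypothetical stop-word list; replace it with the one your course provides.
stopWords = {'a','an','and','are','as','at','be','but','by','for','i', ...
             'in','is','it','of','on','or','that','the','this','to', ...
             'was','were','with'};
words = {};
for n = 1:numel(lines)
    s = lower(lines{n});                   % convert to lower case
    s = regexprep(s, '[^a-z0-9\s]', '');   % remove punctuation
    remain = s;
    while true
        [tok, remain] = strtok(remain);    % split on whitespace
        if isempty(tok)                    % nothing left on this line
            break;
        end
        words{end+1} = tok; %#ok<AGROW>
    end
end
words = words(~ismember(words, stopWords)); % remove stop words after tokenizing

% ---- 4. Lexicon of unique words (sorted); words equals lexicon(ic) ----
[lexicon, ~, ic] = unique(words);

% ---- 5. Column vector of counts, preallocated with zeros ----
counts = zeros(numel(lexicon), 1);
for k = 1:numel(ic)
    counts(ic(k)) = counts(ic(k)) + 1;
end

% ---- 6. Show how much each threshold prunes, then prune ----
for t = 1:5
    fprintf('threshold %d: keeps %d of %d lexicon words\n', ...
            t, nnz(counts > t), numel(counts));
end
threshold = 2;                   % assumption: choose after inspecting the table
keep = counts > threshold;       % drop words occurring threshold times or fewer
prunedLexicon = lexicon(keep);
prunedCounts = counts(keep);

% ---- 7. Write both versions of the word vector to a text file ----
outFid = fopen('results.txt', 'w');
fprintf(outFid, 'Word vector WITHOUT pruning (%d words)\n', numel(lexicon));
for k = 1:numel(lexicon)
    fprintf(outFid, '%-20s %d\n', lexicon{k}, counts(k));
end
fprintf(outFid, '\nWord vector WITH pruning (count > %d, %d words)\n', ...
        threshold, numel(prunedLexicon));
for k = 1:numel(prunedLexicon)
    fprintf(outFid, '%-20s %d\n', prunedLexicon{k}, prunedCounts(k));
end
fclose(outFid);
```

Run it from the folder that contains review.txt. The table printed in step 6 shows how much each threshold from 1 to 5 prunes, which is the evidence the exercise asks you to base your choice on; the contents of results.txt can then be pasted into results.doc. Stop words are removed after tokenizing, which step 2 permits, so a single `ismember` call does the filtering.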
