Question: Use Python.
a. Write a program analyzeSMS(filename) that analyzes word frequencies in real-world text messages. The text file SMScollection.txt contains 5574 SMS messages. There is additional information about the contents of the file in the associated "readme" file readmeSMScollection.txt, written by the creators of the dataset. The data was originally from this no-longer-working link: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/. Some information about the data set and its initial investigators is now here.

Each line of the file represents one SMS/text message. The first item on every line is a label, 'ham' or 'spam', indicating whether that line's SMS is considered spam or not. The rest of the line contains the text of the SMS message. For example:

spam Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! Call
ham Sorry, I'll call later in meeting.

At the end, your program must print summary information and information about the most frequent words in spam messages and the most frequent words in non-spam (ham) messages. It should also compute and compare the average lengths of spam and ham messages. I will not specify exactly what your output should be (but I will demonstrate sample output during the next lecture or two, and I will also provide organizational hints and help for each of the parts).

To accomplish this, your analyzeSMS function should:
- read all of the data from the input file
- extract individual words from the messages. This should include an effort to get rid of "extras" such as periods, commas, question and exclamation marks, and other characters that aren't part of a word. You should probably also ignore capitalization. Thus, in the sample spam message above, you probably want to treat "Congrats!" as "congrats" in your frequency analysis.
- build two dictionaries (required for full credit on this assignment): one for frequencies of words appearing in spam messages, and one for frequencies of words from ham messages
- print summary information and some word frequency information about the data
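A minimal sketch of one way to structure analyzeSMS, under several assumptions not fixed by the assignment: each line is a label followed by whitespace and then the message (the original dataset separates them with a tab); a "word" is a whitespace-separated token, lowercased, with leading and trailing characters from string.punctuation removed (so "Congrats!" becomes "congrats" but the apostrophe in "I'll" is kept); and message length is measured in words. The helper names extract_words, build_frequencies, top_words and average are hypothetical, and the printed format is only illustrative since the assignment leaves it open. Plain dictionaries are used rather than collections.Counter because the assignment requires two dictionaries.

```python
import string


def extract_words(text):
    """Lowercase text and strip leading/trailing punctuation from each token."""
    words = []
    for token in text.lower().split():
        word = token.strip(string.punctuation)
        if word:  # drop tokens that were pure punctuation, e.g. "..."
            words.append(word)
    return words


def build_frequencies(lines):
    """Build spam/ham word-frequency dicts and per-message word counts.

    Assumes each line looks like '<label><whitespace><message text>'.
    """
    spam_freq = {}
    ham_freq = {}
    lengths = {"spam": [], "ham": []}  # word count of every message, by label
    for line in lines:
        parts = line.strip().split(None, 1)  # split off the label only
        if not parts:
            continue  # skip blank lines
        label = parts[0].lower()
        if label not in lengths:
            continue  # skip lines without a 'spam'/'ham' label
        text = parts[1] if len(parts) > 1 else ""
        freq = spam_freq if label == "spam" else ham_freq
        words = extract_words(text)
        for word in words:
            freq[word] = freq.get(word, 0) + 1
        lengths[label].append(len(words))
    return spam_freq, ham_freq, lengths


def top_words(freq, n=10):
    """Return the n most frequent (word, count) pairs, ties broken alphabetically."""
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))[:n]


def average(values):
    """Mean of a list of numbers, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def analyzeSMS(filename, n=10):
    """Read the SMS file, print a summary, and return (spam_freq, ham_freq)."""
    with open(filename, encoding="utf-8", errors="replace") as f:
        spam_freq, ham_freq, lengths = build_frequencies(f)

    num_spam = len(lengths["spam"])
    num_ham = len(lengths["ham"])
    print(f"Messages read: {num_spam + num_ham} ({num_spam} spam, {num_ham} ham)")
    print(f"Distinct words: {len(spam_freq)} in spam, {len(ham_freq)} in ham")

    for label, freq in (("spam", spam_freq), ("ham", ham_freq)):
        print(f"\nTop {n} words in {label} messages:")
        for word, count in top_words(freq, n):
            print(f"  {word:<15}{count:>6}")

    spam_avg = average(lengths["spam"])
    ham_avg = average(lengths["ham"])
    print(f"\nAverage length: spam {spam_avg:.2f} words, ham {ham_avg:.2f} words")
    if ham_avg:
        print(f"Spam/ham average length ratio: {spam_avg / ham_avg:.2f}")
    return spam_freq, ham_freq
```

With SMScollection.txt in the working directory, calling analyzeSMS("SMScollection.txt") prints the summary. Separating build_frequencies from the file handling lets the parsing be tried on a short list of sample lines before running it on the full dataset.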
