Question: Python code I need help with programming the highlighted instructions. Or at least how it should be done to get the same result the instructions
Objective: Use Python and NLTK to process text. Turn in: Your Python.py file Instructions: 1. Read in moby_dick.txt which you can download from 2. riuss the text a. use a string function to replace all occurrences of b. use regex to remove all digits c. use regex to replace punctuation with a single space 3. Tokenize the text and print the number of tokens. 4. Create a set of unique tokens and print the number of 5. Calculate and print the lexical diversity, which is the Format the number with commas. unique tokens, formatted as above. number of unique tokens divided by the number of tokens. Format the number for floating point. 6. Create a list of important words by removing stop words from the unique tokens list. Display the number of important words, formatted. 7. Using the list of important words, create a list of tuples of the word and stemmed word, like this [(remarkably','remark'), (prevented', prevent') 8. Create a dictionary where the key is the stem and the value is a list of words with that stem, like this: achiev': [achieved', 'achieve] accident': [accidentally, 'accidental']... \ 9. Print the number of dictionary entries. 10. For the 25 dictionary entries with the longest lists, print the stem and its list. Hint on sorting dict by length of values: for k in sorted(stem dict, key-lambda k: len(stem dict[k]), reverse- True): 11. Perform POS tagging on the words in the top 25 dictionary entries. 12. Create a dictionary of POS counts where the key is the POS and the count is the number of these words with that POS. Print the dictionary Objective: Use Python and NLTK to process text. Turn in: Your Python.py file Instructions: 1. Read in moby_dick.txt which you can download from 2. riuss the text a. use a string function to replace all occurrences of b. use regex to remove all digits c. use regex to replace punctuation with a single space 3. Tokenize the text and print the number of tokens. 4. Create a set of unique tokens and print the number of 5. Calculate and print the lexical diversity, which is the Format the number with commas. unique tokens, formatted as above. number of unique tokens divided by the number of tokens. Format the number for floating point. 6. Create a list of important words by removing stop words from the unique tokens list. Display the number of important words, formatted. 7. Using the list of important words, create a list of tuples of the word and stemmed word, like this [(remarkably','remark'), (prevented', prevent') 8. Create a dictionary where the key is the stem and the value is a list of words with that stem, like this: achiev': [achieved', 'achieve] accident': [accidentally, 'accidental']... \ 9. Print the number of dictionary entries. 10. For the 25 dictionary entries with the longest lists, print the stem and its list. Hint on sorting dict by length of values: for k in sorted(stem dict, key-lambda k: len(stem dict[k]), reverse- True): 11. Perform POS tagging on the words in the top 25 dictionary entries. 12. Create a dictionary of POS counts where the key is the POS and the count is the number of these words with that POS. Print the dictionary
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
