Question: The purpose of the work: to develop a Python program that processes texts and measures their similarity using the Jaccard distance. Input: 50 texts of 10 sentences each. Output: a 50-by-50 similarity heatmap of the texts. The matplotlib and seaborn libraries are recommended.
Main requirements:
1. Perform tokenization (splitting texts into words), normalization (reducing words to their base form, i.e. lemmatization), and filtering of words that lower the accuracy of the results (particles, prepositions, conjunctions, interjections, etc.). Convert all words to lowercase and remove punctuation marks. Third-party libraries may be used for this step.
2. Evaluate the similarity of the texts using the Jaccard distance. The student must implement the algorithm independently.
3. Build a similarity matrix (table) with values from 0 (no similarity) to 1 (maximum similarity, which always appears on the main diagonal of the matrix).
4. The heatmap must make clear which texts appear where on the map (figure caption, labeled axes, etc.).
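The requirements above can be sketched roughly as follows. This is a minimal illustration and not a reference solution. It assumes that the matrix holds the Jaccard similarity coefficient |A ∩ B| / |A ∪ B| (the Jaccard distance is 1 minus this value), since the task asks for 1 on the main diagonal. The names `STOP_WORDS`, `preprocess`, `jaccard_similarity`, `similarity_matrix`, and `plot_heatmap` are hypothetical, the stop-word list is a tiny placeholder, and lemmatization is left as a pluggable `normalize` hook because it depends on the language of the texts and on the third-party library chosen.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real run would take a fuller list from a third-party library.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "in", "on", "at", "of", "to", "oh"}


def preprocess(text, normalize=lambda word: word):
    """Lowercase the text, drop punctuation and digits, and return a set of tokens.

    `normalize` is a placeholder hook: pass a lemmatizer function here
    (assumption: it maps one word to its base form).
    """
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    tokens = (normalize(word) for word in words)
    return {token for token in tokens if token not in STOP_WORDS}


def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def similarity_matrix(texts):
    """Build a symmetric n-by-n matrix of pairwise Jaccard similarities."""
    token_sets = [preprocess(text) for text in texts]
    n = len(token_sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # fill the upper triangle and mirror it
            score = jaccard_similarity(token_sets[i], token_sets[j])
            matrix[i][j] = matrix[j][i] = score
    return matrix


def plot_heatmap(matrix, path="heatmap.png"):
    """Save a labeled heatmap; requires matplotlib and seaborn to be installed."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without them
    import seaborn as sns

    labels = [f"T{i + 1}" for i in range(len(matrix))]  # T1..Tn identify the texts
    fig, ax = plt.subplots(figsize=(12, 10))
    sns.heatmap(matrix, vmin=0, vmax=1, cmap="viridis",
                xticklabels=labels, yticklabels=labels, ax=ax)
    ax.set_xlabel("Text")
    ax.set_ylabel("Text")
    ax.set_title("Jaccard similarity between texts")
    fig.savefig(path, dpi=150, bbox_inches="tight")
```

With the 50 input texts in a list, `plot_heatmap(similarity_matrix(texts))` would write the 50-by-50 figure to `heatmap.png`. Representing each text as a set of tokens means that repeated words count once, which is the usual reading of the Jaccard coefficient. A multiset variant is possible but is not shown here.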
