Question: The purpose of the work: to develop a program in Python that processes texts and determines their similarity using the Jaccard distance

The purpose of the work: to develop a program in Python that processes texts and determines their similarity using the Jaccard distance. Input: 50 texts of 10 sentences each. Output: a 50 by 50 heatmap of text similarity. It is recommended to use the matplotlib and seaborn libraries.

Main requirements:
1. Perform tokenization (splitting the texts into words), normalization (reducing words to their base form), and filtering of words that lower the accuracy of the results (particles, prepositions, conjunctions, interjections, etc.). Convert all words to lowercase and remove punctuation marks. Third-party libraries may be used for this step.
2. Evaluate the similarity of the texts using the Jaccard distance. The student must implement this algorithm independently.
3. Build a correlation matrix (table) with values from 0 (no correlation) to 1 (maximum similarity, which always appears on the main diagonal of the matrix).
4. The heatmap visualization must make clear which texts are shown and where they are on the map (image caption, labeled axes, etc.).
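The requirements above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive solution: the names `STOP_WORDS`, `normalize`, `preprocess`, `jaccard_similarity`, `similarity_matrix`, and `plot_heatmap` are all hypothetical, the stop-word list is a tiny English placeholder, and `normalize` is an identity hook where a third-party lemmatizer would normally be plugged in. Because the matrix must have 1 on the main diagonal, the sketch stores the Jaccard similarity |A ∩ B| / |A ∪ B|, which is 1 minus the Jaccard distance.

```python
import re

# Placeholder stop-word list (assumption): a real run would use a fuller list
# of particles, prepositions, conjunctions, and interjections.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "in", "on", "at", "of", "to", "oh"}


def normalize(token):
    """Hypothetical hook: replace with a real lemmatizer from a third-party library."""
    return token


def preprocess(text):
    """Lowercase, drop punctuation and digits, tokenize, filter, and return a word set."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return {normalize(t) for t in tokens if t not in STOP_WORDS}


def jaccard_similarity(a, b):
    """Hand-written Jaccard similarity of two sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(a & b) / len(a | b)


def similarity_matrix(texts):
    """Build a symmetric n x n matrix of pairwise similarities in [0, 1]."""
    sets = [preprocess(t) for t in texts]
    n = len(sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = jaccard_similarity(sets[i], sets[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix


def plot_heatmap(matrix, labels, path="heatmap.png"):
    """Draw a labeled heatmap; plotting libraries are imported only when needed."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(14, 12))
    sns.heatmap(matrix, vmin=0, vmax=1, cmap="viridis", square=True,
                xticklabels=labels, yticklabels=labels,
                cbar_kws={"label": "Jaccard similarity"}, ax=ax)
    ax.set_title("Pairwise Jaccard similarity of texts")
    ax.set_xlabel("Text")
    ax.set_ylabel("Text")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

For the 50 input texts, one possible call sequence would be `matrix = similarity_matrix(texts)` followed by `plot_heatmap(matrix, [f"T{i + 1}" for i in range(len(texts))])`, which labels both axes with the text numbers.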
