Question: Python code, please.

The code has to read from a file where each line starts with a word followed by its 50 numbers. The data must be stored in a binary tree: each word and its 50 numbers go in one node. The user has to type which kind of binary tree they prefer; the two options are AVL tree and red-black tree. Then the program reads another file with two words per line and checks their similarity with the formula in the description.
DESCRIPTION:
Word embeddings are a recent advance in NLP that consists of representing words by vectors in such a way that if two words have similar meanings, their embeddings are also similar. See https://nlp.stanford.edu/projects/glove/ for an overview of this interesting research.
In order to work in real time, NLP systems such as Siri and Alexa need to efficiently retrieve the embeddings given their corresponding words. In this lab, you will implement a simple version of this. The web page mentioned above contains links to files that contain word embeddings of various lengths for various vocabulary sizes. Use the file glove.6B.50d.txt, which contains word embeddings of length 50 for a very large number of words. Each line in the file starts with the word being described, followed by 50 floating point numbers that represent the word's vector description (the embedding). The words are ordered by frequency of usage, so "the" is the first word.
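A minimal sketch of reading that file format, assuming one word followed by 50 floats per line. The pairs are collected in a list here for brevity; in the actual assignment, each (word, embedding) pair would instead be inserted into the chosen AVL or red-black tree, with the word as the key:

```python
def parse_line(line):
    # Split "word f1 f2 ... f50" into the word and its float embedding.
    parts = line.split()
    word = parts[0]
    embedding = [float(x) for x in parts[1:]]
    return word, embedding

def read_embeddings(path):
    # Skip entries whose key does not start with an alphabetic character,
    # as the assignment requires (e.g. "," and ".").
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, emb = parse_line(line)
            if word[0].isalpha():
                pairs.append((word, emb))
    return pairs
```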
Your task for this lab is to write a program that does the following:

1. Read the file glove.6B.50d.txt and store each word and its embedding in a binary search tree. Ask the user what type of binary search tree he/she wants to use (AVL tree or red-black tree). You are free to use the implementation provided in your zyBook for these two types of trees. Adapt the zyBooks code to include the word and its embedding, using the word as the key. Ignore the words in the file that do not start with an alphabetic character (for example "," and ".").

2. Read another file containing pairs of words (two words per line) and, for every pair of words, find and display the similarity of the words (see example in appendix). To find the similarity of words w0 and w1, with embeddings e0 and e1, we use the cosine distance, which ranges from -1 to 1:

    sim(w0, w1) = (e0 · e1) / (|e0| |e1|)

where e0 · e1 is the dot product of e0 and e1, and |e0| and |e1| are the magnitudes of e0 and e1. Look up these formulas online if necessary.
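The cosine-similarity formula above translates directly into a short function; this is a sketch operating on plain Python lists of floats:

```python
import math

def cosine_similarity(e0, e1):
    # sim(w0, w1) = (e0 . e1) / (|e0| * |e1|), which lies in [-1, 1].
    dot = sum(a * b for a, b in zip(e0, e1))
    mag0 = math.sqrt(sum(a * a for a in e0))
    mag1 = math.sqrt(sum(b * b for b in e1))
    return dot / (mag0 * mag1)
```

For each line of the pairs file, the program would look up both words in the tree and pass their stored embeddings to this function.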
Write the following methods to extract information from the tree:

(a) Compute the number of nodes in the tree.
(b) Compute the height of the tree.
(c) Generate a file containing all the words stored in the tree, in ascending order, one per line.
(d) Given a desired depth, generate a file with all the keys that have that depth, in ascending order. Recall that the root has depth zero, its children have depth one, and so on.

As usual, write a report describing your work. Determine the O() running times of your methods, show tables illustrating their actual running times, and discuss disagreements between theoretical and experimental results.
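The four queries can be sketched as recursive functions over a generic binary-tree node with `key`, `left`, and `right` attributes (attribute names assumed here; adapt them to the zyBooks AVL/red-black node class, which may name them differently):

```python
def count_nodes(node):
    # (a) Number of nodes in the subtree rooted at node.
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)

def height(node):
    # (b) Height of the subtree; an empty tree has height -1,
    # a single node has height 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def keys_in_order(node, out):
    # (c) In-order traversal of a BST yields the keys in ascending order;
    # the caller can then write the collected list to a file, one per line.
    if node is not None:
        keys_in_order(node.left, out)
        out.append(node.key)
        keys_in_order(node.right, out)

def keys_at_depth(node, depth, out):
    # (d) Root has depth zero; collect keys exactly `depth` levels down,
    # left to right, which for a BST is also ascending order.
    if node is None:
        return
    if depth == 0:
        out.append(node.key)
    else:
        keys_at_depth(node.left, depth - 1, out)
        keys_at_depth(node.right, depth - 1, out)
```

(a), (b), and (c) visit every node, so they run in O(n); (d) also visits O(n) nodes in the worst case, since every node above the target depth may need to be explored.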
