Question: Python code, please.

The code has to read from a file where each line starts with a word followed by its 50 numbers. The data must be stored in a binary tree: each word and its 50 numbers go in one node. The user has to type which kind of binary tree they prefer; the two options are AVL tree and red-black tree. Then the program reads another file with two words per line and checks their similarity with the formula in the description.
DESCRIPTION:
Word embeddings are a recent advance in NLP that consists of representing words by vectors in such a way that if two words have similar meanings, their embeddings are also similar. See https://nlp.stanford.edu/projects/glove/ for an overview of this interesting research.
In order to work in real time, NLP systems such as Siri and Alexa need to efficiently retrieve the embeddings given their corresponding words. In this lab, you will implement a simple version of this. The web page mentioned above contains links to files that contain word embeddings of various lengths for various vocabulary sizes. Use the file glove.6B.50d.txt, which contains word embeddings of length 50 for a very large number of words. Each line in the file starts with the word being described, followed by 50 floating point numbers that represent the word's vector description (the embedding). The words are ordered by frequency of usage, so "the" is the first word.
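A minimal sketch of reading that file format, assuming one word followed by 50 floats per line. The pairs are collected in a list here for brevity; in the actual assignment, each (word, embedding) pair would instead be inserted into the chosen AVL or red-black tree, with the word as the key:

```python
def parse_line(line):
    # Split "word f1 f2 ... f50" into the word and its float embedding.
    parts = line.split()
    word = parts[0]
    embedding = [float(x) for x in parts[1:]]
    return word, embedding

def read_embeddings(path):
    # Skip entries whose key does not start with an alphabetic character,
    # as the assignment requires (e.g. "," and ".").
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, emb = parse_line(line)
            if word[0].isalpha():
                pairs.append((word, emb))
    return pairs
```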
Your task for this lab is to write a program that does the following:

1. Read the file glove.6B.50d.txt and store each word and its embedding in a binary search tree. Ask the user what type of binary search tree he/she wants to use (AVL tree or red-black tree). You are free to use the implementation provided in your zyBook for these two types of trees. Adapt the zyBooks code to include the word and its embedding, using the word as the key. Ignore the words in the file that do not start with an alphabetic character (for example "," and ".").

2. Read another file containing pairs of words (two words per line) and, for every pair of words, find and display the similarity of the words (see example in appendix). To find the similarity of words w0 and w1, with embeddings e0 and e1, we use the cosine distance, which ranges from -1 to 1:

    sim(w0, w1) = (e0 · e1) / (|e0| |e1|)

where e0 · e1 is the dot product of e0 and e1, and |e0| and |e1| are the magnitudes of e0 and e1. Look up these formulas online if necessary.
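The cosine-similarity formula above translates directly into a short function; this is a sketch operating on plain Python lists of floats:

```python
import math

def cosine_similarity(e0, e1):
    # sim(w0, w1) = (e0 . e1) / (|e0| * |e1|), which lies in [-1, 1].
    dot = sum(a * b for a, b in zip(e0, e1))
    mag0 = math.sqrt(sum(a * a for a in e0))
    mag1 = math.sqrt(sum(b * b for b in e1))
    return dot / (mag0 * mag1)
```

For each line of the pairs file, the program would look up both words in the tree and pass their stored embeddings to this function.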
Write the following methods to extract information from the tree:

(a) Compute the number of nodes in the tree.
(b) Compute the height of the tree.
(c) Generate a file containing all the words stored in the tree, in ascending order, one per line.
(d) Given a desired depth, generate a file with all the keys that have that depth, in ascending order. Recall that the root has depth zero, its children have depth one, and so on.

As usual, write a report describing your work. Determine the O() running times of your methods, show tables illustrating their actual running times, and discuss disagreements between theoretical and experimental results.
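The four queries can be sketched as recursive functions over a generic binary-tree node with `key`, `left`, and `right` attributes (attribute names assumed here; adapt them to the zyBooks AVL/red-black node class, which may name them differently):

```python
def count_nodes(node):
    # (a) Number of nodes in the subtree rooted at node.
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)

def height(node):
    # (b) Height of the subtree; an empty tree has height -1,
    # a single node has height 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def keys_in_order(node, out):
    # (c) In-order traversal of a BST yields the keys in ascending order;
    # the caller can then write the collected list to a file, one per line.
    if node is not None:
        keys_in_order(node.left, out)
        out.append(node.key)
        keys_in_order(node.right, out)

def keys_at_depth(node, depth, out):
    # (d) Root has depth zero; collect keys exactly `depth` levels down,
    # left to right, which for a BST is also ascending order.
    if node is None:
        return
    if depth == 0:
        out.append(node.key)
    else:
        keys_at_depth(node.left, depth - 1, out)
        keys_at_depth(node.right, depth - 1, out)
```

(a), (b), and (c) visit every node, so they run in O(n); (d) also visits O(n) nodes in the worst case, since every node above the target depth may need to be explored.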
