Question: I have the following code and need to modify it with this solution, in Python please

I have the following code and need to modify it as described below. Solution in Python, please.

import numpy as np
import pandas as pd

# Import TfidfVectorizer from the scikit-learn library
from sklearn.feature_extraction.text import TfidfVectorizer

# Import cosine_similarity to compute the dot product
from sklearn.metrics.pairwise import cosine_similarity

# Import data from the clean file
df = pd.read_csv('ccai422_lab05_part1_1_data.csv')

# Print the head of the cleaned DataFrame
df.head()

# Import the original file
orig_df = pd.read_csv('ccai422_lab05_part1_2_data.csv', low_memory=False)

# Add the useful features into the cleaned dataframe
df['overview'], df['id'] = orig_df['overview'], orig_df['id']

orig_df.head()

# Define a TF-IDF Vectorizer Object. Remove all english stopwords
tfidf = TfidfVectorizer()

# Replace NaN with an empty string
df['overview'] = df['overview'].fillna('')

# Construct the required TFIDF matrix by applying the fit_transform method on the overview feature
tfidf_matrix = tfidf.fit_transform(df['overview'])
# tfidf_matrix = tfidf.fit_transform(corpus)

# Output the shape of tfidf_matrix
tfidf_matrix.shape

# The slicing is due to the lack of enough memory. Only use it if your mach
tfidf_matrix = tfidf_matrix[:1000]

# Compute the cosine similarity matrix
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Construct a reverse mapping of indices and movie titles, and drop duplicate titles, if any
indices = pd.Series(df.index, index=df['title']).drop_duplicates()
indices.head()

# Function that takes in movie title as input and gives recommendations
def content_recommender(title, cosine_sim=cosine_sim, df=df, indices=indices):
    # Obtain the index of the movie that matches the title
    idx = indices[title]

    # Get the pairwise similarity scores of all movies with that movie
    # and convert it into a list of tuples as described above
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the movies based on the cosine similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar movies. Ignore the first movie.
    sim_scores = sim_scores[1:11]

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar movies
    return df['title'].iloc[movie_indices]

print(content_recommender('The Lion King'))

I need to modify the code above according to this task:

... the function in two ways: one through a library and the other one without a library. In this part, you are asked to use your implementation of TFIDF with the dataset. Then, complete the following:

1. Modify your TFIDF implementation to compute the term frequency based on the raw frequency instead of the relative raw frequency (i.e., do not consider the number of words per document).
2. Compute the TFIDF representation of the movies based on the overview field.
3. Compute the cosine similarity.
4. Recommend a movie to a user who likes 'The Lion King'.
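The four steps above could be sketched without scikit-learn roughly as follows. This is only a minimal sketch under stated assumptions, not the course's reference solution: the IDF variant log(N / df), the regex tokenizer, the helper names (`tokenize`, `raw_tfidf`, `cosine_similarity_matrix`, `recommend`), and the toy `movies` DataFrame with made-up titles are all hypothetical stand-ins. The one part taken directly from the task is that term frequency is the raw count and is not divided by the number of words in the document.

```python
import math
import re
from collections import Counter

import numpy as np
import pandas as pd


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", str(text).lower())


def raw_tfidf(documents):
    """Return (tfidf_matrix, vocabulary) using raw counts as term frequency."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}
    n_docs = len(tokenized)

    # Step 1: raw term frequency, i.e. plain counts that are NOT divided
    # by the number of words in the document.
    tf = np.zeros((n_docs, len(vocabulary)))
    for i, tokens in enumerate(tokenized):
        for term, count in Counter(tokens).items():
            tf[i, column[term]] = count

    # Document frequency: how many documents contain each term.
    doc_freq = np.count_nonzero(tf, axis=0)
    # Assumed IDF variant: log(N / df). Swap in the course's formula if it differs.
    idf = np.log(n_docs / doc_freq)
    return tf * idf, vocabulary


def cosine_similarity_matrix(matrix):
    """Step 3: pairwise cosine similarity between the rows of a dense matrix."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows (empty overviews) stay zero
    unit = matrix / norms
    return unit @ unit.T


def recommend(title, df, cosine_sim, top_n=10):
    """Step 4: titles of the top_n movies most similar to `title`."""
    positions = np.flatnonzero(df["title"].to_numpy() == title)
    if positions.size == 0:
        raise KeyError(f"title not found: {title!r}")
    idx = positions[0]
    order = np.argsort(-cosine_sim[idx], kind="stable")
    order = order[order != idx][:top_n]  # skip the movie itself
    return df["title"].iloc[order]


# Hypothetical toy data standing in for the lab CSV files.
movies = pd.DataFrame({
    "title": ["The Lion King", "Jungle Cub", "Space Race", "Empty"],
    "overview": [
        "A young lion prince flees his kingdom",
        "A lion cub grows up in the jungle kingdom",
        "Astronauts race to reach a distant planet",
        None,
    ],
})
movies["overview"] = movies["overview"].fillna("")

# Step 2: TF-IDF representation of the overview field.
tfidf_matrix, vocabulary = raw_tfidf(movies["overview"])
cosine_sim = cosine_similarity_matrix(tfidf_matrix)
recommendations = recommend("The Lion King", movies, cosine_sim, top_n=2)
print(recommendations)
```

To run this on the real data, `movies` would be replaced by the `df` built in the question. Note that the matrix here is dense, so on the full dataset it may need the same row slicing the question already uses to stay within memory.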

