Question: I have the following code and need to modify it as described below (in Python, please).
import numpy as np
import pandas as pd

# Import TfidfVectorizer from the scikit-learn library
from sklearn.feature_extraction.text import TfidfVectorizer

# Import cosine_similarity to compute the dot product
from sklearn.metrics.pairwise import cosine_similarity

# Import data from the clean file
df = pd.read_csv('ccai422_lab05_part1_1_data.csv')

# Print the head of the cleaned DataFrame
df.head()

# Import the original file
orig_df = pd.read_csv('ccai422_lab05_part1_2_data.csv', low_memory=False)

# Add the useful features into the cleaned dataframe
df['overview'], df['id'] = orig_df['overview'], orig_df['id']
orig_df.head()
# Define a TF-IDF Vectorizer object. Remove all English stopwords
tfidf = TfidfVectorizer(stop_words='english')

# Replace NaN with an empty string
df['overview'] = df['overview'].fillna('')

# Construct the required TFIDF matrix by applying the fit_transform method on the overview feature
tfidf_matrix = tfidf.fit_transform(df['overview'])
# tfidf_matrix = tfidf.fit_transform(corpus)

# Output the shape of tfidf_matrix
tfidf_matrix.shape

# The slicing is due to the lack of enough memory. Only use it if your mach...
tfidf_matrix = tfidf_matrix[:1000]

# Compute the cosine similarity matrix
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Construct a reverse mapping of indices and movie titles, and drop duplicate titles, if any
indices = pd.Series(df.index, index=df['title']).drop_duplicates()
indices.head()

# Function that takes in movie title as input and gives recommendations
def content_recommender(title, cosine_sim=cosine_sim, df=df, indices=indices):
    # Obtain the index of the movie that matches the title
    idx = indices[title]

    # Get the pairwise similarity scores of all movies with that movie
    # and convert it into a list of tuples as described above
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the movies based on the cosine similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar movies. Ignore the first movie.
    sim_scores = sim_scores[1:11]

    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]

    # Return the top 10 most similar movies
    return df['title'].iloc[movie_indices]

print(content_recommender('The Lion King'))

[...] the function in two ways: one through a library and the other one without a library. In this part, you are asked to use your implementation of TFIDF with the dataset. Then, complete the following:
1. Modify your TFIDF implementation to compute the term frequency based on the raw frequency instead of the relative frequency (i.e. do not consider the number of words per document).
2. Compute the TFIDF representation of the movies based on the overview field.
3. Compute the cosine similarity.
4. Recommend a movie to a user who likes 'The Lion King'.
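Not a verified answer, but here is a minimal sketch of how the four steps could look without a library, using only the Python standard library. Several things are assumptions on my part, since the original hand-written TFIDF implementation is not shown: the regex tokenizer, the plain `log(N / df)` IDF formula, and the helper names `tokenize`, `raw_tfidf`, `cosine` and `recommend`. The titles and overviews other than 'The Lion King' are made-up toy data standing in for `df['title']` and `df['overview']`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_tfidf(docs):
    """Return one {term: weight} dict per document, using raw counts as TF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed IDF formula: log(N / df); swap in your course's variant if it differs
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency, NOT divided by len(tokens)
        vectors.append({term: count * idf[term] for term, count in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # e.g. an empty overview
    return dot / (norm_u * norm_v)


def recommend(title, titles, vectors, top_n=10):
    """Return the top_n titles most similar to `title`, excluding the movie itself."""
    idx = titles.index(title)
    scores = [(i, cosine(vectors[idx], vec)) for i, vec in enumerate(vectors) if i != idx]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in scores[:top_n]]


# Toy data (hypothetical) standing in for df['title'] and df['overview']
titles = ["The Lion King", "Jungle Tale", "Space Race", "Robot City"]
overviews = [
    "A young lion prince flees his kingdom after his father is killed",
    "A lion cub and a young prince explore the jungle kingdom",
    "Astronauts race to reach a distant planet",
    "A robot detective solves crimes in a futuristic city",
]

vectors = raw_tfidf(overviews)
print(recommend("The Lion King", titles, vectors, top_n=2))
```

With the real data you would presumably pass `df['title'].tolist()` and `df['overview'].fillna('').tolist()` instead of the toy lists. Note that `recommend` only compares the chosen movie against every other one, rather than building the full similarity matrix, which keeps the pure-Python version usable when memory is tight.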
