Question: Task 2b - More understanding of Semantic Matching (15 points)

Task 2b - More understanding of Semantic Matching (15 points).
Which terms does LSI find similar?
To understand why the LSI-expanded vectors get the results they do, we're going to look at what the operator $U$ does to text. In particular, the term-term matrix $U U^T$ tells us the term expansion behavior of this LSI model. Think of the term-term matrix like an operator that first maps a term to the latent space $L_k$ (using $U$), then back again from $L_k$ to term space (using $U^T$). The $(i, j)$ entry of $U U^T$ is a kind of association weight between term $i$ and term $j$.
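As a small illustration of the paragraph above, the following sketch builds the term-term matrix for a toy `U` and checks that applying it to a one-hot term vector is the same as mapping to the latent space and back. The matrix values and the four-term vocabulary size are made up for illustration; they are not taken from the assignment's data.

```python
import numpy as np

# Hypothetical toy U: 4 terms x 2 latent dimensions (values are invented).
U = np.array([
    [0.9, 0.1],  # term 0
    [0.8, 0.2],  # term 1
    [0.1, 0.9],  # term 2
    [0.0, 1.0],  # term 3
])

# Term-term matrix U U^T, shape (4, 4).
term_term = U @ U.T

# Entry (i, j) is the dot product of the latent vectors of terms i and j.
assert np.isclose(term_term[0, 1], U[0] @ U[1])

# Apply the operator to a one-hot vector for term 0:
# first to the latent space, then back to term space.
e0 = np.zeros(4)
e0[0] = 1.0
latent = e0 @ U          # shape (2,)
expanded = latent @ U.T  # shape (4,)

# The expanded vector is exactly row 0 of the term-term matrix.
assert np.allclose(expanded, term_term[0])
```

Because $U U^T$ is symmetric, the association weight from term $i$ to term $j$ equals the weight from $j$ to $i$.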
Write a function to get the most related terms (according to LSI) for the word "economy". To do this:
1. Compute the term-term matrix from the matrix U (the reduced_term_matrix variable).
2. Use the term-term matrix to get the association weights of all words related to the term "economy".
3. Sort by descending weight value.
Your function should return the top 5 words and their weights as a list of (string, float) tuples.
Do the related terms match your subjective similarity judgment?
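The steps above can be sketched on toy data as follows. This is only an illustration under stated assumptions: `top_related_terms`, its `include_self` flag, the four-word vocabulary, and the values in `U` are all hypothetical and not part of the assignment. The sketch computes a single row of $U U^T$ rather than the full matrix, and it removes the query term by index, since the diagonal entry is not guaranteed to be the largest weight in its row.

```python
import numpy as np

def top_related_terms(U, feature_names, term, k=5, include_self=False):
    """Return the k terms with the largest U U^T association weight to `term`."""
    names = list(feature_names)
    idx = names.index(term)
    # Row `idx` of U U^T; avoids building the full term-term matrix.
    weights = U @ U[idx]
    # Indices sorted by descending weight.
    order = np.argsort(weights)[::-1]
    if not include_self:
        # Drop the query term by index, not by position in the ranking.
        order = order[order != idx]
    return [(names[i], float(weights[i])) for i in order[:k]]

# Hypothetical toy vocabulary and U (4 terms x 2 latent dimensions).
names = ["economy", "market", "growth", "soccer"]
U = np.array([
    [0.9, 0.1],
    [0.8, 0.3],
    [0.7, 0.0],
    [0.0, 1.0],
])

print(top_related_terms(U, names, "economy", k=2))
```

Whether the query term itself should appear in the returned list is not stated in the task, so the flag leaves that choice explicit.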
In [19]:
Grade cell: cell-725c6b70431f4779 Score: 0.0/0.0
# Please tell me more!
task_id = "2b"
In [20]:
Student's answer
def answer_semantic_similarity_b():
    term = "economy"
    # Row index of the term in the TF-IDF vocabulary
    term_index = tfidf_vectorizer.vocabulary_[term]
    # Term-term matrix U U^T, shape (n_terms, n_terms)
    term_term_matrix = reduced_term_matrix @ reduced_term_matrix.T
    # Association weights between "economy" and every term
    related_terms_weights = term_term_matrix[term_index]
    # Sort by descending weight. The [1:6] slice skips the top entry on the
    # assumption that it is "economy" itself; use [:5] to keep the term.
    top_indices = related_terms_weights.argsort()[::-1][1:6]
    top_terms = [(tfidf_feature_names[i], related_terms_weights[i]) for i in top_indices]
    return top_terms
In [21]:
# use this cell to explore your solution
# remember to comment the function call before submitting the notebook
# answer_semantic_similarity_b()
In [22]:
Grade cell: cell-683419d1db09c762 Score: 0.0/15.0
print(f"Task {task_id}- AG tests")
stu_ans = answer_semantic_similarity_b()
print(f"Task {task_id}- your answer:\n{stu_ans}")
assert isinstance(stu_ans, list), f"Task {task_id}: Your function should return a list. "
assert len(stu_ans) == 5, f"Task {task_id}: Your list should contain five elements (the term, score tuples)."
for i, item in enumerate(stu_ans):
    assert isinstance(item, tuple), f"Task {task_id}: Your answer at index {i} should be a tuple. "
    assert isinstance(
        item[0], str
    ), f"Task {task_id}: The first element of your tuple at index {i} should be a string. "
    assert isinstance(
        item[1], (float, np.floating)
    ), f"Task {task_id}: The second element of your tuple at index {i} should be a float. "
# Some hidden tests
del stu_ans
