Question: Python: Your Task: Define get_top_tokens as follows: Given a cluster, identify its most frequent tokens. Inputs: cid: The ID of a

Python:

Your Task: Define get_top_tokens as follows:

Given a cluster, identify its most frequent tokens.

Inputs:

cid: The ID of a cluster to analyse

labels: cluster assignments, where document i was assigned to cluster labels[i].

corpusdf: A corpus dataframe having the columns '', 'title' and 'pesudodoc'.

k: the number of tokens to return

Return: top_tokens, a Python set of the k most frequent tokens

Steps:

*) Use labels to identify the documents assigned to cluster cid

*) Select their corresponding pseudo-documents from "corpusdf"

*) For each unique token, count the number of pseudo-documents in which it appeared

*) Return the k most frequently occurring tokens as a Python set. In the case of ties, consider tokens in ascending order

Other notes:

*) To match a document ID i to its pseudo-document, note that i is the index value of corpusdf (and, in particular, not the '' column!)

*) If there are fewer than k unique tokens, return as many as available

*) If there are ties, use the token itself to break them: sort the tied tokens by name
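The first note is easy to get wrong, so here is a small illustration on made-up data. The names toy_corpusdf and toy_labels are hypothetical, the toy frame leaves out the first column, and the pseudo-documents are assumed to be plain strings. It only shows how an index lookup differs from a positional lookup when the index is not in order.

```python
import numpy as np
import pandas as pd

# Toy corpus (hypothetical data). The index holds the document IDs and is
# deliberately not in 0..n-1 order, so position and index value differ.
toy_corpusdf = pd.DataFrame(
    {'title': ['t0', 't1', 't2'],
     'pesudodoc': ['book page', 'academic book', 'reviews']},
    index=[2, 0, 1],
)
toy_labels = np.array([5, 7, 5])  # document i belongs to cluster toy_labels[i]

# Document IDs assigned to cluster 5 are the positions where the label is 5.
doc_ids = np.flatnonzero(toy_labels == 5)  # array([0, 2])

# .loc looks rows up by index value, which is what the note asks for.
by_index = toy_corpusdf.loc[doc_ids, 'pesudodoc'].tolist()

# .iloc looks rows up by position and picks different rows here.
by_position = toy_corpusdf.iloc[doc_ids]['pesudodoc'].tolist()

print(by_index)     # ['academic book', 'book page']
print(by_position)  # ['book page', 'reviews']
```

With a default 0..n-1 index both lookups agree, which is why the mistake can go unnoticed on simple test data.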

Example: for the demo code, a correct implementation should return:

{'page', 'reviews', 'academic', 'book'}

Solution##

def get_top_tokens(cid: int, labels: np.ndarray, corpusdf: pd.DataFrame, k=10) -> set:
    ## code

import dill

# Load the demo arguments; dill files must be opened in binary mode.
with open('resource/asnlib/publicdata/demo_args_get_top_tokens.dill', 'rb') as fp:
    demo_cid, demo_labels, demo_corpusdf, demo_k = dill.load(fp)

print(get_top_tokens(demo_cid, demo_labels, demo_corpusdf, demo_k))
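One possible implementation that follows the steps and notes in the question, offered as a minimal sketch rather than a reference answer. It assumes that each entry of the 'pesudodoc' column is either a whitespace-separated string or an iterable of tokens (the question does not say which), and that every document ID selected through labels exists in the index of corpusdf. The toy data at the bottom is made up for illustration and is not the demo file.

```python
from collections import Counter

import numpy as np
import pandas as pd


def get_top_tokens(cid: int, labels: np.ndarray, corpusdf: pd.DataFrame, k=10) -> set:
    # Document IDs are the positions i where labels[i] == cid.
    doc_ids = np.flatnonzero(np.asarray(labels) == cid)

    # Look the IDs up by index value (.loc), not by position or by a column.
    pseudodocs = corpusdf.loc[doc_ids, 'pesudodoc']

    # For each token, count the number of pseudo-documents that contain it.
    doc_freq = Counter()
    for doc in pseudodocs:
        # Assumption: a pseudo-document is a whitespace-separated string
        # or an iterable of tokens.
        tokens = doc.split() if isinstance(doc, str) else doc
        doc_freq.update(set(tokens))  # set(): count each token once per document

    # Highest count first; ties are broken by the token in ascending order.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))

    # Slicing returns fewer than k tokens when fewer are available.
    return {token for token, _ in ranked[:k]}


# Hypothetical toy data, not the demo file from the question.
toy_labels = np.array([0, 1, 0, 1, 0])
toy_corpusdf = pd.DataFrame(
    {'title': ['a', 'b', 'c', 'd', 'e'],
     'pesudodoc': ['book page page', 'x y', 'book reviews', 'z', 'academic book']},
)

print(sorted(get_top_tokens(0, toy_labels, toy_corpusdf, k=2)))  # ['academic', 'book']
```

In the toy run, 'book' appears in three pseudo-documents of cluster 0, while 'academic', 'page' and 'reviews' each appear in one, so the tie for second place goes to 'academic'. The repeated 'page' in the first pseudo-document counts only once because document frequency is used, not raw frequency.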
