Question: We will study a political blog dataset first compiled for the paper Lada A. Adamic and Natalie Glance, The political blogosphere and the 2

We will study a political blog dataset first compiled for the paper Lada A. Adamic and Natalie Glance, "The political blogosphere and the 2004 US Election," in Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem (2005).

It is assumed that blog sites with the same political orientation are more likely to link to each other, thus forming a community, or cluster, in a graph. In this question, we will see whether or not this hypothesis is likely to be true based on the data.

The dataset nodes.txt contains a graph with n = 1490 vertices (nodes) corresponding to political blogs. The dataset edges.txt contains edges between the vertices. You may remove isolated nodes (nodes that are not connected to any other nodes) in the pre-processing.

We will treat the network as an undirected graph; thus, when constructing the adjacency matrix, make it symmetric, i.e., set the entry in the adjacency matrix to one whenever there is an edge between the two nodes (in either direction).
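The pre-processing described above can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes the edges have already been read into a list of (source, target) pairs with 0-based node ids, and the function names `build_adjacency` and `drop_isolated` are hypothetical. The layout of edges.txt is not specified here, so parsing is left out.

```python
import numpy as np

def build_adjacency(edges, n):
    """Symmetric 0/1 adjacency matrix from (source, target) pairs (0-based ids)."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        if i != j:          # assumption: self-loops are ignored
            A[i, j] = 1
            A[j, i] = 1     # an edge in either direction sets both entries
    return A

def drop_isolated(A):
    """Remove degree-zero nodes; also return the original indices that were kept."""
    keep = np.flatnonzero(A.sum(axis=1) > 0)
    return A[np.ix_(keep, keep)], keep

# Toy example: node 3 has no edges and is removed.
edges = [(0, 1), (1, 0), (2, 1)]
A = build_adjacency(edges, n=4)
A_small, keep = drop_isolated(A)
```

The `keep` index array matters later: the 0-1 labels must be subset with the same indices so that labels and rows of the adjacency matrix stay aligned.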

In addition, each vertex has a 0-1 label (in the 3rd column of the data file) corresponding to the true political orientation of that blog. We will consider this as the true label and check whether spectral clustering groups nodes with the same political orientation together as much as possible.

1. (5 points) Use spectral clustering to find the k = 2, 5, 10, 30, 50 clusters in the network of political blogs (each node is a blog, and their edges are defined in the file edges.txt). Find the majority label (same as the purity score from the image compression problem) in each cluster for each value of k. For example, if there are k = 2 clusters, and their labels are {0, 1, 1, 1} and {0, 0, 1}, then the majority label for the first cluster is 1 and for the second cluster is 0. You are required to implement the algorithms yourself rather than calling them from a package.
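One possible shape for a hand-written spectral clustering routine is sketched below. Several choices here are assumptions rather than part of the problem statement: the embedding uses the k leading eigenvectors of the normalized adjacency matrix D^{-1/2} A D^{-1/2}, rows are normalized to unit length, k-means uses a farthest-point initialization, and `np.linalg.eigh` is treated as an allowed linear-algebra primitive. The input `A` is assumed symmetric with isolated nodes already removed.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; farthest-point initialization is an assumption."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen center
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    assign = np.full(len(X), -1)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_assign = d2.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break                      # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = X[assign == c]
            if len(members) > 0:       # an empty cluster keeps its old center
                centers[c] = members.mean(axis=0)
    return assign

def spectral_clustering(A, k, seed=0):
    """Cluster the nodes of a symmetric adjacency matrix A into k groups."""
    deg = A.sum(axis=1).astype(float)
    d_inv_sqrt = 1.0 / np.sqrt(deg)    # requires every degree to be positive
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(M)                # ascending eigenvalues
    X = eigvecs[:, -k:]                                  # k leading eigenvectors
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)             # row-normalize
    return kmeans(X, k, seed=seed)

# Toy graph: two disconnected 4-cliques should land in two different clusters.
B = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)
Z = np.zeros((4, 4), dtype=int)
toy = np.block([[B, Z], [Z, B]])
toy_assign = spectral_clustering(toy, 2)
```

Other variants (the unnormalized Laplacian, random k-means restarts) are equally defensible; whichever is used should be stated alongside the reported results, since the mismatch rates depend on it.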

Now compare the majority label with the individual labels in each cluster, and report the mismatch rate (also known as the misclassification rate) for each cluster, when k = 2, 5, 10, 30, 50. For instance, in the example above, the mismatch rate for the first cluster is 1/4 (only the first node differs from the majority), and for the second cluster it is 1/3.
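The worked example above can be checked with a short sketch. The function name `majority_and_mismatch` is hypothetical, and the tie-breaking rule (ties go to label 1) is an assumption, since the problem does not say how to handle a cluster split evenly.

```python
import numpy as np

def majority_and_mismatch(labels, assign):
    """Return {cluster: (majority_label, mismatch_rate)} for 0/1 labels."""
    labels = np.asarray(labels)
    assign = np.asarray(assign)
    out = {}
    for c in np.unique(assign):
        members = labels[assign == c]
        ones = int(members.sum())
        zeros = len(members) - ones
        majority = 1 if ones >= zeros else 0    # assumption: ties go to label 1
        mismatch = min(ones, zeros) / len(members)
        out[int(c)] = (majority, mismatch)
    return out

# The example from the text: clusters {0, 1, 1, 1} and {0, 0, 1}.
labels = [0, 1, 1, 1, 0, 0, 1]
assign = [0, 0, 0, 0, 1, 1, 1]
result = majority_and_mismatch(labels, assign)
# cluster 0 -> majority 1, mismatch 1/4; cluster 1 -> majority 0, mismatch 1/3
```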

2. (5 points) Tune your k and find the number of clusters that achieves a reasonably small mismatch rate. Please explain how you tune k and what mismatch rate you achieve. Please explain intuitively what this result tells us about the network community structure.
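One way to organize the tuning is a simple sweep over candidate values of k, recording an overall mismatch rate for each. The sketch below is an assumption about how this could be set up, not a prescribed method: `cluster_fn` is a hypothetical callable standing in for whatever spectral clustering routine was written for part 1, and the overall rate is taken as the fraction of all nodes that disagree with their cluster's majority label.

```python
import numpy as np

def overall_mismatch(labels, assign):
    """Fraction of all nodes whose label differs from their cluster's majority."""
    labels = np.asarray(labels)
    assign = np.asarray(assign)
    wrong = 0
    for c in np.unique(assign):
        members = labels[assign == c]
        ones = int(members.sum())
        wrong += min(ones, len(members) - ones)   # minority count in this cluster
    return wrong / len(labels)

def sweep_k(cluster_fn, A, labels, ks):
    """Run cluster_fn(A, k) for each k and return {k: overall mismatch rate}."""
    return {k: overall_mismatch(labels, cluster_fn(A, k)) for k in ks}

# Stand-in clustering for illustration only: node i goes to cluster i mod k.
labels = [0, 1, 1, 1, 0, 0, 1]
dummy = lambda A, k: np.arange(len(labels)) % k
rates = sweep_k(dummy, None, labels, [1, 7])
```

A caveat when reading such a sweep: with k equal to the number of nodes every cluster is a singleton and the mismatch rate is zero, so a small rate at a very large k says little on its own. A small k that already gives a low rate is the more informative evidence for community structure.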
