Question: Part one: Cluster analysis

Part one: Cluster analysis

In this section we will only use the Synthea observations data. Start by using the code below to clear the R global environment, load the observations data, make a data object with a small number of variables and remove all missing values, remove patients who did not have COVID-19, truncate the variable names, and load the package factoextra, which we use for k-means cluster analysis. As always, you will need to set your working directory.

rm(list=ls()) obs data

table(cluster=clust$cluster,dead=data$dead)

       dead
cluster
      1  1415    4
      2   310  273

Question 5 (difficult - 2 marks) Some of the pathology variables were not related to cluster membership (i.e., the mean levels were very similar across clusters). Choose the two pathology tests which were the least related to cluster membership and run your chosen cluster model again without them (just for your chosen cluster model, i.e., k=2 or 3, not for all 6 possible models). Save this new model in an object called gluskB. Inspect the mean levels of the pathology variables of this new model (like Q.3) and compare with death (like Q.4). Has removing these variables changed our results? What does this tell you about including variables not related to the clusters in our analysis?

IMPORTANT NOTE: The cluster numbers may change in your new model (i.e., 1 might become 2 and so on, because the clusters are just arbitrarily labelled with a number, but you will know which cluster is which from the relationships with pathology tests and death).

net.youngs-agraphlserMat,young, labels=substr(colnames(young),1,5), layout="spring")

Question 8 (medium - 1 mark) Compare the two networks you plotted in the above question. Do you note any important differences? If you do think there are differences, list just the two biggest that you notice.
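The comparison asked for in Question 5 (refit the cluster model without the variables least related to cluster membership, then re-inspect the cluster means and the table against death) can be sketched on synthetic data. This is only an illustration under stated assumptions: the data frame `toy`, its columns `testA`, `testB`, `noise1`, `noise2`, the outcome `dead`, and the object names `clustA` and `clustB` are all hypothetical and are not the Synthea data or the assignment's own objects. It uses base R's `kmeans()` rather than any factoextra helper.

```r
# Synthetic stand-in for the pathology data (NOT the Synthea observations).
set.seed(42)
n <- 200
group <- rep(c(1, 2), each = n / 2)   # hidden grouping used to simulate the data

toy <- data.frame(
  testA  = rnorm(n, mean = ifelse(group == 1, 0, 4)),  # related to the groups
  testB  = rnorm(n, mean = ifelse(group == 1, 0, 4)),  # related to the groups
  noise1 = rnorm(n),                                   # unrelated to the groups
  noise2 = rnorm(n)                                    # unrelated to the groups
)
# Made-up binary outcome that is more common in the second group
dead <- rbinom(n, 1, prob = ifelse(group == 1, 0.05, 0.5))

# Model A: all four variables, scaled so each contributes on the same footing
clustA <- kmeans(scale(toy), centers = 2, nstart = 25)

# Model B: the same k, without the two variables unrelated to the clusters
keep   <- setdiff(names(toy), c("noise1", "noise2"))
clustB <- kmeans(scale(toy[, keep]), centers = 2, nstart = 25)

# Mean level of every variable within each cluster, for both models
meansA <- aggregate(toy, by = list(cluster = clustA$cluster), FUN = mean)
meansB <- aggregate(toy, by = list(cluster = clustB$cluster), FUN = mean)

# Cross-tabulate cluster membership against the outcome, for both models
tabA <- table(cluster = clustA$cluster, dead = dead)
tabB <- table(cluster = clustB$cluster, dead = dead)

print(meansA); print(meansB)
print(tabA);   print(tabB)
```

Because k-means labels are arbitrary, cluster 1 in one model may correspond to cluster 2 in the other; match them by their mean profiles, not by their numbers.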
Question 9 (easy - 2 marks) Generate the plots of node strength and expected influence for both networks and include them along with the code used to generate them in the box below.

Part Two: Network analysis

In this section we will mostly use the Synthea conditions data. Start by using the code below to clear the R global environment of any data you still have from the last section, load the conditions data, make a data object with a smaller number of variables, remove patients who did not have COVID-19, truncate the variable names, and load the package ggraph, which we use for network analysis. NOTE: if you have closed R since the last activity, you will need to set your working directory again.

rm(list=ls()) ohs. data=50) , ] old
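To make the two centrality measures named in Question 9 concrete, here is a hand computation in base R on a small made-up network. It assumes an undirected network stored as a symmetric weight matrix with a zero diagonal; the matrix `W`, its values, and the node names are hypothetical. Node strength is taken as the sum of absolute edge weights at a node, and (one-step) expected influence as the sum of the signed edge weights, so the two differ only when some edges are negative. This is a sketch of the definitions, not the plotting code the assignment expects.

```r
# Hypothetical 4-node weighted network (symmetric, zero diagonal; values made up)
nodes <- c("Fever", "Cough", "Chill", "Hypox")
W <- matrix(c( 0.0, 0.4, 0.3, -0.2,
               0.4, 0.0, 0.1,  0.0,
               0.3, 0.1, 0.0,  0.5,
              -0.2, 0.0, 0.5,  0.0),
            nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes))

# Node strength: sum of the absolute weights of the edges attached to each node
strength <- rowSums(abs(W))

# Expected influence: sum of the signed weights, so negative edges subtract
expected_influence <- rowSums(W)

# Side-by-side bars for the two measures, one group per node
barplot(rbind(strength, expected_influence), beside = TRUE,
        legend.text = c("Strength", "Expected influence"), las = 2)
```

In this toy matrix only the Fever-Hypox edge is negative, so those are the only two nodes whose expected influence falls below their strength.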
