Problem 2: Counterfeit Swiss Francs

The banknote data in the mclust package contains six features of banknotes and a Status column indicating whether each bill is genuine or counterfeit. We are going to ignore these classification labels and see whether we can predict them from estimated clusters. Please limit yourself to the mclust commands used in lecture, and follow the instructions below when completing this question.

a. (8 pts) First inspect each of the pairs of the six attributes. Do any of the pairs suggest a simple "bifurcation rule" that can be used to initialize an EM-based search for k = 2 clusters? For example, would a vertical or horizontal split through one of the coordinates be sensible?

b. (12 pts) With the initialization you decided on in part a, use mstepVVV and estepVVV to iterate through a sequence of EM steps for k = 2 clusters until you judge the method to have converged. Provide a visualization of the resulting clusters via pairwise plots similar to those from part a, augmented with color and ellipses. Also provide a visualization of the progress in log-likelihood values over the EM iterations.

c. (10 pts) How do your EM steps progress (and how do the log-likelihoods change) if you instead initialize the EM algorithm from part b by regressing Left on Right and using that line to separate the points into two classes?

d. (10 pts) Consider adding a third cluster. How would you take the k = 2 results from above and sensibly initialize a third cluster? Using your preferred initialization, iterate through a sequence of EM steps (mstepVVV and estepVVV) for k = 3 clusters until you judge the method to have converged. Visualize the results, including pairs plots and log-likelihood progress.

e. (10 pts) How do the results from your efforts above compare to what you get from the Mclust command?

f. (10 pts) Use your preferred clustering from above as the basis for a predictor of the Status column. Compared to the true labels, what is your misclassification rate?
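To make the EM loop in parts a to c concrete, here is a minimal pure-Python sketch under several assumptions. It works on only two columns at a time (think of one pair such as Left and Right, an assumption), uses synthetic data rather than the banknote measurements, and every function name in it (`m_step`, `e_step`, `run_em`, `split_init`, `regression_init`) is hypothetical. It is not the mclust API: it only mirrors the idea of alternating an M-step and an E-step for a "VVV"-style mixture (each component with its own unconstrained covariance), starting from a hard 0/1 partition and recording the log-likelihood after each E-step so its progress can be plotted.

```python
import math
import random

# Hypothetical helper names; a 2-D analogue of a VVV-style Gaussian mixture
# (each component has its own unconstrained covariance matrix).

def gauss2_logpdf(x, mu, cov):
    """Log density of a bivariate normal with a full 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu) via the explicit 2x2 inverse
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * q

def m_step(X, Z):
    """Weighted MLEs of mixing proportions, means and full covariances."""
    n, k = len(X), len(Z[0])
    params = []
    for j in range(k):
        w = [z[j] for z in Z]
        nj = sum(w)
        mu = [sum(wi * x[t] for wi, x in zip(w, X)) / nj for t in range(2)]
        s = [[0.0, 0.0], [0.0, 0.0]]
        for wi, x in zip(w, X):
            dev = (x[0] - mu[0], x[1] - mu[1])
            for r in range(2):
                for col in range(2):
                    s[r][col] += wi * dev[r] * dev[col]
        cov = [[s[r][col] / nj for col in range(2)] for r in range(2)]
        params.append((nj / n, mu, cov))
    return params

def e_step(X, params):
    """Posterior memberships and the observed-data log-likelihood."""
    Z, loglik = [], 0.0
    for x in X:
        logs = [math.log(p) + gauss2_logpdf(x, mu, cov) for p, mu, cov in params]
        top = max(logs)
        total = top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
        loglik += total
        Z.append([math.exp(v - total) for v in logs])
    return Z, loglik

def run_em(X, Z, tol=1e-8, max_iter=500):
    """Alternate M- and E-steps from initial memberships Z; record log-likelihoods."""
    history = []
    for _ in range(max_iter):
        params = m_step(X, Z)
        Z, loglik = e_step(X, params)
        history.append(loglik)
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return params, Z, history

def split_init(X, coord, cut):
    """Hard memberships from a vertical/horizontal cut on one coordinate (part a)."""
    return [[1.0, 0.0] if x[coord] < cut else [0.0, 1.0] for x in X]

def regression_init(X):
    """Hard memberships from the least-squares line of x[1] on x[0] (part c style)."""
    n = len(X)
    mx = sum(x[0] for x in X) / n
    my = sum(x[1] for x in X) / n
    slope = (sum((x[0] - mx) * (x[1] - my) for x in X)
             / sum((x[0] - mx) ** 2 for x in X))
    icpt = my - slope * mx
    return [[1.0, 0.0] if x[1] > icpt + slope * x[0] else [0.0, 1.0] for x in X]

# Synthetic two-group data (NOT the banknote measurements)
random.seed(1)
X = ([(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
     + [(random.gauss(4, 1), random.gauss(1, 0.5)) for _ in range(100)])
params, Z, history = run_em(X, split_init(X, coord=0, cut=2.0))
print(len(history), history[0], history[-1])
```

Because EM never decreases the observed-data log-likelihood, `history` should be non-decreasing; comparing the `history` produced from `split_init` with the one produced from `regression_init` is the kind of contrast part c asks about. In `regression_init`, `x[1]` plays the role of the response (Left) and `x[0]` the predictor (Right); that mapping is an assumption of this sketch.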

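For part f, one simple way (an assumption here, not necessarily the rule used in lecture) to turn a clustering into a Status predictor is to take each point's most probable cluster and then label every cluster with the majority true Status among its members. The sketch below uses made-up membership probabilities and labels, and the function names (`map_classification`, `majority_map`, `misclassification_rate`) are hypothetical.

```python
from collections import Counter

def map_classification(Z):
    """Hard cluster labels: index of the largest posterior membership per point."""
    return [max(range(len(z)), key=z.__getitem__) for z in Z]

def majority_map(clusters, labels):
    """Assign each cluster the most common true label among its members."""
    counts = {}
    for c, y in zip(clusters, labels):
        counts.setdefault(c, Counter())[y] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}

def misclassification_rate(clusters, labels):
    """Share of points whose cluster-derived prediction disagrees with the truth."""
    mapping = majority_map(clusters, labels)
    wrong = sum(mapping[c] != y for c, y in zip(clusters, labels))
    return wrong / len(labels)

# Toy example with made-up memberships and labels (not banknote results)
Z = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.6, 0.3],
     [0.1, 0.1, 0.8], [0.0, 0.2, 0.8], [0.1, 0.3, 0.6], [0.2, 0.1, 0.7]]
status = ["genuine", "genuine", "counterfeit",
          "counterfeit", "counterfeit", "counterfeit",
          "counterfeit", "counterfeit", "counterfeit", "genuine"]
clusters = map_classification(Z)
print(clusters)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
print(misclassification_rate(clusters, status))  # 0.2
```

With k = 3, two clusters may map to the same Status value, which is expected since there are only two true classes. Note that the mapping itself uses the true labels, so the resulting rate is an in-sample figure.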