Question: Can you please explain how to use the __init__ function here? I'm confused about how to call it and how to use the variables in it.


Make use of the scikit-learn (sklearn) and yellowbrick Python packages in your function implementations.

Complete the KmeansClustering class in task3.py.

kmeans_train

Initialize a sklearn KMeans model using random_state and n_init=10. Initialize a yellowbrick KElbowVisualizer to search for the optimal value of k (between 1 and 10). Train the KElbowVisualizer on the training data and determine the optimal k value. Then train a KMeans model with the proper initialization for that optimal value of k and return the cluster ids for each row of the training set as a list.

kmeans_test

Using the model you trained in the previous function, return the cluster ids for each row of the test set as a list.

train_add_kmeans_cluster_id_feature

Using kmeans_train, add an additional column to the training features and return the training dataframe with all input features untouched plus the additional cluster id column, named kmeans_cluster_id.

test_add_kmeans_cluster_id_feature

Using kmeans_test, add an additional column to the test features and return the test dataframe with all input features untouched plus the additional cluster id column, named kmeans_cluster_id.
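Both "add a column" tasks follow the same pattern, which could look something like the sketch below. The helper add_cluster_id_column and the sample values are hypothetical. The sketch works on a copy, which is one way to read "all input features untouched".

```python
import pandas as pd


def add_cluster_id_column(features: pd.DataFrame, cluster_ids: list) -> pd.DataFrame:
    """Hypothetical helper: return a copy of features with a kmeans_cluster_id column."""
    if len(cluster_ids) != len(features):
        raise ValueError("expected exactly one cluster id per row")
    output_df = features.copy()  # copy so the caller's dataframe is not modified
    output_df["kmeans_cluster_id"] = cluster_ids
    return output_df


# Made-up features and cluster ids, for illustration only.
features = pd.DataFrame({"x1": [0.0, 0.1, 5.0], "x2": [0.2, 0.0, 5.1]})
with_ids = add_cluster_id_column(features, [0, 0, 1])
print(with_ids.columns.tolist())  # ['x1', 'x2', 'kmeans_cluster_id']
```

In the class, the list passed in would come from self.kmeans_train() or self.kmeans_test(), and the dataframe from the features stored in __init__.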

import numpy as np
import pandas as pd
import sklearn.cluster
import yellowbrick.cluster

class KmeansClustering:
    def __init__(self,
                 train_features: pd.DataFrame,
                 test_features: pd.DataFrame,
                 random_state: int):
        # TODO: Add any state variables you may need to make your functions work
        pass

    def kmeans_train(self) -> list:
        # TODO: train a kmeans model using the training data, determine the optimal value of k (between 1 and 10) with n_init set to 10 and return a list of cluster ids
        # corresponding to the cluster id of each row of the training data
        cluster_ids = list()
        return cluster_ids

    def kmeans_test(self) -> list:
        # TODO: return a list of cluster ids corresponding to the cluster id of each row of the test data
        cluster_ids = list()
        return cluster_ids

    def train_add_kmeans_cluster_id_feature(self) -> pd.DataFrame:
        # TODO: return the training dataset with a new feature called kmeans_cluster_id
        output_df = pd.DataFrame()
        return output_df

    def test_add_kmeans_cluster_id_feature(self) -> pd.DataFrame:
        # TODO: return the test dataset with a new feature called kmeans_cluster_id
        output_df = pd.DataFrame()
        return output_df
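On the __init__ question itself: __init__ is not called by name. Python runs it automatically when an instance is created, and anything it stores on self is then visible to every other method. The sketch below illustrates that under a few assumptions: the class name KmeansClusteringSketch, the attribute names and the tiny dataframes are hypothetical, and a fixed k=2 stands in for the elbow search to keep the example short.

```python
import pandas as pd
from sklearn.cluster import KMeans


class KmeansClusteringSketch:
    def __init__(self,
                 train_features: pd.DataFrame,
                 test_features: pd.DataFrame,
                 random_state: int):
        # Store the constructor arguments on self so later methods can read them.
        self.train_features = train_features
        self.test_features = test_features
        self.random_state = random_state
        self.model = None  # set by kmeans_train, reused by kmeans_test

    def kmeans_train(self) -> list:
        # Fixed k=2 is a simplification; the assignment asks for an elbow search.
        self.model = KMeans(n_clusters=2, random_state=self.random_state, n_init=10)
        self.model.fit(self.train_features)
        return self.model.labels_.tolist()

    def kmeans_test(self) -> list:
        if self.model is None:
            raise RuntimeError("call kmeans_train() before kmeans_test()")
        return self.model.predict(self.test_features).tolist()


# Made-up data: two tight groups of points, for illustration only.
train_df = pd.DataFrame({"x1": [0.0, 0.1, 5.0, 5.1], "x2": [0.0, 0.2, 5.0, 5.2]})
test_df = pd.DataFrame({"x1": [0.05, 4.9], "x2": [0.1, 5.1]})

# Creating the object is what triggers __init__ with these three arguments.
clusterer = KmeansClusteringSketch(train_features=train_df,
                                   test_features=test_df,
                                   random_state=42)
train_ids = clusterer.kmeans_train()  # uses self.train_features and self.random_state
test_ids = clusterer.kmeans_test()    # uses self.model and self.test_features
```

Keeping the fitted model on self is the design choice that lets kmeans_test reuse what kmeans_train produced without retraining.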
