Question: Please help me with this whole Python programming project

Activity 1: Create Dummy Dataset

In this activity, you have to execute the code cell that creates a dummy dataset for multiclass classification using the make_blobs() function of the sklearn.datasets module.

Syntax: make_blobs(n_samples, centers, n_features, random_state, cluster_std)

# Run this code cell to generate dummy data using 'make_blobs()' function
from sklearn.datasets import make_blobs
import pandas as pd

features_array, target_array = make_blobs(n_samples = [200, 500, 700, 272, 333], n_features = 2, random_state = 42, centers=None, cluster_std=1)

# Creating Pandas DataFrame containing the items from the 'features_array' and 'target_array' arrays.
# A dummy dictionary
dummy_dict = {'col 1': [features_array[i][0] for i in range(features_array.shape[0])],
              'col 2': [features_array[i][1] for i in range(features_array.shape[0])],
              'target': target_array}

# Converting the dictionary into DataFrame
dummy_df = pd.DataFrame.from_dict(dummy_dict)

# Printing first five rows of the dummy DataFrame
dummy_df.head()

In the above code cell,

A dummy dataset is created having two columns representing two independent variables and a third column representing the target.

The records are divided into 5 groups of sizes [200, 500, 700, 272, 333], so the target column has 5 different labels [0, 1, 2, 3, 4].

A dummy DataFrame is created from the two arrays using a Python dictionary. (Learnt in "Logistic Regression - Decision Boundary" lesson)

After this activity, the DataFrame should be created with two independent feature columns and one dependent target column.

Activity 2: Dataset Inspection

In this activity, you have to look into the distribution of the labels in the target column of the DataFrame.

1. Print the number of occurrences of each label in the target column.

# Display the number of occurrences of each label in the 'target' column. 
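
One possible way to complete this cell is with the pandas `value_counts()` method. The sketch below re-creates the dummy DataFrame from Activity 1 so that it runs on its own; the name `label_counts` is an assumption and not part of the original notebook.

```python
from sklearn.datasets import make_blobs
import pandas as pd

# Re-create the dummy dataset from Activity 1 so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})

# Count how many rows carry each label ('label_counts' is an illustrative name).
label_counts = dummy_df['target'].value_counts()
print(label_counts)
```

Because make_blobs() was given the group sizes [200, 500, 700, 272, 333], the counts for labels 0 to 4 should match those sizes.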

2. Print the percentage of samples for each label in the target column.

# Get the percentage of count of each label samples in the dataset. 
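
A minimal sketch for the percentages, assuming the same DataFrame as in Activity 1 (re-created here so the snippet runs standalone). Passing `normalize=True` to `value_counts()` returns fractions, which are scaled to percentages; `label_percentages` is an illustrative name.

```python
from sklearn.datasets import make_blobs
import pandas as pd

# Re-create the dummy dataset from Activity 1 so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})

# Fraction of rows per label, scaled to a percentage.
label_percentages = dummy_df['target'].value_counts(normalize=True) * 100
print(label_percentages)
```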

Q: How many unique labels are present in the DataFrame? What are they?

A: Five unique labels are present: 0, 1, 2, 3 and 4.

After this activity, the labels to be predicted, i.e. the values of the target variable, and their distribution should be known.

Activity 3: Train-Test Split

We need to predict the value of the target variable, using other variables. Thus, target is the dependent variable and other columns are the independent variables.

1. Split the dataset into the training set and test set such that the training set contains 70% of the instances and the remaining instances will become the test set.

2. Set random_state = 42.

# Import 'train_test_split' module

# Create the features data frame holding all the columns except the last column
# and print first five rows of this dataframe

# Create the target series that holds last column 'target'
# and print first five rows of this series

# Split the train and test sets using the 'train_test_split()' function.
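
The steps in this cell could be sketched as follows, assuming the Activity 1 DataFrame (re-created here so the snippet is standalone). The names `X` and `y` are assumptions; `X_train`, `X_test`, `y_train` and `y_test` follow the names used later in the assignment.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
import pandas as pd

# Re-create the dummy dataset from Activity 1 so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})

# Features: every column except the last one.
X = dummy_df.iloc[:, :-1]
print(X.head())

# Target: the last column, 'target'.
y = dummy_df['target']
print(y.head())

# 70% of the rows go to the training set, the remaining 30% to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
```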

3. Print the number of rows and columns in the training and testing set.

# Print the shape of all the four variables i.e. 'X_train', 'X_test', 'y_train' and 'y_test' 
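
A short sketch for printing the shapes, assuming the 70/30 split with random_state = 42 described in this activity. The data and split are repeated so that the snippet runs on its own.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
import pandas as pd

# Re-create the dataset and the 70/30 split so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})
X_train, X_test, y_train, y_test = train_test_split(
    dummy_df.iloc[:, :-1], dummy_df['target'], test_size=0.3, random_state=42)

# 'shape' is (rows, columns) for a DataFrame and (rows,) for a Series.
print("X_train:", X_train.shape)
print("X_test: ", X_test.shape)
print("y_train:", y_train.shape)
print("y_test: ", y_test.shape)
```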

After this activity, the features and target data should be split into training and testing data.

Activity 4: Model Training and Prediction

Implement SVM classification using the sklearn module in the following way:

1. Import the SVC class from the sklearn.svm module.

2. Create an object of the SVC class and pass kernel = "linear" as input to its constructor.

3. Call the fit() function of the SVC class on the object created and pass X_train and y_train as inputs to the function.

4. Call the score() function with X_train and y_train as inputs to check the accuracy score of the model.

# Build an SVC model using the 'sklearn' module.
# 1. Create the SVC model and pass 'kernel=linear' as input.
# 2. Call the 'fit()' function with 'X_train' and 'y_train' as inputs.
# 3. Call the 'score()' function with 'X_train' and 'y_train' as inputs to check the accuracy score of the model.
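
Steps 1 to 4 might look like the sketch below. It assumes the dataset and split from the earlier activities (repeated here so the snippet is standalone); `svc_model` and `train_score` are illustrative names.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import pandas as pd

# Re-create the dataset and the 70/30 split so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})
X_train, X_test, y_train, y_test = train_test_split(
    dummy_df.iloc[:, :-1], dummy_df['target'], test_size=0.3, random_state=42)

# 1. Create the SVC model with a linear kernel.
svc_model = SVC(kernel='linear')

# 2. Fit the model on the training data.
svc_model.fit(X_train, y_train)

# 3. Mean accuracy of the model on the training data.
train_score = svc_model.score(X_train, y_train)
print("Training accuracy:", train_score)
```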

5. Make the predictions on the train set using predict() function.

# Make predictions on the train dataset by using the 'predict()' function.
# Print the occurrence of each label computed in the predictions.
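
A possible sketch for the training-set predictions, again assuming the earlier dataset, split and linear SVC (all repeated so the snippet runs alone). `y_train_pred` is an illustrative name; wrapping the predictions in a pandas Series is one convenient way to count labels.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import pandas as pd

# Re-create the dataset, split and model so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})
X_train, X_test, y_train, y_test = train_test_split(
    dummy_df.iloc[:, :-1], dummy_df['target'], test_size=0.3, random_state=42)
svc_model = SVC(kernel='linear').fit(X_train, y_train)

# Predict the labels of the training rows.
y_train_pred = svc_model.predict(X_train)

# Count how often each label was predicted.
print(pd.Series(y_train_pred).value_counts())
```

If every label from 0 to 4 appears in the printed counts, the model predicts all classes on the training set.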

Q: Does the model classify all the labels in the training set?

A:

6. Make predictions on the test dataset by using the predict() function.

# Make predictions on the test dataset.
# Print the occurrence of each label computed in the predictions.
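
The same idea applied to the test set, as a hedged sketch under the same assumptions (dataset, split and model repeated for a standalone run; `y_test_pred` is an illustrative name).

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import pandas as pd

# Re-create the dataset, split and model so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})
X_train, X_test, y_train, y_test = train_test_split(
    dummy_df.iloc[:, :-1], dummy_df['target'], test_size=0.3, random_state=42)
svc_model = SVC(kernel='linear').fit(X_train, y_train)

# Predict the labels of the unseen test rows.
y_test_pred = svc_model.predict(X_test)

# Count how often each label was predicted.
print(pd.Series(y_test_pred).value_counts())
```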

Q: Does the model classify all the labels in the test set?

A:

After this activity, an SVM model should be trained and the labels of the target column should be predicted for multiclass classification.

Activity 5: Model Evaluation

1. Create a confusion matrix to calculate True Positives, False Positives, True Negatives and False Negatives for the test set to evaluate the SVC linear model.

# Create a confusion matrix for the test set.
# Import the libraries
# Print the confusion matrix
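
One way to build the confusion matrix is `confusion_matrix()` from sklearn.metrics, sketched below with the dataset, split and model repeated so the snippet runs by itself. The name `cm` is an assumption.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import pandas as pd

# Re-create the dataset, split and model so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})
X_train, X_test, y_train, y_test = train_test_split(
    dummy_df.iloc[:, :-1], dummy_df['target'], test_size=0.3, random_state=42)
svc_model = SVC(kernel='linear').fit(X_train, y_train)
y_test_pred = svc_model.predict(X_test)

# Rows are the actual labels, columns are the predicted labels.
cm = confusion_matrix(y_test, y_test_pred)
print(cm)
```

Entries on the main diagonal are correct predictions; any non-zero entry off the diagonal is a misclassification.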

Q: Does the confusion matrix indicate any misclassification?

A:

2. Print the classification report to observe the recall, precision and f1-scores for linear SVC model.

# Print the classification report for the actual and predicted data of the testing set 
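
The report can be produced with `classification_report()` from sklearn.metrics. This is a sketch under the same assumptions as the earlier cells, with the setup repeated so it is runnable on its own; `report` is an illustrative name.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import pandas as pd

# Re-create the dataset, split and model so this cell is self-contained.
features_array, target_array = make_blobs(n_samples=[200, 500, 700, 272, 333],
                                          n_features=2, random_state=42,
                                          centers=None, cluster_std=1)
dummy_df = pd.DataFrame({'col 1': features_array[:, 0],
                         'col 2': features_array[:, 1],
                         'target': target_array})
X_train, X_test, y_train, y_test = train_test_split(
    dummy_df.iloc[:, :-1], dummy_df['target'], test_size=0.3, random_state=42)
svc_model = SVC(kernel='linear').fit(X_train, y_train)
y_test_pred = svc_model.predict(X_test)

# Precision, recall and f1-score for each label, as formatted text.
report = classification_report(y_test, y_test_pred)
print(report)
```

The f1-score column of the printed report gives the value asked for in the question that follows, one row per label.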

Q: What are the f1-scores for all the labels?

A:

After this activity, the model should be evaluated on the target column using the test feature set.
