Question: Complete the missing parts indicated by # Implement me

We expect you to follow a reasonable programming style. While we do not mandate a specific style, we require that your code be neat, clear, documented/commented and, above all, consistent. Marks will be deducted if these requirements are not followed.

A conversation between you and your engineer friend...

Friend: Thanks again for solving the XOR problem only using perceptrons! You are really awesome!

You: I know.

Friend: ... Have you ever worked on Iris?

You: Who hasn't?

Friend: ... Can you explain string theory?

You: Sure. Basically it is a unified theory that aims to explain every natural phenomenon. It claims that there are more than three dimensions in reality, most of which are too small to be observed. Actually, the math works pretty well when there are 10 dimensions. Now let us look at the equations...

Friend: Stop! Let us go back to Iris.

You: Sure.

Friend: ... I heard that missing values can cause a lot of trouble.

You: Not necessarily. You can still do things with them.

Friend: Oh really? What if 90% of the class labels are missing? Can you predict them?

You: No.

Friend: Well, you know what? Don't feel bad about yourself. It is ok. You don't have to know everything.

You: What I am saying is, I do not even need 10% of the labels. Remove all the labels if you want and leave only three unique ones. I can give you over 70% prediction accuracy.

Friend: Only three labels? Are you serious?

You: Positive. The only problem is, I am a data scientist. I do not create missing values on purpose. You do that. Then I will go from there.

Step 1: Your friend diligently removed labels

In the end, y_train contains only three unique meaningful labels; the removed labels are denoted by -1.

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split


def your_friends_diligent_work():
    """
    :return: [X_train, X_test, y_train, y_test]
    """
    # Read Iris
    df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                     header=None,
                     names=['sepal length', 'sepal width', 'petal length', 'petal width', 'target'])
    # Get the features and target
    X, y = df[['sepal length', 'sepal width', 'petal length', 'petal width']], df['target']
    # Encode the target
    le = LabelEncoder()
    y = le.fit_transform(y)
    # Divide the data into training and testing data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    # Standardize the features
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    # Get the index of the first row containing each of the three unique labels
    yu, idxs = np.unique(y_train, return_index=True)
    # Remove all other labels (removed labels are denoted by -1)
    for i in range(len(y_train)):
        if i not in idxs:
            y_train[i] = -1
    return [X_train, X_test, y_train, y_test]
```

Step 2: Your first effort

You did not use any other packages

The prediction accuracy on y_test is over 75%

```python
from sklearn.metrics import precision_recall_fscore_support
from sklearn.cluster import KMeans

# Get the training and testing data
# y_train contains only three unique meaningful labels; the removed labels are denoted by -1
X_train, X_test, y_train, y_test = your_friends_diligent_work()

# The KMeans classifier
km = KMeans(n_clusters=3, random_state=0)

# Implement me

print(precision_recall_fscore_support(y_test_pred, y_test, average='micro'))
```

output:

(0.75555555555555554, 0.75555555555555554, 0.75555555555555554, None)
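The expert answer itself is not shown here, so below is one plausible way to fill in the `# Implement me` part: fit KMeans on the training features (labels are never used), then exploit the three surviving labels to map each cluster to a class, and translate test-set cluster assignments into class predictions. To keep the sketch self-contained and runnable offline, it rebuilds the Step 1 split with `sklearn.datasets.load_iris` (which ships the same UCI data) instead of downloading; in the assignment you would reuse the variables returned by `your_friends_diligent_work()`. It also assumes the three labeled rows fall into three different clusters, which is why the prediction step falls back to -1 for any unmapped cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the Step 1 setting: 70/30 stratified split, standardized features,
# and every training label hidden except the first occurrence of each class
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
_, idxs = np.unique(y_train, return_index=True)
hidden = np.full(len(y_train), -1)
hidden[idxs] = y_train[idxs]
y_train = hidden

# Cluster the training features into three groups (no labels involved)
km = KMeans(n_clusters=3, random_state=0, n_init=10)
km.fit(X_train)

# The three rows whose labels survived identify which class each cluster is
labeled = np.where(y_train != -1)[0]
cluster_to_label = {km.labels_[i]: int(y_train[i]) for i in labeled}

# Translate test-set cluster assignments into class predictions
# (.get falls back to -1 if a cluster received no labeled row)
y_test_pred = np.array([cluster_to_label.get(c, -1) for c in km.predict(X_test)])
```

Note that only three of the 105 training labels are used, yet the cluster structure of the standardized features carries enough information to reach the reported ~75% micro-averaged score.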

Step 3: You went an extra mile

You did not use any other packages

You did use some results in Step 2

You tried something mentioned in Amir's talk in ML I

The prediction accuracy on y_test is improved

```python
from sklearn.neural_network import MLPClassifier

# The MLP classifier
mlp = MLPClassifier(random_state=0)

# Implement me

print(precision_recall_fscore_support(y_test_pred, y_test, average='micro'))
```

output:

(0.77777777777777779, 0.77777777777777779, 0.77777777777777779, None)
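Again, the graded answer is not shown, so here is one plausible reading of "use some results in Step 2": a pseudo-labeling (self-training) step, which is a standard semi-supervised trick. Every training row receives the class of its KMeans cluster as a pseudo-label, and the MLP is then fit on those pseudo-labels, smoothing the cluster boundaries into a learned decision surface. The sketch below is self-contained under the same assumptions as the Step 2 sketch (`load_iris` stands in for the UCI download; the cluster-to-class mapping is rebuilt inline).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Rebuild the Step 1/2 state: split, scale, hide labels, cluster, map clusters
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
_, idxs = np.unique(y_train, return_index=True)
hidden = np.full(len(y_train), -1)
hidden[idxs] = y_train[idxs]
y_train = hidden

km = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X_train)
labeled = np.where(y_train != -1)[0]
cluster_to_label = {km.labels_[i]: int(y_train[i]) for i in labeled}

# Step 2 result reused: pseudo-label EVERY training row from its cluster
pseudo_labels = np.array([cluster_to_label.get(c, -1) for c in km.labels_])

# Fit the MLP on the pseudo-labeled training set, then predict on the test set
# (the default max_iter may emit a ConvergenceWarning; it is harmless here)
mlp = MLPClassifier(random_state=0)
mlp.fit(X_train, pseudo_labels)
y_test_pred = mlp.predict(X_test)
```

The improvement over raw cluster assignment (77.8% vs 75.6% in the reported outputs) is what one would hope for from self-training: the MLP generalizes the clusters' shape rather than memorizing their Voronoi cells.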
