Question: Complete the missing parts indicated by # Implement me
We expect you to follow a reasonable programming style. While we do not mandate a specific style, we require your code to be neat, clear, documented/commented and, above all, consistent. Marks will be deducted if these requirements are not followed.
Some conversation between you and your engineer friend...
Friend: Thanks again for solving the XOR problem using only perceptrons! You are really awesome!
You: I know.
Friend: ... Have you ever worked on Iris?
You: Who hasn't?
Friend: ... Can you explain string theory?
You: Sure. Basically it is a unified theory that aims to explain every natural phenomenon. It claims that there are more than three dimensions in reality, most of which are too small to be observed. Actually, the math works pretty well when there are 10 dimensions. Now let us look at the equations...
Friend: Stop! Let us go back to Iris.
You: Sure.
Friend: ... I heard that missing values can cause a lot of trouble.
You: Not necessarily. You can still do things with them.
Friend: Oh really? What if 90% of the class labels are missing? Can you predict them?
You: No.
Friend: Well, you know what? Don't feel bad about yourself. It is ok. You don't have to know everything.
You: What I am saying is, I do not even need 10% of the labels. Remove all the labels if you want and leave only three unique ones. I can still give you over 70% prediction accuracy.
Friend: Only three labels? Are you serious?
You: Positive. The only problem is, I am a data scientist. I do not create missing values on purpose. You do that. Then I will go from there.
Step 1: Your friend diligently removed labels
In the end, y_train contains only three unique meaningful labels; the removed labels are denoted by -1
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
def your_friends_diligent_work():
    """
    :return: X_train, X_test, y_train, y_test
    """
    # Read Iris
    df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                     header=None,
                     names=['sepal length', 'sepal width', 'petal length', 'petal width', 'target'])
    # Get the features and target
    X, y = df[['sepal length', 'sepal width', 'petal length', 'petal width']], df['target']
    # Encode the target labels as integers
    le = LabelEncoder()
    y = le.fit_transform(y)
    # Divide the data into training and testing data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    # Standardize the features
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    # Get the index of the first row carrying each of the three unique labels
    yu, idxs = np.unique(y_train, return_index=True)
    # Remove all other labels by marking them as missing (-1)
    for i in range(len(y_train)):
        if i not in idxs:
            y_train[i] = -1
    return [X_train, X_test, y_train, y_test]
Step 2: Your first effort
You did not use any other packages
The prediction accuracy on y_test is over 75%
from sklearn.metrics import precision_recall_fscore_support
from sklearn.cluster import KMeans
# Get the training and testing data
# y_train contains only three unique meaningful labels, the removed labels are denoted by '-1'
X_train, X_test, y_train, y_test = your_friends_diligent_work()
# The KMeans clusterer
km = KMeans(n_clusters=3, random_state=0)
# Implement me
print(precision_recall_fscore_support(y_test, y_test_pred, average='micro'))
output:
(0.75555555555555554, 0.75555555555555554, 0.75555555555555554, None)
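One plausible way to fill in the "# Implement me" part is to fit KMeans on the training features and then use the three surviving labels to name the clusters. This is a sketch, not necessarily the intended solution; to keep it self-contained and offline, sklearn's bundled load_iris stands in for the UCI download, and the Step 1 label removal is reproduced inline. It also assumes each of the three labeled rows lands in its own cluster (the .get(..., -1) fallback guards the unlucky case where two of them collide).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import precision_recall_fscore_support

# Rebuild the Step 1 data. Assumption: load_iris stands in for the UCI file.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Keep only the first occurrence of each class label; mark the rest as -1
_, idxs = np.unique(y_train, return_index=True)
mask = np.zeros(len(y_train), dtype=bool)
mask[idxs] = True
y_train = np.where(mask, y_train, -1)

# Cluster the training data, then map each cluster to the class label of the
# labeled row that fell into it
km = KMeans(n_clusters=3, random_state=0)
km.fit(X_train)
cluster_to_label = dict(zip(km.predict(X_train[mask]), y_train[mask]))
y_test_pred = np.array([cluster_to_label.get(c, -1)
                        for c in km.predict(X_test)])
print(precision_recall_fscore_support(y_test, y_test_pred, average='micro'))
```

The exact printed tuple can vary slightly across scikit-learn versions (the KMeans n_init default changed), so treat the 0.7555... figure in the question as indicative rather than guaranteed.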
Step 3: You went the extra mile
You did not use any other packages
You did use some results from Step 2
You tried something mentioned in Amir's talk in ML I
The prediction accuracy on y_test is improved
from sklearn.neural_network import MLPClassifier
# The MLP classifier
mlp = MLPClassifier(random_state=0)
# Implement me
print(precision_recall_fscore_support(y_test, y_test_pred, average='micro'))
output:
(0.77777777777777779, 0.77777777777777779, 0.77777777777777779, None)
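One plausible reading of Step 3 is self-training (pseudo-labelling): the cluster-derived labels from Step 2 become training targets for a supervised MLP. The sketch below repeats the Step 1/Step 2 pipeline so it runs on its own; the use of load_iris, the max_iter=1000 setting, and the self-training interpretation itself are all assumptions, not confirmed by the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import precision_recall_fscore_support

# Rebuild the Step 1 data. Assumption: load_iris stands in for the UCI file.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
_, idxs = np.unique(y_train, return_index=True)
mask = np.zeros(len(y_train), dtype=bool)
mask[idxs] = True
y_train = np.where(mask, y_train, -1)

# Step 2 result reused: KMeans assigns a pseudo-label to every training row
km = KMeans(n_clusters=3, random_state=0).fit(X_train)
cluster_to_label = dict(zip(km.predict(X_train[mask]), y_train[mask]))
y_pseudo = np.array([cluster_to_label.get(c, -1)
                     for c in km.predict(X_train)])

# Self-training: fit a supervised MLP on the pseudo-labeled rows.
# max_iter=1000 is an added assumption (the default 200 may not converge).
labeled = y_pseudo != -1
mlp = MLPClassifier(random_state=0, max_iter=1000)
mlp.fit(X_train[labeled], y_pseudo[labeled])
y_test_pred = mlp.predict(X_test)
print(precision_recall_fscore_support(y_test, y_test_pred, average='micro'))
```

The idea is that the MLP smooths out the hard cluster boundaries KMeans draws, which is one way the accuracy could improve from roughly 75% to roughly 78% as the question reports.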