Question:

Q1.
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler
from sklearn.impute import KNNImputer
from sklearn.neighbors import LocalOutlierFactor
# Load the diabetes dataset
diabetes = fetch_openml('diabetes', version=1, as_frame=True)
diabetes_df = diabetes.frame
# Create descriptive statistics for the dataset
print(diabetes_df.describe())
# Normalize (standardize) the columns 'age' and 'insu'
columns_to_normalize = ['age', 'insu']
scaler = StandardScaler()
diabetes_df[columns_to_normalize] = scaler.fit_transform(diabetes_df[columns_to_normalize])
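# Detect outliers in the two columns 'age' and 'insu' using LOF (Isolation Forest is the stated alternative)
columns_for_outlier_detection = ['age', 'insu']
lof = LocalOutlierFactor()  # assumed default settings; the original listing omits this line
diabetes_df['outlier'] = lof.fit_predict(diabetes_df[columns_for_outlier_detection])
# Filter out rows identified as outliers (fit_predict labels outliers as -1)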
filtered_df = diabetes_df[diabetes_df['outlier']!=-1]
# Print the length of the dataset with and without outliers
print(f"Length of dataset with outliers: {len(diabetes_df)}")
print(f"Length of dataset without outliers: {len(filtered_df)}")
# Introduce missing values in the 'mass' and 'insu' columns and print the missing indices in those columns
np.random.seed(42)
diabetes_df['mass_with_missing'] = diabetes_df['mass'].copy()
# Assign through .loc on the frame itself; the chained form may write to a temporary copy
mass_missing_idx = np.random.choice(diabetes_df.index, size=20)
diabetes_df.loc[mass_missing_idx, 'mass_with_missing'] = np.nan
diabetes_df['insu_with_missing'] = diabetes_df['insu'].copy()
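
The listing breaks off at the 'insu' copy, so the rest of the step is not recoverable from the question. As a rough sketch only, the pattern of blanking values, printing their indices, and filling them with the imported `KNNImputer` might look like the following. It runs on a synthetic frame rather than the OpenML data, and the names `df`, `rng`, `impute_cols`, and `imputed`, the use of `replace=False`, and `n_neighbors=5` are all assumptions made for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Synthetic stand-in for the diabetes frame (assumption: the real data comes from fetch_openml).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "mass": rng.normal(32.0, 7.0, size=100),
    "insu": rng.normal(80.0, 30.0, size=100),
    "age": rng.integers(21, 80, size=100).astype(float),
})

# Blank out 20 distinct rows per column; .loc on the frame avoids chained assignment.
for col in ["mass", "insu"]:
    missing_col = f"{col}_with_missing"
    df[missing_col] = df[col].copy()
    idx = rng.choice(df.index.to_numpy(), size=20, replace=False)
    df.loc[idx, missing_col] = np.nan
    missing_idx = df.index[df[missing_col].isna()].tolist()
    print(f"Missing indices in {missing_col}: {missing_idx}")

# Fill each gap from the 5 nearest rows (n_neighbors=5 is the scikit-learn default).
# 'age' is included only so that every row has at least one observed feature.
impute_cols = ["mass_with_missing", "insu_with_missing", "age"]
imputer = KNNImputer(n_neighbors=5)
imputed = pd.DataFrame(
    imputer.fit_transform(df[impute_cols]),
    columns=impute_cols,
    index=df.index,
)

# After imputation no NaN should remain in the selected columns.
print(imputed.isna().sum())
```

Sampling with `replace=False` guarantees exactly 20 missing entries per column, whereas `np.random.choice` with its default settings can draw the same index twice and leave fewer than 20 gaps.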
