Question:

Make use of the scikit-learn (sklearn) Python package in your function implementations.

Complete the train_test_split function

Using the train_test_split function from sklearn, implement a function that, given a dataset, target column, test size, random state, and a True/False value for stratify, returns train_features (DataFrame), test_features (DataFrame), train_targets (Series), and test_targets (Series).

Hint: write your code so that it handles both the case where we want to stratify and the case where we don't (don't pass stratify directly as an input to the sklearn function).
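One possible way to follow the hint is sketched below. It assumes that "stratify" means stratifying on the target column, so the boolean flag is turned into either the target Series or None before calling sklearn. The function name and parameter order follow the skeleton in the question; calling sklearn's function through its full module path avoids a clash with the same-named wrapper. Treat this as a sketch under those assumptions, not as the graded reference answer.

```python
import pandas as pd
import sklearn.model_selection


def train_test_split(
        dataset: pd.DataFrame,
        target_col: str,
        test_size: float,
        stratify: bool,
        random_state: int) -> tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series]:
    # Separate the features from the target column.
    features = dataset.drop(columns=[target_col])
    targets = dataset[target_col]

    # Assumption: stratifying means stratifying on the target values.
    # sklearn expects array-like labels (or None), not a boolean.
    stratify_on = targets if stratify else None

    # Call sklearn's function via its module path so it is not
    # shadowed by this wrapper of the same name.
    train_features, test_features, train_targets, test_targets = (
        sklearn.model_selection.train_test_split(
            features,
            targets,
            test_size=test_size,
            random_state=random_state,
            stratify=stratify_on,
        )
    )
    return train_features, test_features, train_targets, test_targets
```

Because pandas objects are passed in, sklearn returns pandas objects that keep the original row index, so features and targets stay aligned.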

feature_engineering_train

Given training features (DataFrame) and the feature engineering functions passed in a dict with the format {'feature_name':function,}, for each feature_name in the dict create a column of that name in the training DataFrame by passing the training feature DataFrame to the associated function. The returned DataFrame will consist of the input DataFrame with the additional feature-engineered columns from the dict. (NOTE: make sure this new DataFrame uses the row indexes of the input DataFrame.)

feature_engineering_test

Given test features (DataFrame) and the feature engineering functions passed in a dict with the format {'feature_name':function,}, for each feature_name in the dict create a column of that name in the test DataFrame by passing the test feature DataFrame to the associated function. The returned DataFrame will consist of the input DataFrame with the additional feature-engineered columns from the dict. (NOTE: make sure this new DataFrame uses the row indexes of the input DataFrame.)
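Since the train and test variants describe the same operation on different DataFrames, both could delegate to one helper. The sketch below is a minimal illustration; apply_feature_functions is a hypothetical name that does not appear in the assignment, and it assumes each function returns a Series indexed like its input.

```python
import pandas as pd


def apply_feature_functions(features: pd.DataFrame,
                            feature_engineering_functions: dict) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is left untouched.
    engineered = features.copy()
    for feature_name, function in feature_engineering_functions.items():
        # Each function receives the original features and returns a Series.
        # Column assignment aligns on the index, so the row indexes of the
        # input DataFrame are preserved.
        engineered[feature_name] = function(features)
    return engineered
```

A dict such as {'a_plus_b': lambda df: df['a'] + df['b']} would add one column named a_plus_b next to the existing ones.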

preprocess

Given training features (DataFrame), test features (DataFrame), and the functions you created above, return training and test DataFrames with the one_hot_encode_cols encoded, the min_max_scale_cols scaled, the features described in feature_engineering_functions engineered, and any columns not affected by these steps passed through to the output unchanged. (NOTE: make sure each new DataFrame uses the row indexes of the corresponding input DataFrame.)

import numpy as np
import pandas as pd
import sklearn.preprocessing
import sklearn.decomposition
import sklearn.model_selection

def train_test_split(
        dataset: pd.DataFrame,
        target_col: str,
        test_size: float,
        stratify: bool,
        random_state: int) -> tuple[pd.DataFrame,pd.DataFrame,pd.Series,pd.Series]:
    # TODO: Write the necessary code to split a dataframe into a Train and Test feature dataframe and a Train and Test
    # target series
    train_features = pd.DataFrame()
    test_features = pd.DataFrame()
    train_targets = pd.DataFrame()
    test_targets = pd.DataFrame()
    return train_features,test_features,train_targets,test_targets

class PreprocessDataset:
    def __init__(self,
                 train_features:pd.DataFrame,
                 test_features:pd.DataFrame,
                 one_hot_encode_cols:list[str],
                 min_max_scale_cols:list[str],
                 n_components:int,
                 feature_engineering_functions:dict
                 ):
        # TODO: Add any state variables you may need to make your functions work
        return

    def feature_engineering_train(self) -> pd.DataFrame:
        # TODO: Write the necessary code to create a dataframe with feature engineering functions applied
        # from the feature_engineering_functions dict (the dict format is {'feature_name':function,})
        # each feature engineering function will take in type pd.DataFrame and return a pd.Series
        feature_engineered_dataset = pd.DataFrame()
        return feature_engineered_dataset

    def feature_engineering_test(self) -> pd.DataFrame:
        # TODO: Write the necessary code to create a dataframe with feature engineering functions applied
        # from the feature_engineering_functions dict (the dict format is {'feature_name':function,})
        # each feature engineering function will take in type pd.DataFrame and return a pd.Series
        feature_engineered_dataset = pd.DataFrame()
        return feature_engineered_dataset

    def preprocess(self) -> tuple[pd.DataFrame,pd.DataFrame]:
        # TODO: Use the functions you wrote above to create train/test splits of the features and target with scaled and encoded values
        # for the columns specified in the init function
        train_features = pd.DataFrame()
        test_features = pd.DataFrame()
        return train_features,test_features
