Question:

1. Create a Python package csc665. It's just a subdirectory with an empty file __init__.py in it.

2. Create features.py in the csc665 subdirectory and implement the following functions:

   A. def train_test_split(X, y, test_size, shuffle, random_state=None):
      X, y - features and the target variable.
      test_size - between 0 and 1 - how much to allocate to the test set; the rest goes to the train set.
      shuffle - if True, shuffle the dataset, otherwise not.
      random_state - integer; if None, then results are random, otherwise fixed to a given seed.
      Example:
      X_train, X_test, y_train, y_test = train_test_split(feat_df, y, 0.3, True, 12)

   B. create_categories(df, list_columns)
      Converts values, in-place, in the columns passed in the list_columns to numerical values. Follow the same approach: "string" -> category -> code. Replace values in df, in-place.

   C. X, y = preprocess_ver_1(csv_df)
      Apply the feature transformation steps to the dataframe, return new X and y for the entire dataset. Do not modify the original csv_df.
      - Remove all rows with NA values.
      - Convert datetime to a number.
      - Convert all strings to numbers.
      - Split the dataframe into X and y and return these.

3. Create metrics.py

   A. def mse(y_predicted, y_true) - return Mean-Squared Error.
   B. def rmse(y_predicted, y_true) - return Root Mean-Squared Error.
   C. def rsq(y_predicted, y_true) - return R^2.
Step by Step Solution
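A possible sketch of train_test_split for features.py, using NumPy and pandas. The question does not say how to round the test set size or which rows go to the test set when shuffle is False; this sketch assumes the size is rounded down and that the test rows are taken from the end. The helper take() is a hypothetical name introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def train_test_split(X, y, test_size, shuffle, random_state=None):
    """Split X and y into train and test parts; test_size is a fraction in (0, 1)."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    n_rows = len(X)
    if len(y) != n_rows:
        raise ValueError("X and y must have the same number of rows")

    indices = np.arange(n_rows)
    if shuffle:
        # RandomState(None) is seeded unpredictably, so results differ between
        # runs; an integer seed makes the permutation reproducible.
        rng = np.random.RandomState(random_state)
        indices = rng.permutation(n_rows)

    # Assumption: the test set size is rounded down and taken from the end.
    n_test = int(n_rows * test_size)
    train_idx = indices[:n_rows - n_test]
    test_idx = indices[n_rows - n_test:]

    def take(data, idx):
        # Positional selection for pandas objects, plain indexing otherwise.
        if isinstance(data, (pd.DataFrame, pd.Series)):
            return data.iloc[idx]
        return np.asarray(data)[idx]

    return take(X, train_idx), take(X, test_idx), take(y, train_idx), take(y, test_idx)


# Usage, following the example call in the question (toy data).
feat_df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
y = pd.Series(range(10))
X_train, X_test, y_train, y_test = train_test_split(feat_df, y, 0.3, True, 12)
print(X_train.shape, X_test.shape)
```

Shuffling a single index array and applying it to both X and y keeps each feature row aligned with its target value.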
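create_categories and preprocess_ver_1 could be sketched as follows. The question does not name the target column, so this sketch adds a hypothetical optional parameter target_column and assumes the last column is the target when it is omitted. Datetime columns are turned into integer timestamps, which is one of several reasonable ways to "convert datetime to a number".

```python
import pandas as pd


def create_categories(df, list_columns):
    """Replace values in the given columns with category codes, in place."""
    for col in list_columns:
        # "string" -> category -> code
        df[col] = df[col].astype("category").cat.codes


def preprocess_ver_1(csv_df, target_column=None):
    """Return (X, y) for the whole dataset without modifying csv_df."""
    # dropna returns a new DataFrame; copy() makes later assignments safe.
    df = csv_df.dropna().copy()

    # Convert datetime columns to integer timestamps.
    for col in df.select_dtypes(include=["datetime"]).columns:
        df[col] = df[col].astype("int64")

    # Convert all string columns to numeric category codes.
    string_cols = [
        col for col in df.columns
        if pd.api.types.is_object_dtype(df[col])
        or pd.api.types.is_string_dtype(df[col])
    ]
    create_categories(df, string_cols)

    # Assumption: the target is the last column unless one is named.
    if target_column is None:
        target_column = df.columns[-1]
    X = df.drop(columns=[target_column])
    y = df[target_column]
    return X, y


# Usage on made-up data; two of the four rows contain a missing value.
csv_df = pd.DataFrame({
    "sold": pd.to_datetime(["2020-01-01", "2020-01-02", None, "2020-01-04"]),
    "city": ["SF", "LA", "SF", None],
    "rooms": [3, 2, 4, 1],
    "price": [500.0, 300.0, 700.0, 200.0],
})
X, y = preprocess_ver_1(csv_df)
print(X)
print(y)
```

Category codes follow the sorted order of the distinct values in each column, so the same string can map to a different code in another dataset. A -1 code would mark a missing value, but rows with NA values are dropped before the conversion here.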
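A minimal sketch of metrics.py with NumPy. It assumes R^2 is computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of y_true from its mean. The sample values at the bottom are made up for illustration.

```python
import numpy as np


def mse(y_predicted, y_true):
    """Mean-Squared Error: the average of the squared prediction errors."""
    y_predicted = np.asarray(y_predicted, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_predicted - y_true) ** 2))


def rmse(y_predicted, y_true):
    """Root Mean-Squared Error: the square root of the MSE."""
    return float(np.sqrt(mse(y_predicted, y_true)))


def rsq(y_predicted, y_true):
    """R^2: 1 minus the ratio of residual to total sum of squares."""
    y_predicted = np.asarray(y_predicted, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - y_predicted) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # Note: ss_tot is zero when y_true is constant, so R^2 is undefined there.
    return float(1.0 - ss_res / ss_tot)


# Usage with made-up values.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]
print(mse(y_pred, y_true))
print(rmse(y_pred, y_true))
print(rsq(y_pred, y_true))
```

The argument order (y_predicted first, then y_true) follows the signatures given in the question. It does not affect mse or rmse, but it matters for rsq, where the mean is taken over y_true.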
