Question: Part 5: Creating Training, Validation, and Test Sets

In this section, we will encode our categorical variables and will create training, validation, and test sets.

Create a markdown cell that displays a level 2 header that reads: "Part 5: Creating Training, Validation, and Test Sets". Also add some text briefly describing the purpose of your code in this part. Explain that we will start by separating the categorical features, the numerical features, and the labels.

Before moving on to the next step, note that we will be using Cover_Type as the label variable in our models. All other columns will be used as features. Of the feature columns, Wilderness_Area and Soil_Type are categorical, while all other feature columns are numerical.

Perform the following steps in a single code cell:
1. Create a 2D array named X_num by selecting the columns of fc that represent numerical features.
2. Create a 2D array named X_cat by selecting the columns of fc that represent categorical features.
3. Create a 1D array named y by selecting the column of fc corresponding to the labels.
4. Print the shapes of all three of these arrays with messages as shown below. Add spacing to ensure that the shape tuples are left-aligned.

    Numerical Feature Array Shape:    xxxx
    Categorical Feature Array Shape:  xxxx
    Label Array Shape:                xxxx

Note: The variables created here should be arrays, not DataFrames or Series. You will need to use .values.

Create a markdown cell explaining that we will now be encoding the categorical variables using one-hot encoding.

Perform the following steps in a single code cell:
1. Create a OneHotEncoder object, setting sparse=False.
2. Fit the encoder to the categorical features.
3. Use the encoder to encode the categorical features, storing the result in a variable named X_enc.
4. Print the shape of X_enc with a message as shown below.

    Encoded Feature Array Shape: xxxx

Create a markdown cell explaining that we will now combine the numerical features with the encoded features.
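The first two code cells (separating the columns, then one-hot encoding) could look roughly like the sketch below. It is a minimal sketch, not an official solution: because the real fc DataFrame is not shown here, it builds a small synthetic stand-in whose numerical columns (Elevation, Slope) and category values are hypothetical; only Wilderness_Area, Soil_Type, and Cover_Type come from the assignment. The numerical columns are derived as "every column that is neither categorical nor the label", which follows the assignment's description without hard-coding names. Note also that scikit-learn 1.2 renamed the sparse argument of OneHotEncoder to sparse_output and version 1.4 removed sparse, so the sketch tries the newer keyword first.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the assignment's `fc` DataFrame. Only
# Wilderness_Area, Soil_Type and Cover_Type come from the assignment;
# the numerical columns and category values are made up.
rng = np.random.default_rng(0)
n = 200
fc = pd.DataFrame({
    "Elevation": rng.integers(1800, 3900, n),  # hypothetical numerical column
    "Slope": rng.integers(0, 60, n),           # hypothetical numerical column
    "Wilderness_Area": rng.choice(["Area_1", "Area_2", "Area_3"], n),
    "Soil_Type": rng.choice(["Soil_A", "Soil_B", "Soil_C", "Soil_D"], n),
    "Cover_Type": rng.integers(1, 8, n),
})

# --- Cell 1: separate numerical features, categorical features, labels ---
cat_cols = ["Wilderness_Area", "Soil_Type"]
label_col = "Cover_Type"
# Every column that is neither categorical nor the label is numerical.
num_cols = [c for c in fc.columns if c not in cat_cols and c != label_col]

X_num = fc[num_cols].values   # 2D NumPy array
X_cat = fc[cat_cols].values   # 2D NumPy array
y = fc[label_col].values      # 1D NumPy array

# ljust pads each message to the same width so the shape tuples line up.
print("Numerical Feature Array Shape:".ljust(33), X_num.shape)
print("Categorical Feature Array Shape:".ljust(33), X_cat.shape)
print("Label Array Shape:".ljust(33), y.shape)

# --- Cell 2: one-hot encode the categorical features ---
try:
    encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2
except TypeError:
    encoder = OneHotEncoder(sparse=False)         # older releases
encoder.fit(X_cat)
X_enc = encoder.transform(X_cat)
print("Encoded Feature Array Shape:", X_enc.shape)
```

Each row of X_enc contains exactly one 1 per original categorical column, so the number of encoded columns equals the total number of distinct categories across Wilderness_Area and Soil_Type.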
Perform the following steps in a single code cell:
1. Use np.hstack to combine X_num and X_enc into a single array named X.
2. Print the shape of X with a message as shown below.

    Feature Array Shape: xxxx

Create a markdown cell explaining that we will now split the data into training, validation, and test sets, using a 70/15/15 split.

Perform the following steps in a single code cell:
1. Use train_test_split() to split the data into training and holdout sets using a 70/30 split. Name the resulting arrays X_train, X_hold, y_train, and y_hold. Set random_state=1. Use stratified sampling.
2. Use train_test_split() to split the holdout data into validation and test sets using a 50/50 split. Name the resulting arrays X_valid, X_test, y_valid, and y_test. Set random_state=1. Use stratified sampling.
3. Print the shapes of X_train, X_valid, and X_test with messages as shown below. Add spacing to ensure that the shape tuples are left-aligned.

    Training Features Shape:    xxxx
    Validation Features Shape:  xxxx
    Test Features Shape:        xxxx
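The last two code cells (combining the arrays, then the two-stage split) might be sketched as follows. This is a minimal sketch under stated assumptions: so that it runs on its own, X_num, X_enc, and y are replaced by random stand-in arrays with 10 numerical columns, 4 one-hot columns, and 7 label classes, all of which are hypothetical. With the real data, drop the stand-ins and keep the rest.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for X_num, X_enc and y from the earlier cells:
# 10 numerical columns, 4 one-hot columns and 7 label classes (1-7).
rng = np.random.default_rng(1)
n = 1000
X_num = rng.normal(size=(n, 10))
X_enc = np.eye(4)[rng.integers(0, 4, n)]  # each row has exactly one 1
y = rng.integers(1, 8, n)

# --- Cell 3: combine numerical and encoded features column-wise ---
X = np.hstack([X_num, X_enc])
print("Feature Array Shape:", X.shape)

# --- Cell 4: 70/30 split, then split the 30% holdout 50/50 ---
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, random_state=1, stratify=y_hold
)

# ljust pads each message to the same width so the shape tuples line up.
print("Training Features Shape:".ljust(27), X_train.shape)
print("Validation Features Shape:".ljust(27), X_valid.shape)
print("Test Features Shape:".ljust(27), X_test.shape)
```

Splitting the 30% holdout in half gives 15% each for validation and test, which yields the 70/15/15 split. The second call stratifies on y_hold rather than y because it only sees the holdout rows.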
