Question: Part 2: Splitting the Data
When training a machine learning model, it is common practice to split the dataset into two separate sets, referred to as
the training set and the test set. The model is built using only the data in the training set. After the model is built, its
performance is evaluated on the test set. The reason for splitting the data this way is that machine learning models
tend to perform better on the data they were trained on than on new data. If we trained the model on a dataset
and then evaluated it on that same data, we would likely get an overly optimistic view of how well the model
would perform on new observations. By splitting the data, we hold out a set of observations that the model never
sees during training. This held-out test data gives us a less biased assessment of how well the model will
perform on new data.
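A minimal sketch of the idea above, in Python using only the standard library. The helper names (`train_test_split`, `predict_1nn`, `accuracy`), the synthetic data with 20% flipped labels, and the 80/20 split are assumptions chosen for illustration. A 1-nearest-neighbour rule is used as the "model" because it memorizes its training data, which makes the optimistic bias of scoring on the training set easy to see.

```python
import random


def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (train, test) lists.

    Hypothetical helper for illustration; libraries such as scikit-learn
    ship their own function with this name.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(data)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # shuffle so both sets are random samples
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """1-nearest-neighbour 'model': return the label of the closest training x."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def accuracy(train, observations):
    """Fraction of (x, label) pairs the 1-NN model predicts correctly."""
    correct = sum(predict_1nn(train, x) == label for x, label in observations)
    return correct / len(observations)


# Synthetic data (assumed): the true rule is "label 1 if x > 0.5", but about
# 20% of the labels are flipped to simulate noise no model can predict.
rng = random.Random(42)
data = []
for _ in range(300):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.2:
        label = 1 - label
    data.append((x, label))

train, test = train_test_split(data, test_fraction=0.2, seed=1)

train_acc = accuracy(train, train)  # scored on the data it was "trained" on
test_acc = accuracy(train, test)    # scored on held-out observations
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Because every training point is its own nearest neighbour, the training accuracy comes out as 1.0, while the accuracy on the held-out points should be noticeably lower. Only the test-set figure says anything useful about how the model will do on new data.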
