Question:

An imbalanced data set is one in which examples with positive labels are far fewer than examples with negative labels. Suppose we have a credit card fraud data set in which fraudulent transactions make up only 1% to 2% of all transactions, but the risk associated with not catching fraudulent activity is very high.
1. (Points: 5) Suppose we use the classical machine learning algorithms taught in class. How should we split the data set into training, validation, and test sets in this case? Note that it may not be a good idea to hash all data points into the three sets uniformly at random.
2. (Points: 5) Suppose we do hash data points uniformly at random. How can we change the loss function of the machine learning algorithms to achieve the same effect
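The two ideas the question points at can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's official solution: the helper names `stratified_split` and `weighted_log_loss`, the 70/15/15 split, the synthetic labels, and the inverse-frequency weight are all hypothetical choices made for the example. The first function splits each class separately so every subset keeps roughly the same fraud rate; the second is a binary cross-entropy in which errors on the rare positive class are weighted more heavily.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists, splitting each class separately.

    The 70/15/15 default is an assumption for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random order within one class
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to test
    return train, val, test


def weighted_log_loss(y_true, p_pred, w_pos, w_neg=1.0, eps=1e-12):
    """Binary cross-entropy with per-class weights, normalized by total weight."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        w = w_pos if y == 1 else w_neg
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


# Synthetic labels (assumed): 20 fraudulent (1) among 1000 transactions, i.e. 2%.
labels = [1] * 20 + [0] * 980
train, val, test = stratified_split(labels)
pos_counts = [sum(labels[i] for i in part) for part in (train, val, test)]

# One common (assumed) choice: weight positives by the negative/positive ratio.
w_pos = labels.count(0) / labels.count(1)

# Missing a fraud (low p on a positive) vs. raising a false alarm.
missed_fraud = weighted_log_loss([1, 0], [0.1, 0.1], w_pos)
false_alarm = weighted_log_loss([1, 0], [0.9, 0.9], w_pos)
```

With equal weights the two error patterns at the end would cost the same; with the larger positive weight, the missed fraud is penalized more, which is one way to reflect the high cost of not catching fraudulent activity.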
