Question:
Question 1 Supervised learning concepts [25 marks]
(a) Two subtypes of supervised learning are classification and regression.
(i) Consider the task of predicting the number of children in a family from family
incomes and expenses. Is the task classification or regression?
(ii) Consider the task of predicting the family size (no children, one child, two
children, or three or more children) from family incomes and expenses. Is the
task classification or regression?
(iii) Describe a classification task in the education domain, and briefly justify why
it is classification.
(iv) Describe a regression task in the education domain, and briefly justify why it
is regression.
(b) Four models are trained and tested for a classification task with 3 classes, and their accuracy scores are shown below.
Model     Accuracy in training   Accuracy in test
Model A   32.9%                  33.6%
Model B   50.7%                  49.4%
Model C   76.2%                  74.3%
Model D   72.5%                  46.8%
(i) Which of the models has the most severe underfitting problem?
(ii) Which of the models has the most severe overfitting problem?
(iii) Assume the population ratio of the 3 classes is 2:1:1, in both the training set
and test set. One of the models always predicts the majority class. Which of
the models is it?
(iv) One of the models always predicts a random class among the 3 classes.
Which of the models is it?
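For parts (iii) and (iv), the expected accuracy of the two trivial baselines can be worked out from the class ratio alone. The sketch below assumes the 2:1:1 ratio holds exactly and that "a random class" means a uniform draw over the 3 classes; the function names are illustrative and do not come from the question.

```python
from fractions import Fraction


def majority_baseline_accuracy(ratio):
    """Expected accuracy when always predicting the most frequent class."""
    return Fraction(max(ratio), sum(ratio))


def uniform_random_accuracy(ratio):
    """Expected accuracy when each class is predicted with equal probability.

    P(correct) = sum_c P(class = c) * (1 / k) = 1 / k, whatever the ratio is.
    """
    return Fraction(1, len(ratio))


ratio = [2, 1, 1]  # class ratio stated in the question
majority_acc = majority_baseline_accuracy(ratio)
random_acc = uniform_random_accuracy(ratio)

print(f"always-majority baseline: {float(majority_acc):.1%}")
print(f"uniform-random baseline:  {float(random_acc):.1%}")
```

These expected values can then be compared with the training and test accuracies in the table; a model behaving like a trivial baseline should score close to them on both sets.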
(c) Regularization is an important technique in machine learning.
(i) Briefly describe the aim of regularization and how it works.
(ii) Two multiple linear regression models, y1 and y2, are given below. State
which of them is more complex in general.
$y_1 = 3x_3 - 2x_2 + 13$
$y_2 = 5x_3 - 5x_2 + 12x$
(iii) Name the three types of regularization.
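One common way to compare the complexity of two linear models is the size of their coefficient vectors, which is also what L1 and L2 penalties measure. The sketch below is only an illustration: it takes the coefficient magnitudes as printed in part (ii), leaves out the intercept (an assumption, following the usual convention that the intercept is not penalized), and uses hypothetical helper names. Signs do not affect either norm.

```python
def l1_norm(weights):
    """Sum of absolute coefficient values (the quantity an L1 penalty uses)."""
    return sum(abs(w) for w in weights)


def l2_norm_squared(weights):
    """Sum of squared coefficient values (the quantity an L2 penalty uses)."""
    return sum(w * w for w in weights)


# Coefficient magnitudes read from the two models in part (ii).
# The constant term 13 in y1 is an intercept and is excluded (assumption).
y1_weights = [3, 2]
y2_weights = [5, 5, 12]

for name, weights in [("y1", y1_weights), ("y2", y2_weights)]:
    print(f"{name}: L1 = {l1_norm(weights)}, squared L2 = {l2_norm_squared(weights)}")
```

Under this measure, the model with the larger norms and more non-zero coefficients would be treated as the more complex one.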
(d) A spam detector is applied to a dataset of e-mail messages, and the results are shown below. Spam messages are the positive class.
                     Actual spam   Actual non-spam
Predicted spam       241           42
Predicted non-spam   8             603
Calculate the values of the following, giving floating-point answers to 4 decimal places.
(i) True positive (TP)
(ii) True negative (TN)
(iii) False positive (FP)
(iv) False negative (FN)
(v) Accuracy
(vi) Precision
(vii) Recall
(viii) F1 score
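The quantities in (i) to (viii) follow directly from the four cells of the confusion matrix. The sketch below uses the standard definitions with spam as the positive class; the variable names are illustrative, and the cell assignments assume rows are predictions and columns are actual labels, as laid out in the table.

```python
# Confusion-matrix cells, with spam as the positive class.
tp = 241  # predicted spam, actually spam
fp = 42   # predicted spam, actually non-spam
fn = 8    # predicted non-spam, actually spam
tn = 603  # predicted non-spam, actually non-spam

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted spam that is really spam
recall = tp / (tp + fn)      # share of real spam that was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

for name, value in [
    ("Accuracy", accuracy),
    ("Precision", precision),
    ("Recall", recall),
    ("F1 score", f1),
]:
    print(f"{name}: {value:.4f}")
```

Formatting with `:.4f` matches the requested 4 decimal places; the counts in (i) to (iv) are integers and need no rounding.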