Question 1 Supervised learning concepts [25 marks]

(a) Two subtypes of supervised learning are classification and regression.

(i) Consider the task of predicting the number of children in a family from family incomes and expenses. Is the task classification or regression?

(ii) Consider the task of predicting the family size (no children, one child, two children, or three or more children) from family incomes and expenses. Is the task classification or regression?

(iii) Describe a classification task in the education domain, and briefly justify why it is classification.

(iv) Describe a regression task in the education domain, and briefly justify why it is regression.

(b) Four models are trained and tested for a classification task with 3 classes, and their accuracy scores are shown below.

Model      Accuracy in training   Accuracy in test
Model A    32.9%                  33.6%
Model B    50.7%                  49.4%
Model C    76.2%                  74.3%
Model D    72.5%                  46.8%

(i) Which of the models has the most severe underfitting problem?

(ii) Which of the models has the most severe overfitting problem?

(iii) Assume the population ratio of the 3 classes is 2:1:1, in both the training set and the test set. One of the models always predicts the majority class. Which of the models is it?

(iv) One of the models always predicts a class chosen at random from the 3 classes. Which of the models is it?
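The reasoning behind parts (b)(i) to (b)(iv) can be checked with a little arithmetic. The sketch below is only an illustration, not a model answer: it computes the accuracy expected from two trivial strategies under the stated 2:1:1 ratio, and the gap between training and test accuracy for each model. The variable names are made up for this example, and "random" is assumed to mean uniform guessing over the 3 classes. Measured accuracies fluctuate around these expected values, so a match within a point or so is what to look for.

```python
# Accuracy figures copied from the table in part (b): (training, test), in percent.
scores = {
    "Model A": (32.9, 33.6),
    "Model B": (50.7, 49.4),
    "Model C": (76.2, 74.3),
    "Model D": (72.5, 46.8),
}

# Class proportions for the 2:1:1 ratio stated in (b)(iii).
proportions = [2 / 4, 1 / 4, 1 / 4]
n_classes = len(proportions)

# Always predicting the majority class is correct exactly when the
# true class is the majority class.
majority_baseline = 100 * max(proportions)

# Assumption: "random" means uniform guessing, which is correct with
# probability 1 / n_classes whatever the true class is.
random_baseline = 100 * sum(p / n_classes for p in proportions)

# Training-minus-test gap: a large positive gap is the usual symptom of
# overfitting; low accuracy on both sets points to underfitting.
gaps = {name: round(train - test, 1) for name, (train, test) in scores.items()}

print(f"majority-class baseline: {majority_baseline:.1f}%")
print(f"random-guess baseline:   {random_baseline:.1f}%")
for name, gap in gaps.items():
    print(f"{name}: train - test = {gap:+.1f} points")
```

Comparing each model's accuracies with the two baselines, and the gaps with each other, gives the evidence needed for the four sub-questions.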

(c) Regularization is an important technique in machine learning.

(i) Briefly describe the aim of regularization and how it works.

(ii) Two multiple linear regression models, y1 and y2, are given below. State which of them is more complex in general.

y1 = 3x3 2x2 + 13

y2 = 5x3 5x2 + 12x

(iii) Name the three types of regularization.
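As a rough illustration of how regularization measures model complexity, the sketch below computes the penalty terms commonly taught as L1 (lasso), L2 (ridge) and elastic net for the coefficients of y1 and y2. Several things here are assumptions rather than part of the question: that these three penalties are the ones the question has in mind, the helper function names, the equal-weight elastic net mix, and leaving the intercept unpenalized. The signs of some terms are not legible in the equations above, but every penalty here depends only on coefficient magnitudes.

```python
# Coefficient magnitudes read from y1 and y2 above. Signs are not legible in
# the source, but the penalties below use only magnitudes.
# The constant term of y1 (13) is left out: intercepts are usually not penalized.
y1_coefs = [3, 2]
y2_coefs = [5, 5, 12]


def l1_penalty(coefs):
    """Sum of absolute coefficient values (lasso-style penalty)."""
    return sum(abs(c) for c in coefs)


def l2_penalty(coefs):
    """Sum of squared coefficient values (ridge-style penalty)."""
    return sum(c * c for c in coefs)


def elastic_net_penalty(coefs, l1_ratio=0.5):
    """Weighted mix of the two penalties. This is one simple convention;
    libraries differ in how they scale the two terms."""
    return l1_ratio * l1_penalty(coefs) + (1 - l1_ratio) * l2_penalty(coefs)


for name, coefs in (("y1", y1_coefs), ("y2", y2_coefs)):
    print(
        f"{name}: L1 = {l1_penalty(coefs)}, "
        f"L2 = {l2_penalty(coefs)}, "
        f"elastic net = {elastic_net_penalty(coefs)}"
    )
```

During training the chosen penalty is multiplied by a strength parameter and added to the loss, so a model with larger coefficients pays a larger cost.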

(d) A spam detector is applied to a dataset of e-mail messages, and the results are shown below. Spam messages are the positive class.

                     Actual spam   Actual non-spam
Predicted spam       241           42
Predicted non-spam   8             603

Calculate the values of the following, giving floating-point answers to 4 decimal places.

(i) True positive (TP)

(ii) True negative (TN)

(iii) False positive (FP)

(iv) False negative (FN)

(v) Accuracy

(vi) Precision

(vii) Recall

(viii) F1 score
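The eight quantities follow directly from the confusion matrix once spam is taken as the positive class. A minimal sketch of the arithmetic, using the standard definitions of accuracy, precision, recall and F1 (the variable names are chosen for this example):

```python
# Cell counts from the confusion matrix in part (d); spam is the positive class.
tp = 241  # predicted spam, actually spam
fp = 42   # predicted spam, actually non-spam
fn = 8    # predicted non-spam, actually spam
tn = 603  # predicted non-spam, actually non-spam

total = tp + fp + fn + tn

accuracy = (tp + tn) / total   # share of all messages classified correctly
precision = tp / (tp + fp)     # share of predicted spam that really is spam
recall = tp / (tp + fn)        # share of actual spam that was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TP = {tp}, TN = {tn}, FP = {fp}, FN = {fn}")
print(f"Accuracy  = {accuracy:.4f}")   # 0.9441
print(f"Precision = {precision:.4f}")  # 0.8516
print(f"Recall    = {recall:.4f}")     # 0.9679
print(f"F1 score  = {f1:.4f}")         # 0.9060
```

The four counts are integers, so only the last four values need rounding to 4 decimal places.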

Step by Step Solution

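(a)(i) Predicting the number of children in a family from family incomes and expenses is regression, because the target is a numeric quantity whose values are ordered and can be compared by magnitude. Strictly, the number of children is a non-negative integer count rather than an arbitrary real number, but it is still treated as a numeric target instead of a set of unordered categories ...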