Question 1: (25 Marks)
(a) Define the following concepts clearly, within the context of a binary classification problem:
(i) Hypothesis Space
(ii) Consistency and satisfying
(iii) Version Space
[2+5+2 marks]
(b) Consider the problem of assigning the label "family car" or "not family car" to cars. For convenience, we shall replace the label "family car" by "1" and "not family car" by "0". Suppose we choose the features "price ('000 $)" and "power (hp)" as the input representation for the problem. Further, suppose that there is some reason to believe that for a car to be a family car, its price and power should be in certain ranges.
(i) Write down the proposition for the problem.
(ii) Using a plot, write down the hypothesis space for the problem.
[3+5 marks]
(c) Let X be the set of all possible examples for a binary classification problem and let h' and h'' be two hypotheses for the problem. Define the following concepts:
(i) h' is more general than h'',
(ii) h' is more specific than h'',
(iii) h' is strictly more general than h'',
(iv) h' is strictly more specific than h''.
[2+2+2+2 marks]
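For parts (b) and (c) of Question 1, the following is a minimal Python sketch, not a model answer. It assumes the usual axis-aligned rectangle hypothesis (label 1 when price and power both fall inside given ranges) and the usual ordering in which h' is more general than h'' when every example labelled 1 by h'' is also labelled 1 by h'. The names `rectangle`, `more_general` and `strictly_more_general`, the threshold values, and the finite grid standing in for X are all hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, Tuple

Car = Tuple[float, float]          # (price in '000 $, power in hp)
Hypothesis = Callable[[Car], int]  # maps a car to label 1 or 0


def rectangle(p1: float, p2: float, e1: float, e2: float) -> Hypothesis:
    """Return h with h(x) = 1 iff p1 <= price <= p2 and e1 <= power <= e2."""
    def h(x: Car) -> int:
        price, power = x
        return int(p1 <= price <= p2 and e1 <= power <= e2)
    return h


def more_general(h1: Hypothesis, h2: Hypothesis, xs: Iterable[Car]) -> bool:
    """True if every x in xs with h2(x) = 1 also has h1(x) = 1."""
    return all(h1(x) == 1 for x in xs if h2(x) == 1)


def strictly_more_general(h1: Hypothesis, h2: Hypothesis, xs: Iterable[Car]) -> bool:
    """True if h1 is more general than h2 on xs, but not the other way round."""
    xs = list(xs)
    return more_general(h1, h2, xs) and not more_general(h2, h1, xs)


# Hypothetical thresholds; a finite grid is used as a stand-in for X.
wide = rectangle(10, 40, 80, 200)
narrow = rectangle(15, 30, 100, 150)
grid = [(p, e) for p in range(0, 51, 5) for e in range(50, 251, 10)]

print(more_general(wide, narrow, grid))
print(strictly_more_general(wide, narrow, grid))
```

"More specific" is the same test with the two hypotheses swapped. The check only covers the grid points supplied, whereas the definitions in part (c) quantify over all of X.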
Question 2: (25 Marks)
(a) Write down five reasons why dimensionality reduction is useful.
[5 marks]
(b) Compare and contrast the forward selection and backward selection methods for subset selection. Provide the algorithms as well.
[14 marks]
Machine Learning Techniques
STAT 5349 C
(c) Given the data in the following table, use PCA to reduce the dimension from 2 to 1 :
Feature   Example 1   Example 2   Example 3   Example 4
x1        4           8           13          7
x2        11          4           5           14
[6 marks]
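For part (c) of Question 2, the computation can be sketched in plain Python as below. This is one way to carry out the steps, under two assumptions not stated in the question: the covariance matrix uses the sample divisor n - 1, and the sign of the principal direction (which is arbitrary) is fixed so that its first coordinate is positive. The variable names are illustrative.

```python
import math

# Data from the table: one list per feature, one entry per example.
x1 = [4, 8, 13, 7]
x2 = [11, 4, 5, 14]
n = len(x1)

# Step 1: centre each feature on its mean.
m1, m2 = sum(x1) / n, sum(x2) / n
d1 = [v - m1 for v in x1]
d2 = [v - m2 for v in x2]

# Step 2: sample covariance matrix S = [[a, b], [b, c]]
# (the divisor n - 1 is an assumption).
a = sum(u * u for u in d1) / (n - 1)
c = sum(v * v for v in d2) / (n - 1)
b = sum(u * v for u, v in zip(d1, d2)) / (n - 1)

# Step 3: largest eigenvalue of the symmetric 2x2 matrix, in closed form.
lam1 = (a + c + math.sqrt((a - c) ** 2 + 4 * b * b)) / 2

# Step 4: unit eigenvector for lam1. The vector (b, lam1 - a) solves
# (S - lam1 * I) e = 0 whenever b != 0, which holds for this data.
e = (b, lam1 - a)
norm = math.hypot(*e)
e = (e[0] / norm, e[1] / norm)
if e[0] < 0:  # the sign is arbitrary; fix it for reproducibility
    e = (-e[0], -e[1])

# Step 5: project each centred example onto the first principal component.
z = [u * e[0] + v * e[1] for u, v in zip(d1, d2)]

print("covariance:", [[a, b], [b, c]])
print("largest eigenvalue:", round(lam1, 4))
print("principal direction:", tuple(round(t, 4) for t in e))
print("1-D coordinates:", [round(t, 4) for t in z])
```

The list `z` holds the one-dimensional representation of the four examples. Flipping the sign of `e` negates every entry of `z` and is an equally valid answer.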
Question 3: (25 Marks)
(a) Clearly distinguish between 5×2 cross-validation and leave-one-out cross-validation.
(7 Marks)
(b) Describe the Receiver Operating Characteristic (ROC) space and the noteworthy points therein.
(7 Marks)
(c) Assume the following: A database contains 80 records on a particular topic of which 55 are relevant to a certain investigation. A search was conducted on that topic and 50 records were retrieved. Of the 50 records retrieved, 40 were relevant. Construct the confusion matrix for the search and calculate the precision and recall scores for the search.
(d) Describe clearly how you would use numeric features with the naive Bayes algorithm.
(6 Marks)
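For part (c) of Question 3, the counts can be worked out with a short Python sketch. It assumes the usual reading in which "relevant" is the actual positive class and "retrieved" is the predicted positive class. The variable names are illustrative.

```python
# Figures given in the question.
total = 80               # records in the database
relevant = 55            # records relevant to the investigation
retrieved = 50           # records returned by the search
relevant_retrieved = 40  # retrieved records that were relevant

# Cells of the confusion matrix.
tp = relevant_retrieved    # relevant and retrieved
fp = retrieved - tp        # retrieved but not relevant
fn = relevant - tp         # relevant but not retrieved
tn = total - tp - fp - fn  # neither relevant nor retrieved

precision = tp / (tp + fp)  # share of retrieved records that are relevant
recall = tp / (tp + fn)     # share of relevant records that were retrieved

print(f"{'':<14}{'retrieved':>10}{'not retrieved':>15}")
print(f"{'relevant':<14}{tp:>10}{fn:>15}")
print(f"{'not relevant':<14}{fp:>10}{tn:>15}")
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

The four cells sum to 80, the size of the database, which is a quick check on the matrix.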