Question: . ( points) The norm is defined as the length or magnitude of a vector. The norms are often used in machine learning for regularization

 . ( points) The norm is defined as the length ormagnitude of a vector. The norms are often used in machine learningfor regularization and feature selection. p-norm can be formally written as below:xp:=(i=1nxip)1/p Following is an example where the L1 norm is used as

. ( points) The norm is defined as the length or magnitude of a vector. The norms are often used in machine learning for regularization and feature selection. p-norm can be formally written as below: xp:=(i=1nxip)1/p Following is an example where the L1 norm is used as a penalty term in the cost function. J=N1i=1N(yiy^i)2+j=1mwj1 In your own words, explain what the effect of the second term (penalty) is in the cost function while training your model. ( points) The bag-of-words model is a representation used in natural language processing, in which a text is represented as the bag (multiset) of its words, disregarding grammar but keeping multiplicity. With this representation, a document can be represented as a vector in a Vector space model. In order to compute the similarity between documents, we can use the Euclidean distance or Cosine similarity between two document representations. Explain how these two methods differ from each other. Euclidean distance: d(p,q)=i=1d(qipi)2 Cosine similarity: sim(p,q)=cos()=pqpq 5. ( points) A density estimator learns a mapping from a set of attributes to a probability. For discrete variables, we can just count the observations in particular to the event, and use it as an estimated probability, such as: P^(xi=u)=totalnumberofrecordsnumberofrecordsinwhichxi=u Page 4 For a binary variable, such as coin flip with P^(X= head )=q, we would like to find qargmaxqn1(1q)n2 where n1,n2 are the frequencies of the classes. Prove that the relative frequency is the best estimate. (hint. derivative of the argmax function above) ( points) Suppose you have the following training dataset with three inputs and a boolean output. You are to predict C using a Naive Bayes classifier. After learning, given the features (x1=1,x2=0,x3=1), predict class using the trained Naive Bayes classifier as below: P^(Ck)i=13P^(xiCk)

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!