Question: In practice, PXY is usually unknown and we use the empirical risk minimizer (ERM). We will reformulate the problem as a d-dimensional linear regression problem.

 In practice, PXY is usually unknown and we use the empiricalrisk minimizer (ERM). We will reformulate the problem as a d-dimensional linear

In practice, PXY is usually unknown and we use the empirical risk minimizer (ERM). We will reformulate the problem as a d-dimensional linear regression problem. First note that functions in Hd are parametrized by a vector b=[b0,b1,bd], we will use the notation fb. Similarly we will note aR3 the vector parametrizing g(x)=fa(x). We will also gather data points from the training sample in the following matrix and vector: X=111x1x2xNx1dx2dxN,y=[y0,y1,yN] These notations allow us to take advantage of the very effective linear algebra formalism. X is called the design matrix. 5. (2 Points) Show that the empirical risk minimizer (ERM) b^ is given by the following minimization b^=bargminXby22. 6. (3 Points) If N>d and X is full rank, show that b^=(XX)1Xy. (Hint: you should take the gradients of the loss above with respect to b1). Why do we need to use the conditions N>d and X full rank

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!