4 Under-Parameterization and Over-Parameterization
In the previous section, we had more data points than features in our data, i.e., we were looking at N>100. This tends to be the ideal situation, since we need to find an unknown weight for each feature, and this gives us enough information to determine each weight (similar to how two data points are enough to find the slope and intercept, the two unknowns, of a line).
Sometimes, however, we may have fewer data points than we have features - this makes it difficult to determine how the underlying model should depend on each feature. We just don't have enough data. In the following problems, consider a training data set of size N=50 and a test data set of size N=50.
Problem 8: Let A be a matrix of random values, with k rows and 101 columns, where each entry is sampled from a N(0,1) distribution. Note that for any input vector x, Ax will be a vector of k values. We could then consider performing linear regression on the data points (Ax, y) rather than (x, y). Note that if k < 50, this transformed data set will have fewer input features than we have data points in our data set, and thus we restore linear regression to working order.
Plot, over k from 1 to 50, the testing error when, for a given k, you pick a random A to transform the input vectors by, then do linear regression on the result. You'll need to repeat the experiment with a number of different choices of A, for each k, to get a good plot. What do you notice? Does this seem to be a reasonable trend?
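Below is a minimal sketch of one way to set up this experiment in Python. The problem reuses the data set from the previous section, which is not reproduced here, so this sketch synthesizes a stand-in with 101 input features; the variable names (X_train, w_true, etc.), the noise level, and the number of trials per k are illustrative assumptions, not part of the problem statement.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Stand-in for the data from the previous section (an assumption):
# 101 features, N = 50 training points and N = 50 test points.
d = 101
w_true = rng.standard_normal(d)                  # hypothetical true weights
X_train = rng.standard_normal((50, d))
X_test = rng.standard_normal((50, d))
y_train = X_train @ w_true + 0.5 * rng.standard_normal(50)
y_test = X_test @ w_true + 0.5 * rng.standard_normal(50)

def avg_test_error(k, n_trials=20):
    """Average test MSE over n_trials random projections A of shape (k, 101)."""
    errs = []
    for _ in range(n_trials):
        A = rng.standard_normal((k, d))           # entries ~ N(0, 1)
        Z_tr, Z_te = X_train @ A.T, X_test @ A.T  # transformed inputs Ax
        w, *_ = np.linalg.lstsq(Z_tr, y_train, rcond=None)  # least squares fit
        errs.append(np.mean((Z_te @ w - y_test) ** 2))
    return np.mean(errs)

ks = list(range(1, 51))
plt.plot(ks, [avg_test_error(k) for k in ks])
plt.xlabel("k (number of transformed features)")
plt.ylabel("average test MSE")
plt.show()
```

Averaging over several draws of A matters here: a single random projection can be unusually good or bad, and the trend over k only becomes visible once that randomness is averaged out.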
Problem 9: Notice that there's nothing stopping us from continuing to increase k. This puts us in a regime of over-parameterization (we have more features in our data than data points), and in fact increasing over-parameterization, if we were bold enough to take k>100. One possible solution is, when performing linear regression on the transformed Ax data, to do ridge regression, introducing the ridge penalty λ into the loss we are minimizing.
Continue the experiment, for k = 50, 51, 52, ..., 200, plotting the resulting testing error (averaged over multiple choices of A). How did you choose a good value of λ? (Note that the number of weights we need to find changes with k - should this influence λ?) What do you notice?
Bonus: Why does this happen?
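Continuing the sketch above (it reuses the synthetic data, d, and rng already defined), one illustrative way to run the Problem 9 experiment is shown below. It solves ridge regression in closed form, w = (Z^T Z + λI)^{-1} Z^T y, and fixes a single λ for illustration, even though the problem asks you to choose it yourself and consider whether it should scale with k.

```python
def avg_ridge_test_error(k, lam, n_trials=20):
    """Average test MSE over random A, with ridge penalty lam on the weights."""
    errs = []
    for _ in range(n_trials):
        A = rng.standard_normal((k, d))
        Z_tr, Z_te = X_train @ A.T, X_test @ A.T
        # Closed-form ridge solution: (Z^T Z + lam * I) w = Z^T y
        w = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(k), Z_tr.T @ y_train)
        errs.append(np.mean((Z_te @ w - y_test) ** 2))
    return np.mean(errs)

ks = list(range(50, 201))
lam = 1.0  # illustrative fixed penalty; the problem asks how to choose this
plt.plot(ks, [avg_ridge_test_error(k, lam) for k in ks])
plt.xlabel("k (number of transformed features)")
plt.ylabel("average test MSE")
plt.show()
```

Note that for k > 50 the Gram matrix Z^T Z is singular (more weights than data points), which is exactly why the λI term is needed to keep the solve well posed.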