Question:
The question is about the cross-validation problem. Here is the question:
(4) Cross-Validation. The above steps are sufficient for many machine learning and data mining problems when both the training and testing data sets are very large. For relatively small data sets, however, one may want to go further and assess the robustness of each approach. One general approach is the Monte Carlo cross-validation algorithm, which randomly splits the observed data points into training and testing subsets and repeats the above computation B times (say, B = 100). In the context of this homework, we can combine the $n_1 = 1220$ training data and the $n_2 = 330$ testing data into a larger data set of size $n = 1550$. Then, for this larger data set and for each loop $b = 1, \ldots, B$, we randomly select $n_1 = 1220$ observations as a new training subset and use the remaining $n_2 = 330$ observations as the new testing subset. Within each loop, we first build the different models from "the training data of that specific loop" and then evaluate their performance on "the corresponding testing data." Therefore, for each model or method in part (3), we obtain B values of the testing error on B different testing subsets, denoted by $TE_b$ for $b = 1, 2, \ldots, B$. The "average" performance of each model can then be summarized by the sample mean and sample variance of these B values:

$$\overline{TE} = \frac{1}{B}\sum_{b=1}^{B} TE_b, \qquad \widehat{\mathrm{Var}}(TE) = \frac{1}{B-1}\sum_{b=1}^{B}\left(TE_b - \overline{TE}\right)^2.$$

Compute and compare the "average" performance of each model or method mentioned in part (2). In particular, based on your results, write a few paragraphs briefly summarizing what you discover in the cross-validation, including reporting the optimal choice of the tuning parameter k in the KNN method, and explaining how confident you are in the usefulness of your optimal choice in real-world applications.
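A minimal Python sketch of the Monte Carlo cross-validation loop described above, under several assumptions not stated in the question: k-NN is the only model family shown, the testing error is taken to be the misclassification rate, and the data are a hypothetical two-class toy set much smaller than n = 1550 so the example runs quickly. The names `monte_carlo_cv`, `knn_test_error`, and `summarize` are illustrative only, not part of any library or of the homework's starter code. Each loop draws one random split and evaluates every model on that same split, matching "build different models from the training data of that specific loop"; `statistics.variance` uses the divisor B - 1, as in the formula above.

```python
import math
import random
import statistics
from collections import Counter
from functools import partial
from typing import Callable, Dict, List, Sequence, Tuple

# One labelled observation: (feature vector, class label).
Point = Tuple[Tuple[float, ...], int]
# Hypothetical convention: a "model" is any function mapping (train, test) to a testing error.
ErrorFn = Callable[[Sequence[Point], Sequence[Point]], float]


def knn_predict(train: Sequence[Point], x: Tuple[float, ...], k: int) -> int:
    """Majority vote among the k training points nearest to x (Euclidean distance).

    Ties in the vote go to the label that appears first among the sorted neighbours.
    """
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_test_error(train: Sequence[Point], test: Sequence[Point], k: int) -> float:
    """Misclassification rate on `test` of k-NN built from `train` (assumed error measure)."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in test)
    return wrong / len(test)


def monte_carlo_cv(
    data: Sequence[Point],
    n_train: int,
    B: int,
    models: Dict[str, ErrorFn],
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Return TE_1, ..., TE_B for every model, using the same B random splits for all models."""
    rng = random.Random(seed)
    errors: Dict[str, List[float]] = {name: [] for name in models}
    for _ in range(B):
        shuffled = list(data)
        rng.shuffle(shuffled)                      # random permutation of all n points
        train, test = shuffled[:n_train], shuffled[n_train:]
        for name, error_fn in models.items():      # every model sees this loop's split
            errors[name].append(error_fn(train, test))
    return errors


def summarize(errors: Sequence[float]) -> Tuple[float, float]:
    """Sample mean and sample variance (divisor B - 1) of the B testing errors."""
    return statistics.mean(errors), statistics.variance(errors)


# Hypothetical toy data standing in for the homework's combined n = 1550 set.
data_rng = random.Random(1)
data = [((data_rng.gauss(0, 1), data_rng.gauss(0, 1)), 0) for _ in range(80)]
data += [((data_rng.gauss(2, 1), data_rng.gauss(2, 1)), 1) for _ in range(75)]

models = {f"KNN k={k}": partial(knn_test_error, k=k) for k in (1, 3, 5, 7)}
results = monte_carlo_cv(data, n_train=122, B=20, models=models)

for name, errs in results.items():
    mean_te, var_te = summarize(errs)
    print(f"{name}: mean TE = {mean_te:.4f}, Var(TE) = {var_te:.6f}")

# Cross-validated choice of k: the model with the smallest mean testing error.
best = min(results, key=lambda name: statistics.mean(results[name]))
print("optimal:", best)
```

For the actual homework one would pass the combined data of size 1550 with `n_train=1220` and `B=100`, and add the other methods being compared as extra entries in `models`. The k with the smallest mean testing error is the cross-validated choice, and the sample variance across the B splits gives a rough sense of how stable that choice is.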