All answers to problem set questions must be typed so they can be reviewed by Turnitin....
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
All answers to problem set questions must be typed so they can be reviewed by Turnitin. Please submit your completed problem set through Canvas using the instructions found on the syllabus. Problem 1 (2 points) Distance is a key notion underlying many data mining algorithms, such as k-nearest neighbor (k- NN). Why can it be a problem to compare customers using regular Euclidean distance such as when they are described by age (in years), income (in dollars), and number of credit cards? How can this problem be fixed? It's common for income values to have a much wider range compared to age and credit card numbers, causing income to have a stronger influence on predictions. To address this issue, we can normalize the data. This involves selecting a scale factor for each input attribute and adjusting the inputs along each dimension accordingly. The goal is to ensure that the variance is balanced on every axis. By applying this normalization technique, we can mitigate the dominance of income values and improve the accuracy and fairness of predictions. Problem 2 (3 points) You currently work for Aperture Science, a small company that sells information technology (IT) products. The lone data scientist at Aperture approaches you one day and proposes to use k-NN estimation to build a model to predict the IT budget of companies to identify potential new clients. They would like your help building and deploying the model. The only data you have on hand is a sample of companies across the United States, which includes their IT budget for last year, their total revenue last year, their total number of employees last year, and their industry classification. This data will make up your database of potential neighbors. Ultimately, as a first true test of the model you want predict the IT budget for Acme Corp., a potential client for whom you do not know their IT budget (but you know their total revenue, number of employees, and industry classification). A) Given the information above, explain how you could estimate Acme's IT budget using k- NN. B) If you chose X-N, the total number of training examples, what would be the effect? All answers to problem set questions must be typed so they can be reviewed by Turnitin. Please submit your completed problem set through Canvas using the instructions found on the syllabus. Problem 1 (2 points) Distance is a key notion underlying many data mining algorithms, such as k-nearest neighbor (k- NN). Why can it be a problem to compare customers using regular Euclidean distance such as when they are described by age (in years), income (in dollars), and number of credit cards? How can this problem be fixed? It's common for income values to have a much wider range compared to age and credit card numbers, causing income to have a stronger influence on predictions. To address this issue, we can normalize the data. This involves selecting a scale factor for each input attribute and adjusting the inputs along each dimension accordingly. The goal is to ensure that the variance is balanced on every axis. By applying this normalization technique, we can mitigate the dominance of income values and improve the accuracy and fairness of predictions. Problem 2 (3 points) You currently work for Aperture Science, a small company that sells information technology (IT) products. The lone data scientist at Aperture approaches you one day and proposes to use k-NN estimation to build a model to predict the IT budget of companies to identify potential new clients. They would like your help building and deploying the model. The only data you have on hand is a sample of companies across the United States, which includes their IT budget for last year, their total revenue last year, their total number of employees last year, and their industry classification. This data will make up your database of potential neighbors. Ultimately, as a first true test of the model you want predict the IT budget for Acme Corp., a potential client for whom you do not know their IT budget (but you know their total revenue, number of employees, and industry classification). A) Given the information above, explain how you could estimate Acme's IT budget using k- NN. B) If you chose X-N, the total number of training examples, what would be the effect?
Expert Answer:
Answer rating: 100% (QA)
To estimate Acme Corps IT budget using kNN I would 1 Collect the available data on the sample of exi... View the full answer
Related Book For
Income Tax Fundamentals 2013
ISBN: 9781285586618
31st Edition
Authors: Gerald E. Whittenburg, Martha Altus Buller, Steven L Gill
Posted Date:
Students also viewed these economics questions
-
Planning is one of the most important management functions in any business. A front office managers first step in planning should involve determine the departments goals. Planning also includes...
-
2. For every $1.00 spent on Advertising, the Industry generates $10.00 in Net Sales. You calculate Living Earth's Advertising relationship to Net Sales (show work) and report your comparison of...
-
Geraths Windows manufactures and sells custom storm windows for three-season porches. Geraths also provides installation service for the windows. The installation process does not involve changes in...
-
Determine the value of a preferred stock.
-
Bert C. Roberts Jr. was chairman of WorldComs board of directors. Immediately before that, he had been chairman of MCI, which WorldCom acquired on September 14, 1998, in a transaction valued at...
-
Garcia Corporation recently hired a new accountant with extensive experience in accounting for partnerships. Because of the pressure of the new job, the accountant was unable to review what he had...
-
A and has two danghters and one son. She has puast been diagnosed with a "ecucrence sf the lobalar breast cances treated with a humpectomy and adiation 4 years ago. Her father (Aaron) s still living...
-
Show the contents in hexadecimal of registers PC, AR, DR, IR, and SC of the basic computer when an ISZ indirect instruction is fetched from memory and executed. The initial content of PC is 7FF. The...
-
Using a reputable website, find a detailed job posting that interests you. Be sure that the posting has enough information to help you in creating a cover letter and resum. Copy the job posting and...
-
(a) Use the information in the adjusted trial balance reported in Exercise 3-21 to compute the current ratio for Wilson Trucking. (b) Assuming Spalding (a competitor) has a current ratio of 1.5,...
-
Using Wilson Trucking Companys adjusted trial balance from Exercise 3-21, prepare its December 31 closing entries. Data from Exercise 3-21 Use the following adjusted trial balance at December 31 of...
-
Use the adjusted accounts for Stark Company from Exercise 3-16 to prepare the (1) Income statement (2) Statement of retained earnings for the year ended December 31 (3) Balance sheet at December 31....
-
In May 2018, the U.S. Energy Information Administration projected that the average retail price for regular-grade gasoline would be $2.79 per gallon for the remainder of the year. Do some research on...
-
Refer to E4-16. Required: Using the adjusted balances in E4-16, give the closing journal entry for the year ended December 31. Data From Exercise 4-16: Mint Cleaning Inc. prepared the following...
-
Proceed as in Example 3 in Section 6.1 to rewrite the given expression using a single power series whose general term involves -1+ n-1 k= 0
-
B.) What is the approximate concentration of free Zn 2+ ion at equilibrium when 1.0010 -2 mol zinc nitrate is added to 1.00 L of a solution that is 1.080 M in OH - . For [Zn(OH) 4 ] 2- , K f = 4.610...
-
David and Darlene Jasper have one child, Sam, who is 6 years old. The Jaspers reside at 4639 Honeysuckle Lane, Los Angeles, CA 90248. David's Social Security number is 577-11-3311, Darlene's is...
-
Please answer the following questions regarding the taxability of Social Security: a. A 68-year-old taxpayer has $20,000 in Social Security income and $100,000 in tax-free municipal bond income. Does...
-
Harold Conners (Social Security number 785-23-9873) lives at 13234 DeMilo Drive, Houston, TX 77052, and is self-employed for 2012. He estimates his required annual estimated tax payment for 2012 to...
-
There is another possible explanation for purchased goodwill appearing in a sole proprietor's statement of financial position. What do you think it might be?
-
Why do the assets need to be revalued in these cases? The business has not been sold.
-
The shown partners have always shared profits and losses in the ratio: Holt 4; Stott 2: Young 1. From 1 January the assets were to be revalued as the profit sharing ratios are to be altered soon. The...
Study smarter with the SolutionInn App