Question 2 - Manipulating data [25 marks] (a) Two datasets are given below. Answer the following...
Question:
Question 2 - Manipulating data [25 marks]

(a) Two datasets are given below. Answer the following questions, and give numerical answers in 2 decimal places. [7]
    Dataset A: (1, 3, 3, 5, 9)
    Dataset B: (1, 3, 3, 5, 9, 99)
    (i) Calculate the mean, median, standard deviation of dataset A.
    (ii) Calculate the mean, median, standard deviation of dataset B.
    (iii) The difference between the two datasets is the additional outlier value 99 in dataset B. Which of the mean and the median does the existence of the outlier have a more severe impact on?

(b) A dataset is given below. Answer the following questions in 2 decimal places. [6]
    Dataset A: (1, 3, 3, 5, 9)
    (i) Normalize the dataset to the range -1.0 to 1.0.
    (ii) Standardize the dataset to the mean 0.0 and the standard deviation 1.0.

(c) Apply binning to the values (7, 1, 12, -6, 6, 5, 14, -1, 16, 4, 10, 7) using bins of size 3 and the means of the bins. Do it manually without programming; show the steps, the contents and replacement of the bins. [6]

(d) Two jobs of cleaning data are smoothing data and handling missing data. [6]
    (i) Smoothing data, such as binning, typically discards some information from the data. Explain why smoothing data is useful in machine learning.
    (ii) Name two techniques of smoothing data other than binning.
    (iii) Two ways of handling missing data are removing records with missing values and filling in missing values. Suggest the criteria or condition of selecting between the two ways with reference to the missing values and the data.
Expert Answer:
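No worked answer appears on the page, so what follows is only a rough sketch for checking hand calculations in parts (a) to (c), not an official solution. It rests on three assumptions the question does not state: the standard deviation is the population form (divide by n; the sample form would give different figures), normalization is min-max scaling, and "bins of size 3" means sorting the values and grouping them three at a time. The function names are made up for illustration.

```python
from statistics import mean, median, pstdev

def summarize(data):
    """Return (mean, median, population std dev), each rounded to 2 d.p."""
    return (round(mean(data), 2), round(median(data), 2), round(pstdev(data), 2))

def normalize(data, low=-1.0, high=1.0):
    """Min-max scale the data linearly onto [low, high] (assumed method)."""
    lo, hi = min(data), max(data)
    return [round(low + (x - lo) * (high - low) / (hi - lo), 2) for x in data]

def standardize(data):
    """Z-scores using the population standard deviation (assumed)."""
    mu, sigma = mean(data), pstdev(data)
    return [round((x - mu) / sigma, 2) for x in data]

def smooth_by_bin_means(values, bin_size=3):
    """Sort, split into bins of bin_size values, replace each value by its bin mean."""
    ordered = sorted(values)
    bins = [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]
    return [(b, [round(mean(b), 2)] * len(b)) for b in bins]

dataset_a = [1, 3, 3, 5, 9]
dataset_b = [1, 3, 3, 5, 9, 99]
values_c = [7, 1, 12, -6, 6, 5, 14, -1, 16, 4, 10, 7]

print(summarize(dataset_a))    # (4.2, 3, 2.71)
print(summarize(dataset_b))    # (20, 4.0, 35.42)
print(normalize(dataset_a))    # [-1.0, -0.5, -0.5, 0.0, 1.0]
print(standardize(dataset_a))  # [-1.18, -0.44, -0.44, 0.29, 1.77]
for contents, replacement in smooth_by_bin_means(values_c):
    print(contents, "->", replacement)
```

Under these assumptions, adding 99 moves the mean from 4.20 to 20.00 while the median only moves from 3.00 to 4.00, so the outlier affects the mean far more than the median. The sorted bins come out as (-6, -1, 1), (4, 5, 6), (7, 7, 10) and (12, 14, 16), with bin means -2, 5, 8 and 14. Part (c) still has to be written out by hand, as the question requires.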