Remove missing values from your dataset(s) before answering the following questions. 1) Use mobile.csv dataset that...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
Remove missing values from your dataset(s) before answering the following questions. 1) Use mobile.csv dataset that classifies mobile phones with respect to their prices into four categories (price_range column). Use 70% of the dataset for training and the remaining as the test set. a. Generate a decision tree classifier that uses gini index as the impurity measure with a maximum depth of 3. i. Plot the resulting decision tree obtained after training the classifier. ii. Compute the accuracy on the test set. iii. Plot the accuracy for different depth values (try values between 2 and 20) for both training and test. Do you observe overfitting as depth increases? b. Generate a k-nn classifier for different number of nearest neighbors, k (hyperparameter). Consider k values between 2 and 15. Plot the accuracies. C. Generate a support vector machine with a linear kernel where hyperparameter C can get values [0.001, 0.01, 0.1, 1]. Plot the accuracies for different C values. d. Generate ensemble classifiers, bagging, boosting, and adaboost, with 150 base- classifiers with a maximum depth of 5. Plot the accuracies for both training and test. Which one performs better than the others? 2) Use California Housing Dataset (data1.csv) which contains data drawn from the 1990 U.S. Census and apply K-means clustering using longitude, latitude, and median income columns. a. Pick k and form k clusters by assigning each instance to its nearest centroid. b. Compute the centroid of each cluster. C. Is the k that you pick good enough? Determine the number of clusters (k) in the data using sum-of-square-errors. 3) Use the first 50 row in the Customer Segmentation Dataset (data2.csv) and apply three hierarchical clustering algorithms provided by the Python scipy library: (1) single link (MIN), (2) complete link (MAX), and (3) group average and plot the dendograms. 4) Use data2.csv dataset (use every row) and apply DBSCAN algorithm. Consider annual income as the x axis and spending score as the y axis and plot the clusters. Remove missing values from your dataset(s) before answering the following questions. 1) Use mobile.csv dataset that classifies mobile phones with respect to their prices into four categories (price_range column). Use 70% of the dataset for training and the remaining as the test set. a. Generate a decision tree classifier that uses gini index as the impurity measure with a maximum depth of 3. i. Plot the resulting decision tree obtained after training the classifier. ii. Compute the accuracy on the test set. iii. Plot the accuracy for different depth values (try values between 2 and 20) for both training and test. Do you observe overfitting as depth increases? b. Generate a k-nn classifier for different number of nearest neighbors, k (hyperparameter). Consider k values between 2 and 15. Plot the accuracies. C. Generate a support vector machine with a linear kernel where hyperparameter C can get values [0.001, 0.01, 0.1, 1]. Plot the accuracies for different C values. d. Generate ensemble classifiers, bagging, boosting, and adaboost, with 150 base- classifiers with a maximum depth of 5. Plot the accuracies for both training and test. Which one performs better than the others? 2) Use California Housing Dataset (data1.csv) which contains data drawn from the 1990 U.S. Census and apply K-means clustering using longitude, latitude, and median income columns. a. Pick k and form k clusters by assigning each instance to its nearest centroid. b. Compute the centroid of each cluster. C. Is the k that you pick good enough? Determine the number of clusters (k) in the data using sum-of-square-errors. 3) Use the first 50 row in the Customer Segmentation Dataset (data2.csv) and apply three hierarchical clustering algorithms provided by the Python scipy library: (1) single link (MIN), (2) complete link (MAX), and (3) group average and plot the dendograms. 4) Use data2.csv dataset (use every row) and apply DBSCAN algorithm. Consider annual income as the x axis and spending score as the y axis and plot the clusters.
Expert Answer:
Related Book For
The Legal Environment of Business A Critical Thinking Approach
ISBN: 978-0132664844
6th Edition
Authors: Nancy K Kubasek, Bartley A Brennan, M Neil Browne
Posted Date:
Students also viewed these programming questions
-
Consider two very long, straight fins of uniform cross section, as shown in Figure 3.17. The rectangular fin has dimensions t = 1 mm and w = 20 mm. The circular pin fin has the same cross-sectional...
-
QUIZ... Let D be a poset and let f : D D be a monotone function. (i) Give the definition of the least pre-fixed point, fix (f), of f. Show that fix (f) is a fixed point of f. [5 marks] (ii) Show that...
-
You have been asked to calculate the cost of capital of Mulligan Ltd, a company which specialises in developing new medical equipment products. The following information is available regarding the...
-
Which of the following statements is FALSE? a. Because value is lost when a resource is used by another project, we should include the opportunity cost as an incremental cost of the project. b. Sunk...
-
At a certain college it is estimated that at most 25% of the students ride bicycles to class. Does this seem to be a valid estimate if, in a random sample of 90 college students, 28 are found to ride...
-
Figure 27.42 shows schematically part of the apparatus used in 1897 by J. J. Thomson to determine the charge-tomass ratio of the electron. A beam of electrons, all moving at the same speed \(v\),...
-
On January 4, 2014, Glennside Co. paid $235,000 for a computer system. In addition to the basic purchase price, the company paid a setup fee of $1,100, $6,200 sales tax, and $37,200 for a special...
-
(d) For the following Boolean expression, give: i. The truth table F(a, b, c, d) = ( + b.d).(c.b.a. + .d) ii. The Karnaugh map iii. The minimal sum of products expression
-
Kalogridis Corp. manufactures industrial dye. The company is preparing its 2011 master budget and has presented you with the following information: a. The projected December 31, 2010, balance sheet...
-
Suppose the two doctors play a one shot game that is they interact only once and never again What will be the Nash non cooperative equilibrium pricing strategy in this one shot game
-
Compute the probability of the event E = {1, 2}. Let the sample space be S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Suppose the outcomes are equally likely.
-
log(x 6 y 2 ) Write each expression as a sum of logs. Express powers as factors.
-
6 1.5 = y Change each exponential expression to an equivalent expression involving a logarithm.
-
log 3 81 = 4 Change each logarithmic expression to an equivalent expression involving an exponent.
-
log 18 Use a calculator to evaluate each expression. Round your answers to three decimal places.
-
QI: a) Differentiate between Mark-up and Gross margin. (4) b) Fill in the blank spaces in the following: (6) % Mark-up on sales 15% Selling price Rs. 325 Cost ...** ii) Rs. 2160 28% Cost Rs.250 %...
-
How does the organizational structure of an MNC influence its strategy implementation?
-
Sam was driving in excess of the speed limit and ran a red light at 11:00 P.M. He hit Suzannes car, which was crossing the intersection when he ran the red light. Sam had not seen Suzannes car...
-
Explain relationships between trademarks and trade secrets.
-
On behalf of T.B., the unwed mother of a minor child, the State of Alabama filed a complaint for paternity and child support against J.E.B. A panel of 12 males and 24 females was called by the court...
-
Consider the feedback system shown in Figure 10.26. a. Using Routh's stability criterion, determine the range of the control gain \(K\) for which the closed-loop system is stable. b. Use MATLAB...
-
Figure 10.40 shows a negative feedback control system. a. Design a P controller such that the damping ratio of the closed-loop system is 0.5 . b. Estimate the rise time, overshoot, and \(2 \%\)...
-
The transfer function of a dynamic system is given by \[G(s)=\frac{s+1}{4 s^{4}+5 s^{3}+2 s^{2}+s+6} \] a. Using Routh's stability criterion, determine the stability of the system. b. Using MATLAB,...
Study smarter with the SolutionInn App