Bonus Problem (10 points)

Use the Sacramento data from the caret library by running data(Sacramento) after loading caret. This data is about housing prices in Sacramento, California. Remove the zip and city variables.

a. Explore the variables to see if they have reasonable distributions, and show your work. We will be predicting the type variable. Does that mean we have a class imbalance?

b. There are lots of options for working on the data to try to improve the performance of SVM, including (1) removing other variables that you know should not be part of the prediction, (2) dealing with extreme variations in some variables with smoothing, normalization, or a log transform, (3) applying PCA, and (4) removing outliers. Pick one now and continue.

c. Use SVM to predict type, and use grid search to get the best accuracy you can. The accuracy may be good, but look at the confusion matrix as well. Report what you find. Note that the kappa value provided with your SVM results can also help you see this. It is a measure of how well the classifier performed that takes into account the frequency of the classes.

d. Return to (b) and try at least one other way to improve the data before running SVM again, as in (c).

e. In the end, some data are just so imbalanced that a classifier is never going to predict the minority class. Dealing with this is a huge topic. One simple possibility is to conclude that we do not have enough data to support predicting the very infrequent class(es) and remove them. If they are not actually important to the reason we are making the prediction, that could be fine. Another approach is to force the data to be more even by sampling. Create a copy of the data that includes all the data from the two smaller classes, plus a small random sample of the large class (you can do this by separating those data with a filter, sampling, then attaching them back on). Check the distributions of the variables in this new data sample to make sure they are reasonably close to the originals, using visualization and/or summary statistics. We want to make sure we did not get a strange sample where everything was cheap or there were only studio apartments, for example. You can rerun the sampling a few times if you are getting strange results. If it keeps happening, check your process. Use SVM to predict type on this new, more balanced dataset, and report its performance with a confusion matrix, using grid search to get the best accuracy.
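The filter, sample, and reattach step in part (e), followed by a grid-searched SVM, could be sketched roughly as below. This is only one possible reading of the problem, not a reference solution. The seed, the sample size of 60, the 75/25 hold-out split, the radial kernel, and the sigma and C grid values are all illustrative assumptions that the problem does not specify. The svmRadial method in caret also needs the kernlab package to be installed.

```r
# Sketch only: seed, sample size, split and grid values are illustrative assumptions.
library(caret)  # Sacramento data, train(), confusionMatrix(); svmRadial needs kernlab installed

data(Sacramento)
sac <- Sacramento[, !(names(Sacramento) %in% c("zip", "city"))]  # drop zip and city

# Class counts for the outcome: a quick look at the imbalance asked about in part (a)
class_counts <- table(sac$type)
print(class_counts)

# Separate the largest class from the smaller ones
major_class <- names(which.max(class_counts))
major <- sac[sac$type == major_class, ]
minor <- sac[sac$type != major_class, ]

# Keep every minority row plus a small random sample of the majority class
set.seed(42)                       # arbitrary seed, for reproducibility only
n_keep <- min(60, nrow(major))     # hypothetical sample size
balanced <- rbind(minor, major[sample(nrow(major), n_keep), ])

# Sanity check: compare summaries of a few variables before and after sampling
print(summary(sac[, c("price", "sqft", "beds")]))
print(summary(balanced[, c("price", "sqft", "beds")]))

# Hold out part of the balanced data so the confusion matrix is not computed on training rows
in_train <- createDataPartition(balanced$type, p = 0.75, list = FALSE)
train_set <- balanced[in_train, ]
test_set <- balanced[-in_train, ]

# Grid search over the radial-kernel SVM parameters with 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)
grid <- expand.grid(sigma = c(0.01, 0.05, 0.1), C = c(0.5, 1, 2, 4))
fit <- train(type ~ ., data = train_set, method = "svmRadial",
             preProcess = c("center", "scale"),
             trControl = ctrl, tuneGrid = grid)
print(fit$bestTune)

# Confusion matrix, accuracy and kappa on the held-out rows
cm <- confusionMatrix(predict(fit, newdata = test_set), test_set$type)
print(cm$table)
print(cm$overall[c("Accuracy", "Kappa")])
```

Because the smallest class has very few rows, the held-out confusion matrix rests on only a handful of examples, so rerunning with a few different seeds gives a better sense of how stable the accuracy and kappa are.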


