The dataset you will use is a CSV file that contains housing data collected from the...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
The dataset you will use is a CSV file that contains housing data collected from the 1990 California census. You can find this dataset on Kaggle at https://www.kaggle.com/datasets/cammugent/california- housing-prices. Homework Steps The homework is out of 30 total points, all shown here below. 1. Load the dataset into a Pandas DataFrame. (1 pts) 2. Print out all the data in the first 5 and last 5 rows. (2 pts) 3. Print out the type, Non-Null count, and name of each column. (2 pts) 4. Write at least 3 of the potential dataset biases or issues you could see persisting, in a multi-line comment at the top of your program. (2 pts) 5. Print the count, mean, standard deviation, minimum, 25%, 50%, 75%, and maximum values of all the radial data. (2 pts) 6. Print out the count, unique count, mode, and frequency of the mode of the ordinal and nominal data (all have the Pandas type as object) (1 pt) 7. Plot a histogram of each numerical value in a single figure using subplots, or the hist() function in Pandas (much easier option of the two, needs just one line). (2 pt) 8. Print the mean "median house value" grouped by the "median age" in sorted order. (2 pt) 9. Plot every latitude and longitude in a scatter plot. (2 pts) 10. Using Seaborn, plot a histogram of "median house value with columns for each value of "ocean proximity". (1 pt) 11. Plot every latitude and longitude in a scatter plot, with the point weight being their value for "median house value". (1 pt) 12. Create a temporary DataFrame, and remove all rows that are not <1H Ocean". It should now be the only category remaining in this temporary DataFrame. Now, scatter plot all of these latitudes and longitudes, with the "median house value" as the point weight. (2 pts) 13. Re-using the temporary DataFrame made in the last question, apply a mask to remove any "median income values under 300,000. Plot the now masked latitude and longitudes as a scatter plot, again with point weight being the "median house value". You will no longer use the temporary DataFrame after this question. (2 pts) 14. Create and add a new column to the original DataFrame that is the "rooms per household", by using the "total bedrooms" and "households" columns. Now, create a second new column for the "rooms per population" by using the "total rooms" and "population" columns, to see how many rooms each person gets on average. (2 pts). 15. Print the correlation matrix, in ascending order, of the "median hose value". (2 pts) 16. Remove the "total bedrooms column completely from the DataFrame. (2 pts) 17. Finally, remove the 4 columns you find least important to predict the "median house value" You should in total have 7 total columns left, including "median house value". (2 pts) The dataset you will use is a CSV file that contains housing data collected from the 1990 California census. You can find this dataset on Kaggle at https://www.kaggle.com/datasets/cammugent/california- housing-prices. Homework Steps The homework is out of 30 total points, all shown here below. 1. Load the dataset into a Pandas DataFrame. (1 pts) 2. Print out all the data in the first 5 and last 5 rows. (2 pts) 3. Print out the type, Non-Null count, and name of each column. (2 pts) 4. Write at least 3 of the potential dataset biases or issues you could see persisting, in a multi-line comment at the top of your program. (2 pts) 5. Print the count, mean, standard deviation, minimum, 25%, 50%, 75%, and maximum values of all the radial data. (2 pts) 6. Print out the count, unique count, mode, and frequency of the mode of the ordinal and nominal data (all have the Pandas type as object) (1 pt) 7. Plot a histogram of each numerical value in a single figure using subplots, or the hist() function in Pandas (much easier option of the two, needs just one line). (2 pt) 8. Print the mean "median house value" grouped by the "median age" in sorted order. (2 pt) 9. Plot every latitude and longitude in a scatter plot. (2 pts) 10. Using Seaborn, plot a histogram of "median house value with columns for each value of "ocean proximity". (1 pt) 11. Plot every latitude and longitude in a scatter plot, with the point weight being their value for "median house value". (1 pt) 12. Create a temporary DataFrame, and remove all rows that are not <1H Ocean". It should now be the only category remaining in this temporary DataFrame. Now, scatter plot all of these latitudes and longitudes, with the "median house value" as the point weight. (2 pts) 13. Re-using the temporary DataFrame made in the last question, apply a mask to remove any "median income values under 300,000. Plot the now masked latitude and longitudes as a scatter plot, again with point weight being the "median house value". You will no longer use the temporary DataFrame after this question. (2 pts) 14. Create and add a new column to the original DataFrame that is the "rooms per household", by using the "total bedrooms" and "households" columns. Now, create a second new column for the "rooms per population" by using the "total rooms" and "population" columns, to see how many rooms each person gets on average. (2 pts). 15. Print the correlation matrix, in ascending order, of the "median hose value". (2 pts) 16. Remove the "total bedrooms column completely from the DataFrame. (2 pts) 17. Finally, remove the 4 columns you find least important to predict the "median house value" You should in total have 7 total columns left, including "median house value". (2 pts)
Expert Answer:
Answer rating: 100% (QA)
Solution 1 Load the dataset into a Pandas DataFrame import pandas as pd data pdreadcsvcaliforniahousingpricescsv 2 Print out all the data in the first 5 and last 5 rows printdatahead printdatatail 3 P... View the full answer
Related Book For
Introduction To Management Science and Business Analytics A Modeling And Case Studies Approach With
ISBN: 9781260716290
7th Edition
Authors: Frederick S. Hillier, Mark S. Hillier
Posted Date:
Students also viewed these programming questions
-
Alnuaimi company tries to appraise the following two projects using appraisal method: Year Cashflow (A) Cashflow (B) -80,000 -100,000 1 45,000 10,000 2 25,000 40,000 3 20,000 45,000 4 10,000 120,000...
-
III. a) Apply the subnet mask 255.255.252.0 to the IP address 192.105.103.211 b) Apply the subnet mask 255.255.255.192 to the IP address 131.200.105.145
-
Lopez Sales Company had the following balances in its accounts on January 1, Year 2: Cash Merchandise Inventory Land Common Stock Retained Earnings $ 66,000 46,000 106,000 86,000 132,000 es Lopez...
-
Compensation System Change Introduction As Melanie Griffith gazed through the window of her office, she could see some employees walking to the parking lot to get to their cars; others were on their...
-
A beam of 188-eV electrons is directed at normal incidence onto a crystal surface as shown in Fig. 39.4b. The m = 2 intensity maximum occurs at an angle = 60.6. (a) What is the spacing between...
-
Demand in each period follows the same normal distribution (i.e., there is one demand distribution that represents demand in any single period). Assuming demand is independent across periods, which...
-
Consider the model \[ y=\theta_{1}-\theta_{2} e^{-\theta_{3} x}+\varepsilon \] This is called the Mitcherlich equation, and it is often used in chemical engineering. For example, \(y\) may be yield...
-
See if you can determine the amount of Bluebird State Banks current net income after taxes from the figures below (stated in millions of dollars) and the amount of its retained earnings from current...
-
How do conflict resolution professionals navigate ethical dilemmas when mediating disputes between stakeholders with competing interests, balancing impartiality with the pursuit of equitable outcomes?
-
A project is estimated to complete in 6 months. Refer to the table below. The total budgeted cost is $ till completion of the whole project. The cumulative budgeted cost is $ at the end of month 4....
-
Determine all values of k for which the following matrices are linearly Independent in M 22. [12] [4 0 k 2 14
-
Amount issued $ 290 million Offered Issued at a price of 98.75% plus accrued interest (proceeds to company 98.717%) through Citi and JPMorgan. Interest 9.25% per annum payable June 15 and December...
-
Bank of Skagit issues 10-year bonds with a maturity date of 2032 and an annual simple interest rate of 5.8% The interest is paid semi-annually. Clayton buys a $10,000 bond. Determine the following:...
-
The Johnson Company is computing diluted earnings per share for the current year. The company has convertible bonds outstanding that have been judged as being anti-dilutive. What is the significance...
-
Should I include the financial ratio information in this document in the plan to the board of directors?
-
To determine a product selling price based on the total costmethod, management should include: Total production and nonproduction costs plus a markup. Total production and nonproduction costs only....
-
The figure shows a bolted lap joint that uses SAE grade 8 bolts. The members are made of cold-drawn AISI 1040 steel. Find the safe tensile shear load F that can be applied to this connection if the...
-
What are the assumptions of linear programming that need to be checked to evaluate the adequacy of using a linear programming model to represent the problem under consideration?
-
What is the specific question being addressed when testing the validity of a simulation model?
-
What is the identifying feature of a costbenefittrade-off problem?
-
The details in Figure 16.9 relate to D Co. Using that information and appropriate ratios, prepare a financial report on the company. The opening inventory value figures were :135,000 20X1 actual and...
-
The process of expressing a foreign subsidiarys balance sheet in its parents currency is called: A. Currency translation. B. Currency change. C. Currency conversion. D. Currency swap.
-
A loan is made to a company of $20,000, which is equal to :10,000 at the date of the loan during year 1. The loan is legally denominated in dollars. At the end of year 1, the loan is translated as...
Study smarter with the SolutionInn App