Question:
The dataset you will use is a CSV file that contains housing data collected from the 1990 California census. You can find this dataset on Kaggle at https://www.kaggle.com/datasets/cammugent/california-housing-prices.

Homework Steps
The homework is out of 30 total points, all shown here below.
1. Load the dataset into a Pandas DataFrame. (1 pt)
2. Print out all the data in the first 5 and last 5 rows. (2 pts)
3. Print out the type, Non-Null count, and name of each column. (2 pts)
4. Write at least 3 of the potential dataset biases or issues you could see persisting, in a multi-line comment at the top of your program. (2 pts)
5. Print the count, mean, standard deviation, minimum, 25%, 50%, 75%, and maximum values of all the radial data. (2 pts)
6. Print out the count, unique count, mode, and frequency of the mode of the ordinal and nominal data (all have the Pandas type object). (1 pt)
7. Plot a histogram of each numerical value in a single figure using subplots, or the hist() function in Pandas (much the easier option of the two, needs just one line). (2 pts)
8. Print the mean "median house value" grouped by the "median age" in sorted order. (2 pts)
9. Plot every latitude and longitude in a scatter plot. (2 pts)
10. Using Seaborn, plot a histogram of "median house value" with columns for each value of "ocean proximity". (1 pt)
11. Plot every latitude and longitude in a scatter plot, with the point weight being their value for "median house value". (1 pt)
12. Create a temporary DataFrame, and remove all rows that are not "<1H Ocean". It should now be the only category remaining in this temporary DataFrame. Now, scatter plot all of these latitudes and longitudes, with the "median house value" as the point weight. (2 pts)
13. Re-using the temporary DataFrame made in the last question, apply a mask to remove any "median income" values under 300,000. Plot the now masked latitudes and longitudes as a scatter plot, again with the point weight being the "median house value". You will no longer use the temporary DataFrame after this question. (2 pts)
14. Create and add a new column to the original DataFrame that is the "rooms per household", by using the "total bedrooms" and "households" columns. Now, create a second new column for the "rooms per population" by using the "total rooms" and "population" columns, to see how many rooms each person gets on average. (2 pts)
15. Print the correlation matrix, in ascending order, of the "median house value". (2 pts)
16. Remove the "total bedrooms" column completely from the DataFrame. (2 pts)
17. Finally, remove the 4 columns you find least important to predict the "median house value". You should have 7 columns left in total, including "median house value". (2 pts)
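The filtering and column steps (12 through 17) can be sketched without any plotting. This is only an illustration under stated assumptions: the rows are made up, the column names (`ocean_proximity`, `median_house_value`, and so on) are assumed to match the Kaggle file, and the category is assumed to be spelled `<1H OCEAN` there. Step 13 names "median income", but that column is, as far as I know, stored in tens of thousands of dollars in this dataset, so the sketch applies the 300,000 threshold to `median_house_value` instead; swap the column if your instructor means otherwise. Picking the four columns with the weakest absolute correlation in step 17 is just one possible criterion, not the required one.

```python
import pandas as pd

# Made-up rows standing in for the real CSV (column names are assumptions).
data = pd.DataFrame({
    "longitude": [-122.23, -118.30, -118.31, -118.25, -119.80, -117.16],
    "latitude": [37.88, 34.05, 34.06, 34.10, 36.78, 32.72],
    "housing_median_age": [41, 41, 21, 21, 30, 21],
    "total_rooms": [880, 1467, 1274, 3000, 1627, 2000],
    "total_bedrooms": [129, None, 235, 600, 280, 400],
    "population": [322, 496, 558, 1500, 565, 900],
    "households": [126, 177, 219, 500, 259, 380],
    "median_income": [8.3252, 3.5, 2.1, 5.0, 1.9, 4.2],
    "median_house_value": [452600, 200000, 150000, 350000, 80000, 310000],
    "ocean_proximity": ["NEAR BAY", "<1H OCEAN", "<1H OCEAN",
                        "<1H OCEAN", "INLAND", "NEAR OCEAN"],
})

# Step 12: temporary DataFrame holding only the "<1H OCEAN" rows.
temp = data[data["ocean_proximity"] == "<1H OCEAN"].copy()

# Step 13: boolean mask that drops rows under the threshold
# (assumption: applied to median_house_value, see the note above).
masked = temp[temp["median_house_value"] >= 300_000]

# Step 14: derived columns, built from the columns the step names.
data["rooms_per_household"] = data["total_bedrooms"] / data["households"]
data["rooms_per_population"] = data["total_rooms"] / data["population"]

# Step 15: correlation of every numeric column with the target, ascending.
corr_with_value = (
    data.select_dtypes("number").corr()["median_house_value"].sort_values()
)
print(corr_with_value)

# Step 16: remove total_bedrooms from the DataFrame.
data = data.drop(columns=["total_bedrooms"])

# Step 17 (one hypothetical criterion): drop the 4 remaining numeric
# columns with the weakest absolute correlation to the target.
candidates = corr_with_value.drop(["median_house_value", "total_bedrooms"]).abs()
weakest = candidates.nsmallest(4).index.tolist()
reduced = data.drop(columns=weakest)
print(reduced.columns.tolist())
```

For the scatter plots in steps 12 and 13, `temp` or `masked` could then be passed to a call such as `masked.plot.scatter(x="longitude", y="latitude", ...)`, with the house value mapped to point size or color depending on how "point weight" is meant.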
Expert Answer:
Solution
1. Load the dataset into a Pandas DataFrame:
    import pandas as pd
    data = pd.read_csv("california_housing_prices.csv")
2. Print out all the data in the first 5 and last 5 rows:
    print(data.head())
    print(data.tail())
3. P...
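The visible part of the answer stops at step 3. As a rough sketch of how the inspection steps (2, 3, 5, 6 and 8) might look in pandas, the block below reads a tiny made-up sample from a string instead of the real file. The column names are assumed to match the Kaggle CSV, the values are invented, and "in sorted order" in step 8 is read here as sorted by the mean value (sorting by age with `sort_index()` would be the other reading).

```python
import io
import pandas as pd

# Tiny invented sample; with the real file you would pass its path
# to pd.read_csv instead. Column names are assumed, values are made up.
SAMPLE_CSV = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.23,37.88,41,880,129,322,126,8.3252,452600,NEAR BAY
-118.30,34.05,41,1467,,496,177,3.5,200000,<1H OCEAN
-118.31,34.06,21,1274,235,558,219,2.1,150000,<1H OCEAN
-118.25,34.10,21,3000,600,1500,500,5.0,350000,<1H OCEAN
-119.80,36.78,30,1627,280,565,259,1.9,80000,INLAND
-117.16,32.72,21,2000,400,900,380,4.2,310000,NEAR OCEAN
"""

data = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step 2: first 5 and last 5 rows.
print(data.head())
print(data.tail())

# Step 3: dtype, non-null count, and name of each column.
data.info()

# Step 5: count, mean, std, min, quartiles, and max of the numeric columns.
numeric_summary = data.describe()
print(numeric_summary)

# Step 6: count, unique, top (mode), and freq of the non-numeric columns.
text_summary = data.describe(exclude=["number"])
print(text_summary)

# Step 8: mean house value per median age, sorted by that mean.
mean_value_by_age = (
    data.groupby("housing_median_age")["median_house_value"]
    .mean()
    .sort_values()
)
print(mean_value_by_age)
```

Using `describe` twice keeps steps 5 and 6 to one line each: the default call summarizes only numeric columns, and excluding numeric types leaves the text column, for which pandas reports `top` and `freq` rather than mean and quartiles.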