Question: 2: Diamonds Dataset

In Problem 2, you will be working with the Diamonds Dataset. This dataset contains information about several thousand diamonds sold in the United States. You can find more information about this dataset, including a description of its columns, here: Diamonds Dataset.

Load the data stored in the tab-delimited file diamonds.txt into a DataFrame named diamonds. Use head() to display the first 5 rows of this DataFrame.

Our goal in this problem will be to create a linear regression model to estimate the price of a diamond based only on its carat size. You will create a model that uses the qualitative variables cut, color, and clarity in a future assignment.

We have observed in a previous assignment that there is an approximately linear relationship between the natural logarithm of price and the natural logarithm of carat. Add two new columns to diamonds. The new columns should be named ln_carat and ln_price, and should contain the natural logarithms of the carat and price columns. Use head() to display the first 5 rows of diamonds.

We will create scatterplots to confirm that ln_carat and ln_price have an approximately linear relationship. Create two side-by-side scatterplots. The first scatter plot
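
The loading and log-transform steps described in the question could be sketched roughly as follows. This is a minimal sketch, not the assignment's official solution. It assumes pandas and NumPy are available and that the file has columns named carat and price, as the question states. The helper names load_diamonds and add_log_columns are hypothetical, and the three sample rows are made up so the snippet runs without diamonds.txt; they are not real rows from the dataset.

```python
import io

import numpy as np
import pandas as pd


def load_diamonds(source):
    """Read a tab-delimited file (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source, sep="\t")


def add_log_columns(diamonds):
    """Return a copy of the DataFrame with ln_carat and ln_price added."""
    result = diamonds.copy()
    # np.log is the natural logarithm; it expects strictly positive inputs.
    result["ln_carat"] = np.log(result["carat"])
    result["ln_price"] = np.log(result["price"])
    return result


# Made-up stand-in for diamonds.txt (assumption: only carat and price shown).
# With the real file, pass its path instead: load_diamonds("diamonds.txt")
sample_text = "carat\tprice\n0.5\t1500\n1.0\t5000\n2.0\t15000\n"

diamonds = load_diamonds(io.StringIO(sample_text))
diamonds = add_log_columns(diamonds)

# head() shows the first 5 rows by default (here there are only 3).
print(diamonds.head())
```

With the real file, the same two calls apply; if diamonds.txt carries an extra index column or other layout details, read_csv may need further arguments, which this sketch does not attempt to guess. The scatterplot step is left out here because its description is cut off in the question.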
