Linear regression with a specific, given dataset
a. Download the original Diabetes Dataset from scikit-learn, to your Google Colab document (https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset).
b. Write in a Text block the dimensionality of the loaded data and describe the dimensionality of each sample (number of features).
c. Pre-process the file to obtain X (attributes) and Y (target).
d. Choose 5 attributes and plot attribute vs. target (5 plots). Select 4 attributes and generate a scatter plot for each pair. Provide some comments about the distribution of the target as a function of the attribute.
e. Generate a statistical description of the data using Pandas and print only the Mean, Std, Min and Max.
f. Calculate and print the value and provide some comments about the linearity of the 2 data.
g. Print the first 20 instances, sorted in descending order by the "Age" attribute. Then generate a split of the data as follows: Training (80%), Validation (10%) and Testing (10%).
h. Create a linear regression model (sklearn) and train it using the following data splits:
1. Use 50% for training and 50% for testing
2. Use 70% for training and 30% for testing
3. Use 95% for training and 5% for testing
i. Propose two metrics to evaluate the models and give a brief justification of your selection.
j. Plot the results of one of the metrics for h.1, h.2 and h.3. Provide some comments on the results.
Step by Step Solution
