CIS 2334 Semester Project
Background Information
The team of marine biologists is satisfied with the Excel application you developed; it greatly helped them understand abalone across the country. In addition to the analysis you did in the earlier part of the semester project, the scientists are interested in finding additional underlying patterns in the abalone data set. In other words, the research team wants to build mathematical models that could reveal the fundamental relationships among the variables in the abalone data set.
To build solid models, work through the following steps and then finalize your models at the end.
Note: You need to install the Analysis ToolPak add-in to complete this project. Once installed, it appears as Data Analysis on the Data tab. Also, give each worksheet a proper title.
Project Tasks
Task 1: Prepare the dataset (… points)
First, prepare the data for building the models. In classic data modeling, only a portion of the data is used to train the model; this portion is called the training set. The rest of the data, called the test set, is used to evaluate the performance of your models.
What you need to do:
a) Create a new Excel file named FirstnameLastnameDataModeling.xlsx.
b) Name the current worksheet Original Data.
c) Copy the data from the Personal Data worksheet of your earlier semester project part and paste the data set into the Original Data worksheet.
d) Create a new worksheet named Training set, then copy the first two-thirds of the data rows from Original Data and paste them into the Training set worksheet.
e) Create a new worksheet named Test set, then copy the remaining rows from Original Data and paste them into the Test set worksheet.
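Steps d and e amount to a positional split of the rows. The following is a minimal Python sketch, assuming the intended split is the first two-thirds for training and the remaining one-third for testing, in the original row order with no shuffling. The function name sequential_split and the sample rows are hypothetical.

```python
from typing import List, Sequence, Tuple


def sequential_split(rows: Sequence[list], numerator: int = 2,
                     denominator: int = 3) -> Tuple[List[list], List[list]]:
    """Keep the original row order: the first numerator/denominator of the rows
    become the training set, and the rest become the test set.
    The 2/3 default is an assumption about the intended split."""
    if not 0 < numerator < denominator:
        raise ValueError("numerator must be between 0 and denominator")
    cut = len(rows) * numerator // denominator  # integer math, no float rounding
    return list(rows[:cut]), list(rows[cut:])


# Hypothetical data rows (Length, Diameter, Rings) standing in for the
# Original Data worksheet, header row excluded.
original_data = [
    [0.455, 0.365, 15], [0.350, 0.265, 7], [0.530, 0.420, 9],
    [0.440, 0.365, 10], [0.330, 0.255, 7], [0.425, 0.300, 8],
]
training_set, test_set = sequential_split(original_data)
print(len(training_set), len(test_set))  # 4 2
```

Because the split is by position rather than random, any ordering already present in the original data carries over into the two sets. This matches the instructions as written.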
Task 2: Find relationships among variables in stacked data (… points)
Before modeling the data, you need a better understanding of the relationships among the variables. The research team has specified the numerical variables they care about most; these are listed in the table below. The scientists are particularly interested in Rings, since the number of rings indicates an abalone's age.
Hint: Use Descriptive Statistics in Data Analysis, and check Labels in First Row, to describe/analyze the characteristics of each variable. Remember to include the column titles in the Input Range.
Length | Diameter | Height | Wholeweight | Shuckedweight | Visceraweight | Shellweight | Rings
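As a cross-check on the Descriptive Statistics output, here is a minimal Python sketch that computes a subset of the same measures for one column. It assumes sample-based statistics (n - 1 in the denominator, as Excel's STDEV.S and VAR.S use) and the adjusted skewness formula documented for Excel's SKEW function. The describe name and the sample Rings values are hypothetical.

```python
import math
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Return a subset of the measures in Excel's Descriptive Statistics
    output. Standard deviation and variance are sample (n - 1) statistics."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values (skewness uses n - 2)")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        skew = math.nan  # Excel's SKEW returns #DIV/0! for constant data
    else:
        # Adjusted skewness, the formula documented for Excel's SKEW function.
        skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Skewness": skew,
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical Rings values from a Training set column.
rings = [15, 7, 9, 10, 7, 8, 20, 16, 9, 19]
for label, value in describe(rings).items():
    print(f"{label:<20}{value:.4f}")
```

A positive skewness together with a mean above the median suggests a long right tail, which is worth confirming against the histogram of the same variable.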
What you need to do:
a) Create a new worksheet named Stacked data analysis.
b) Using the Training set, explore and create histograms for the variables listed in the table above. Then pick the most interesting histograms and describe/analyze the characteristics of each.
c) Using the Training set, create a box plot for Shuckedweight, Visceraweight, and Shellweight, and describe/analyze the characteristics of each variable.
d) Using the Training set, explore and create scatter plots for different pairs of the variables listed above. Then pick the most interesting scatter plots and describe/analyze the characteristics of each. Hint: you can pick one variable and pair it with each of the remaining variables.
e) Using the Training set, calculate the correlation between each pair of variables listed above, identify the strongest correlations, and apply conditional formatting to highlight them.
f) Use scatter plots to illustrate and verify the strongest correlations, and comment on your findings.
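For items e and f, here is a minimal Python sketch of the pairwise Pearson correlations (the coefficient Excel's CORREL function and the Data Analysis Correlation tool report), ranked by absolute value so the strongest relationships, positive or negative, come first. The function names, the top=3 cutoff, and the sample columns are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient r, as returned by Excel's CORREL."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length columns with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant column has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def strongest_pairs(columns: Dict[str, List[float]],
                    top: int = 3) -> List[Tuple[str, str, float]]:
    """Correlate every pair of columns; return the `top` pairs ranked by |r|."""
    pairs = [(a, b, pearson(columns[a], columns[b]))
             for a, b in combinations(columns, 2)]
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[:top]


# Hypothetical Training set columns (a real run would use all eight variables).
training = {
    "Length": [0.455, 0.350, 0.530, 0.440, 0.330, 0.425],
    "Diameter": [0.365, 0.265, 0.420, 0.365, 0.255, 0.300],
    "Shellweight": [0.150, 0.070, 0.210, 0.155, 0.055, 0.120],
    "Rings": [15, 7, 9, 10, 7, 8],
}
for a, b, r in strongest_pairs(training):
    print(f"{a} vs {b}: r = {r:.3f}")
```

Ranking by |r| rather than r matters because a strong negative correlation is as informative as a strong positive one. The pairs at the top of the list are the ones to highlight with conditional formatting and then check on a scatter plot, since r only measures linear association.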
