CIS 2334 Semester Project
Background Information
The marine biologists' research team is satisfied with the Excel application you developed; it has greatly helped them understand abalone across the country. In addition to the analysis you did in Part 2, the scientists want to find any additional underlying patterns in the abalone data set. In other words, the research team wants to build mathematical model(s) that reveal the fundamental relationships among the variables in the abalone data set.
To build a solid model, work through the following steps and finalize your models at the end.
Note: You need to install the Analysis ToolPak add-in to complete this project. Once installed, it appears as Data Analysis on the Data tab. Also, add a proper title to each worksheet.
Project Tasks
Task 1. Prepare the dataset (10 points)
First, you need to prepare the data for building the models. In classic data modeling tasks, you use only a portion of the data to train your model; this portion is called the training set. The rest of the data, called the test set, is used to evaluate the performance of your models.
What you need to do:
a. Create a new excel file called Firstname_Lastname_DataModeling.xlsx.
b. Name your current worksheet Original Data.
c. Copy the data in the Personal Data worksheet from your Semester Project Part 2 and paste the data set into the Original Data worksheet.
d. Create a new worksheet called Training set, copy the first two-thirds of the data from the Original Data worksheet, and paste it into the Training set worksheet.
e. Create a new worksheet called Test set, copy the remaining data (one-third of the data) from the Original Data worksheet, and paste it into the Test set worksheet.
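Steps d and e split the rows in their original order rather than sampling them randomly. A minimal Python sketch of that rule is shown below, assuming each worksheet is represented as a header row plus a list of data rows. The function name `split_train_test` is hypothetical, and rounding the cut down when the row count is not divisible by 3 is an assumption, not something the assignment specifies.

```python
# Minimal sketch: split a sheet's data rows into a training set (first two-thirds)
# and a test set (remaining third), keeping the original row order.
# Assumption: a sheet is a header row plus a list of data rows; the cut is rounded down.

def split_train_test(header, rows):
    """Return (training, test), each starting with its own copy of the header row."""
    # Integer arithmetic avoids floating-point rounding surprises.
    cut = (2 * len(rows)) // 3
    training = [list(header)] + rows[:cut]
    test = [list(header)] + rows[cut:]
    return training, test


if __name__ == "__main__":
    # Placeholder rows for illustration only (not the abalone data).
    header = ["Row_id"]
    rows = [[i] for i in range(1, 10)]  # 9 data rows
    training, test = split_train_test(header, rows)
    print(len(training) - 1, len(test) - 1)  # data rows in each set: 6 3
```

Copying the header into both sets mirrors giving each worksheet its own column titles, which the Descriptive Statistics tool needs when Labels in First Row is checked.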
Task 2. Find relationships among variables in stacked data (15 points)
Before modeling the data, you need a better understanding of the relationships among the variables. The research team has specified the numerical variables it cares about most; these are listed in the table below. In particular, the scientists are most interested in the abalone's Rings, since the number of rings tells the age of an abalone.
Hint: Use Descriptive Statistics in Data Analysis and check Labels in First Row to describe/analyze the characteristics of each variable. Remember to include the column titles in the Input Range.
Variables: Length, Diameter, Height, Whole_weight, Shucked_weight, Viscera_weight, Shell_weight, Rings
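As a rough cross-check of the Descriptive Statistics output, the Python sketch below computes a few of the same summary values for one column. It assumes the Training set is held as a header row plus data rows, and looks a column up by its title, similar to what Labels in First Row does in Excel. The helper names `column` and `describe` and the sample values are hypothetical placeholders, not part of the assignment.

```python
import statistics


def column(header, rows, name):
    """Return the values of one column, looked up by its title in the header row."""
    index = header.index(name)
    return [row[index] for row in rows]


def describe(values):
    """A few summary statistics for one numeric column."""
    return {
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        # Sample (n - 1) standard deviation, the same form as Excel's STDEV.S.
        "Standard Deviation": statistics.stdev(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Range": max(values) - min(values),
        "Count": len(values),
    }


if __name__ == "__main__":
    # Placeholder rows for illustration only (not measured abalone data).
    header = ["Length", "Rings"]
    rows = [[0.40, 4], [0.50, 8], [0.52, 9], [0.55, 10], [0.60, 14]]
    rings = column(header, rows, "Rings")
    for name, value in describe(rings).items():
        print(f"{name}: {value}")
```

Comparing mean and median in this output is one quick way to spot skew before looking at the histograms.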
What you need to do:
a. Create a new worksheet called Stacked data analysis.
b. Using the Training set, explore and create Histograms for the variables listed in the previous table, then pick the 3 most interesting histograms and describe/analyze the characteristics of each.
c. Using the Training set, create a Box Plot for Shucked_weight, Viscera_weight, and Shell_weight and describe/analyze the characteristics of each variable.
d. Using the Training set, explore and create Scatter Plots for different pairs of the variables listed in the previous table, then pick the 5 most interesting scatter plots and describe/analyze the characteristics of each. Hint: you can pick one variable and pair it with each of the remaining variables.
e. Using the Training set, calculate the correlation between each pair of variables listed above. Identify the 5 strongest correlations, and apply conditional formatting to highlight them.
f. Use the Scatter Plots to illustrate and verify the 5 strongest correlations. Comment on your findings.
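For steps e and f, Excel's Correlation tool in Data Analysis (or the CORREL function) produces Pearson correlation coefficients. The Python sketch below shows the same calculation and one way to rank the pairs. It assumes "strongest" means largest absolute value, so strong negative correlations count too; the names `pearson` and `strongest_pairs` and the placeholder columns are hypothetical.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant column has no defined correlation (Excel shows #DIV/0!).
        raise ValueError("correlation undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)


def strongest_pairs(columns, k=5):
    """Return the k variable pairs with the largest |r|, as (name_a, name_b, r)."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in itertools.combinations(columns, 2)
    ]
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs[:k]


if __name__ == "__main__":
    # Placeholder columns for illustration only (not the abalone data).
    columns = {
        "A": [1, 2, 3, 4, 5],
        "B": [2, 4, 6, 8, 10],
        "C": [5, 4, 3, 2, 1],
        "D": [1, 3, 2, 5, 4],
    }
    for a, b, r in strongest_pairs(columns):
        print(f"{a} vs {b}: r = {r:.3f}")
```

Ranking by absolute value is a design choice; if the team only cares about positive relationships, sorting by r itself would be the alternative. Each top pair can then be checked against its scatter plot, as step f asks.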