Question:
DSO528 - Group Project - Phase B.
Refining your Project Objective and Selecting the Training and Testing Data set.
Dear Students, here is the next step for the project.
Phase A - Selection of Two Data Sets - Done.
Phase B - Refining the Project Objective, the Training and Testing Dataset.
In Phase B, I want you to do the following:
Step 0: Decide which dataset you want to use for your project.
Step 1: Get the descriptive statistics for your Y variable. You need between 5% and 15% of 1's (or 0's) in your data set. If not, you may have to re-sort the original data by one (or more) of the predictors and then select a subset of 2000 rows for study. You can use a 3-D scatter plot or a Decision Tree to find the region in which you have a high % of 1's or 0's.
Please note, this is a learning experience: you need "Y" to have between 5% and 15% 1's (or 0's) in your data set to build a reasonable model for learning.
For example, you may have a fraud data set in which the percentage of fraud is 1% for the entire sample. You can sort the data set by transaction amount; if the large transactions have a high fraud rate (identified by a Decision Tree or 3-D scatter plot), your new objective will be to build a model to predict fraud in high-value transactions, rather than a model to predict fraud in general.
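The sort-and-check idea in Step 1 can be sketched outside Excel/JMP as well. The following is only an illustration in Python, using made-up data: the row layout (amount, y), the function names, and the choice of 10 equal-sized bands are all assumptions, not part of the assignment.

```python
import random

def positive_rate(labels):
    """Share of 1's in a list of 0/1 labels."""
    return sum(labels) / len(labels)

def rate_by_band(rows, n_bands=10):
    """Sort (amount, y) rows by amount and return the rate of 1's
    in each of n_bands equal-sized bands, lowest amounts first."""
    ordered = sorted(rows, key=lambda r: r[0])
    size = len(ordered) // n_bands
    rates = []
    for b in range(n_bands):
        start = b * size
        # The last band absorbs any leftover rows.
        end = (b + 1) * size if b < n_bands - 1 else len(ordered)
        rates.append(positive_rate([y for _, y in ordered[start:end]]))
    return rates

# Made-up data (hypothetical): fraud is rare overall but
# concentrated in the largest transaction amounts.
rows = []
for i in range(10000):
    amount = float(i)
    if amount >= 9000:
        y = 1 if i % 12 == 0 else 0   # roughly 8% fraud among large amounts
    else:
        y = 1 if i % 400 == 0 else 0  # well under 1% elsewhere
    rows.append((amount, y))
random.Random(0).shuffle(rows)        # the raw file order carries no information

overall = positive_rate([y for _, y in rows])
bands = rate_by_band(rows, n_bands=10)
usable = [0.05 <= r <= 0.15 for r in bands]

print(f"overall rate of 1's: {overall:.4f}")
for b, (r, ok) in enumerate(zip(bands, usable), start=1):
    print(f"band {b:2d}: {r:.3f} {'<- within 5%-15%' if ok else ''}")
```

In this toy data only the top band falls inside the 5% to 15% window, which corresponds to narrowing the objective to high-value transactions.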
Step 2: Select a random set of 2000 data tuples from your current dataset(s), using the Rand() function in Excel to select the subset. Using JMP, divide them into a Training data set and a Testing data set. Make sure the percentage of 1's in Y is comparable in Training and Testing. If not, randomize again, or randomize the 1's and the 0's separately, split each of them into two parts, and merge the top part of the 1's with the top part of the 0's to get the Training data set (1000 rows), and similarly merge the bottom parts to get the Testing data set.
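The "randomize the 1's and 0's separately" procedure in Step 2 is a stratified split. A minimal Python sketch follows, assuming rows are (id, y) pairs; the population, the seed, and the function name are hypothetical and stand in for the Excel Rand() and JMP steps.

```python
import random

def stratified_split(rows, label_of, seed=0):
    """Shuffle the 1's and the 0's separately, cut each group in half,
    and merge the top halves into training and the bottom halves into testing."""
    rng = random.Random(seed)
    ones = [r for r in rows if label_of(r) == 1]
    zeros = [r for r in rows if label_of(r) == 0]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    half_ones, half_zeros = len(ones) // 2, len(zeros) // 2
    train = ones[:half_ones] + zeros[:half_zeros]
    test = ones[half_ones:] + zeros[half_zeros:]
    rng.shuffle(train)  # avoid leaving all 1's at the top of each file
    rng.shuffle(test)
    return train, test

# Hypothetical population of 6000 rows with 10% 1's.
population = [(i, 1 if i % 10 == 0 else 0) for i in range(6000)]

# Random subset of 2000 rows (the role played by sorting on Rand() in Excel).
subset = random.Random(42).sample(population, 2000)

train, test = stratified_split(subset, label_of=lambda r: r[1], seed=42)
train_rate = sum(y for _, y in train) / len(train)
test_rate = sum(y for _, y in test) / len(test)
print(len(train), len(test), round(train_rate, 4), round(test_rate, 4))
```

Because each class is halved on its own, the two files can differ by at most one row of each class, so the percentage of 1's is comparable by construction.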
Step 3: Select the Y variable (it has to be qualitative and binary) and select a set of 4 quantitative variables (use business logic: which variables do you think will be the best predictors of the Y variable?).
Step 4: Get the descriptive statistics using JMP for all 5 variables and study them. If you have a lot of outliers or highly skewed X's (predictors), you may have problems later in building a Business Model.
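To make "outliers" and "highly skewed" in Step 4 concrete, here is a small Python sketch of one common way to flag them. The 1.5 x IQR fence rule and the moment-based skewness formula are assumptions chosen for illustration; JMP's own report may use slightly different definitions. The sample values are made up.

```python
import statistics

def describe(values):
    """Basic descriptive statistics plus skewness and 1.5*IQR outliers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Moment-based skewness: third central moment divided by sd cubed.
    skew = sum((v - mean) ** 3 for v in values) / n / sd ** 3 if sd > 0 else 0.0
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "n": n, "mean": mean, "sd": sd, "median": median,
        "q1": q1, "q3": q3, "skew": skew, "outliers": outliers,
    }

# Made-up predictor with one very large value.
x = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]
summary = describe(x)
for key, value in summary.items():
    print(key, value)
```

A mean far above the median together with a large positive skew, as in this toy predictor, is the kind of pattern Step 4 warns about.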
Step 5: Run a stepwise logistic regression on your current dataset, build a model, and check the R-sq. If the R-sq value is less than 1%, you may want to switch to dataset B and repeat the process, or refine your project objective.
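For a logistic model, "R-sq" is a pseudo R-square rather than the usual regression R-square. The sketch below assumes it is the likelihood-based measure (often called McFadden's R-square, and, as far as I know, the quantity JMP labels RSquare (U)); treat that mapping as an assumption and check your own JMP output. It does not fit the model; it only shows how the number is derived from fitted probabilities, which here are made up.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for yi, pi in zip(y, p):
        total += math.log(max(pi, eps)) if yi == 1 else math.log(max(1.0 - pi, eps))
    return total

def pseudo_r2(y, p_model):
    """1 - LL(model) / LL(null), where the null model predicts the
    overall rate of 1's for every row."""
    base_rate = sum(y) / len(y)
    if base_rate in (0.0, 1.0):
        raise ValueError("Y must contain both 0's and 1's")
    ll_null = log_likelihood(y, [base_rate] * len(y))
    ll_model = log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null

# Made-up outcomes (10% 1's) and two sets of fitted probabilities.
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p_uninformative = [0.1] * 10        # no better than the base rate
p_informative = [0.5] + [0.05] * 9  # raises the probability for the true 1

r2_weak = pseudo_r2(y, p_uninformative)
r2_strong = pseudo_r2(y, p_informative)
print(f"uninformative model: {r2_weak:.4f}")
print(f"informative model:   {r2_strong:.4f}")
print("below the 1% threshold:", r2_weak < 0.01)
```

A model that only reproduces the base rate scores zero, which is why a value under 1% suggests the chosen predictors carry almost no information about Y.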
Step 6: Select the dataset you want to use for your project and submit a 2-page output in Blackboard and as a hardcopy.
1. Explain the New Project Proposal (one Paragraph).
2. Explain the descriptive analytics for the 5 variables in 2 or 3 lines per variable.
3. Provide the R-sq for your stepwise Model.
