Question:

For this chapter's exercise, you will use logistic regression to try to predict whether or not young people you know will eventually graduate from college. Complete the following steps:

1) Open a new blank spreadsheet in OpenOffice Calc. At the bottom of the spreadsheet there will be three default tabs labeled Sheet1, Sheet2, Sheet3. Rename the first one Training and the second one Scoring. You can rename the tabs by double-clicking on their labels. You can delete or ignore the third default sheet.

2) On the Training sheet, starting in cell A1 and going across, create attribute labels for five attributes: Parent_Grad, Gender, Income_Level, Num_Siblings, and Graduated.

3) Copy each of these attribute names except Graduated into the Scoring sheet.

4) On the Training sheet, enter values for each of these attributes for several adults that you know who are at the age that they could have graduated from college by now. These could be family members, friends and neighbors, coworkers or fellow students, etc. Try to do at least 20 observations; 30 or more would be better. Enter husband and wife couples as two separate observations. Use the following to guide your data entry:

a. For Parent_Grad, enter a 0 if neither of the person's parents graduated from college, a 1 if one parent did, and a 2 if both parents did. If the person's parents went on to earn graduate degrees, you could experiment with making this attribute even more interesting by using it to hold the total number of college degrees earned by the person's parents. For example, if the person represented in the observation had a mother who earned a bachelor's, master's and doctorate, and a father who earned a bachelor's and a master's, you could enter a 5 in this attribute for that person.

b. For Gender, enter 0 for female and 1 for male.

c. For Income_Level, enter a 0 if the person lives in a household with an income level that you would consider to be below average, a 1 for average, and a 2 for above average. You can estimate or generalize. Be sensitive to others when gathering your data; don't snoop too much or risk offending your data subjects.

d. For Num_Siblings, enter the number of siblings the person has.

e. For Graduated, put Yes if the person has graduated from college and No if they have not.

5) Once you've compiled your Training data set, switch to the Scoring sheet in OpenOffice Calc. Repeat the data entry process for at least 20 (more is better) young people between the ages of 0 and 18 that you know. You will use the training set to try to predict whether or not these young people will graduate from college, and if so, how confident you are in your prediction. Remember this is your scoring data, so you won't provide the Graduated attribute; you'll predict it shortly.

6) Use the File > Save As menu option in OpenOffice Calc to save your Training and Scoring sheets as CSV files.

7) Import your two CSV files into your RapidMiner repository. Be sure to give them descriptive names. Alternatively, you can simply connect to them using Read CSV operators.

8) Add your two data sets to a new Main Process window. If you have prepared your data well in OpenOffice Calc, you shouldn't have any missing or inconsistent data to contend with, so data preparation should be minimal. Rename the two Retrieve or Read CSV operators so you can tell the difference between your training and scoring data sets.

9) One necessary data preparation step is to add a Set Role operator and define the Graduated attribute as your label in your training data. Alternatively, you can set your Graduated attribute as the label during data import.

10) Add a Logistic Regression operator to your Training stream.

11) Apply your Logistic Regression model to your scoring data and run your model. Evaluate and report your results. Are your confidence percentages interesting? Surprising? Do the predicted Graduation values seem reasonable and consistent with your training data? Does any one independent variable (predictor attribute) seem to be a particularly good predictor of the dependent variable (label or prediction attribute)? If so, why do you think so?
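
To make the train-then-score idea behind steps 9 through 11 concrete, here is a minimal sketch in plain Python. It is not RapidMiner's implementation: the rows are invented purely for illustration, the function names (`load_rows`, `fit`, `score`) are hypothetical, and fitting by batch gradient descent is a simplifying assumption. The CSV layout follows the attribute names from step 2, with Graduated acting as the label.

```python
import csv
import io
import math

# Hypothetical training rows, invented for illustration only (not real data).
TRAINING_CSV = """Parent_Grad,Gender,Income_Level,Num_Siblings,Graduated
2,0,2,1,Yes
2,1,2,0,Yes
1,0,1,2,Yes
1,1,2,1,Yes
2,0,1,3,Yes
0,1,0,4,No
0,0,0,2,No
1,1,0,3,No
0,0,1,5,No
0,1,1,2,No
"""

# Hypothetical scoring rows: same attributes, but no Graduated column.
SCORING_CSV = """Parent_Grad,Gender,Income_Level,Num_Siblings
2,0,2,1
0,1,0,4
"""

FEATURES = ["Parent_Grad", "Gender", "Income_Level", "Num_Siblings"]


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by attribute name."""
    return list(csv.DictReader(io.StringIO(text)))


def features_of(row):
    """Numeric feature vector with a leading 1.0 for the intercept term."""
    return [1.0] + [float(row[name]) for name in FEATURES]


def sigmoid(z):
    """Logistic function; the input is clamped to avoid overflow in exp."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit(rows, learning_rate=0.05, epochs=5000):
    """Fit one weight per feature by batch gradient descent on the log-loss."""
    xs = [features_of(r) for r in rows]
    ys = [1.0 if r["Graduated"] == "Yes" else 0.0 for r in rows]
    weights = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(xs, ys):
            error = sigmoid(sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad[j] += error * v
        weights = [w - learning_rate * g / len(xs) for w, g in zip(weights, grad)]
    return weights


def score(weights, row):
    """Return (predicted label, estimated probability that Graduated is Yes)."""
    p_yes = sigmoid(sum(w * v for w, v in zip(weights, features_of(row))))
    return ("Yes" if p_yes >= 0.5 else "No"), p_yes


weights = fit(load_rows(TRAINING_CSV))
scoring_rows = load_rows(SCORING_CSV)
predictions = [score(weights, row) for row in scoring_rows]
for row, (label, p_yes) in zip(scoring_rows, predictions):
    print(row, label, round(p_yes, 3))
```

The probability returned by `score` plays roughly the role of the confidence percentage asked about in step 11, and the sign and size of each fitted weight give a first hint about which predictor attribute matters most. With a data set this small, treat both as illustrative rather than reliable.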
