For this chapter's exercise, you will compile your own data set based on people you know and the cars they drive. You will then build a linear discriminant analysis (LDA) model, a k-nearest neighbors (kNN) model, and a Naïve Bayes model of your data in order to predict categories for a scoring data set. Complete the following steps:
Open a new blank spreadsheet in OpenOffice Calc or another spreadsheet program of your choice. At the bottom of the spreadsheet there will be three default tabs labeled Sheet1, Sheet2, and Sheet3. Rename the first one Training and the second one Scoring. You can rename a tab by double-clicking its label. You can delete or ignore the third default sheet.
On the Training sheet, starting in cell A1 and going across, create attribute labels for six attributes: Age, Gender, MaritalStatus, Employment, Housing, and CarType.
Copy each of these attribute names except CarType into the Scoring sheet.
On the Training sheet, enter values for each of these attributes for several people you know who have a car. These could be family members, friends and neighbors, coworkers, fellow students, and so on. Try to collect a substantial number of observations; more is better. Enter husband-and-wife couples as two separate observations, as long as each spouse has a different vehicle. Use the following to guide your data entry:
a. For Age, you could enter the person's actual age in years, or you could group ages into buckets, assigning one code to each age range.
b. For Gender, enter one code for female and another for male.
c. For MaritalStatus, use a separate code for each of single, married, divorced, and widowed.
d. For Employment, use a separate code for each of student, full-time, part-time, and retired.
e. For Housing, use a separate code for each of: lives rent-free with someone else, rents housing, and owns housing.
f. For CarType, you can record data in a number of ways. This will be your label, the attribute you wish to predict. You could record each person's car by make (e.g., Toyota, Honda, Ford) or by body style (e.g., Car, Truck, SUV). Be consistent in assigning classifications, and note that, depending on the size of the data set you create, you won't want too many possible classifications, or your predictions in the scoring data set will be spread too thin. With small data sets, limit the number of categories to three or four. You might even consider using Japanese, American, and European as your CarType values.
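The coding rules in steps a through f amount to a fixed lookup table per attribute. The Python sketch below is one way to picture that; every code value in it (including the decade-based age buckets) is a hypothetical placeholder, so substitute whatever codes you settle on, as long as you apply them consistently. The check on CarType illustrates the advice to keep the label set small and consistent.

```python
# All code values below are hypothetical placeholders, not the exercise's official codes.
GENDER = {"female": 0, "male": 1}
MARITAL_STATUS = {"single": 0, "married": 1, "divorced": 2, "widowed": 3}
EMPLOYMENT = {"student": 0, "full-time": 1, "part-time": 2, "retired": 3}
HOUSING = {"rent-free": 0, "rents": 1, "owns": 2}
CAR_TYPES = {"Japanese", "American", "European"}  # a small, fixed set of labels


def age_bucket(age_years):
    """Hypothetical bucketing: one code per decade (20-29 -> 2, 30-39 -> 3, ...)."""
    return age_years // 10


def encode_person(age_years, gender, marital_status, employment, housing, car_type=None):
    """Build one spreadsheet row; CarType is appended only for Training-sheet rows."""
    row = [age_bucket(age_years), GENDER[gender], MARITAL_STATUS[marital_status],
           EMPLOYMENT[employment], HOUSING[housing]]
    if car_type is not None:
        # Reject labels outside the agreed set so the classes stay consistent.
        if car_type not in CAR_TYPES:
            raise ValueError(f"inconsistent CarType: {car_type!r}")
        row.append(car_type)
    return row


training_row = encode_person(34, "female", "married", "full-time", "owns", "Japanese")
scoring_row = encode_person(19, "male", "single", "student", "rent-free")
print(training_row)
print(scoring_row)
```

Scoring rows are built with the same function but without a CarType, which mirrors the Scoring sheet having every attribute except the label.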
Once you've compiled your Training data set, switch to the Scoring sheet in OpenOffice Calc. Repeat the data entry process for several people you know who do not have a car (more is better). You will use the training set to predict the type of car each of these people would drive if they had one.
Use the File > Save As menu option in OpenOffice Calc to save your Training and Scoring sheets as CSV files. A CSV file holds only one sheet, so save each sheet to its own file.
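For reference, the two CSV files should each start with a header row, and the Scoring file should have every column except CarType. The Python sketch below writes files with that layout using the standard `csv` module; the file names `Training.csv` and `Scoring.csv` and the coded rows are hypothetical examples, not data from the exercise.

```python
import csv

TRAINING_HEADER = ["Age", "Gender", "MaritalStatus", "Employment", "Housing", "CarType"]
SCORING_HEADER = TRAINING_HEADER[:-1]  # Scoring sheet: every attribute except CarType


def write_sheet(path, header, rows):
    """Write one sheet as a comma-separated file with the header row first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Hypothetical, already-coded observations.
training_rows = [
    [3, 0, 1, 1, 2, "Japanese"],
    [5, 1, 1, 1, 2, "American"],
    [2, 1, 0, 0, 1, "Japanese"],
]
scoring_rows = [[2, 0, 0, 0, 0]]

write_sheet("Training.csv", TRAINING_HEADER, training_rows)
write_sheet("Scoring.csv", SCORING_HEADER, scoring_rows)
```

Opening either file in a text editor is a quick way to confirm the header row and column order before importing it into RapidMiner.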
Either import your two CSV files into your RapidMiner repository, giving them descriptive names, or read them into a new process using the Read CSV operator.
If you prepared your data carefully in OpenOffice Calc, you shouldn't have any missing or inconsistent data to contend with, so data preparation should be minimal. Rename the two Retrieve operators (or Read CSV operators) so you can tell your training and scoring data sets apart.
One necessary data preparation step is to add a Set Role operator and define the CarType attribute as your label.
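Setting CarType as the label tells the modeling operators to predict that column rather than use it as an input. The Python sketch below shows the same separation outside RapidMiner: attribute columns become numeric input vectors and the label column is split off. The function name `load_examples` is hypothetical, and the CSV text is an inline stand-in for your own file.

```python
import csv
import io


def load_examples(csv_text, label="CarType"):
    """Split CSV rows into numeric attribute vectors and a separate label column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    X, y = [], []
    for record in reader:
        # Remove the label so it is never used as a predictor; scoring rows get None.
        y.append(record.pop(label, None))
        X.append([float(v) for v in record.values()])
    names = [n for n in reader.fieldnames if n != label]
    return names, X, y


training_csv = """Age,Gender,MaritalStatus,Employment,Housing,CarType
3,0,1,1,2,Japanese
5,1,1,1,2,American
"""
names, X, y = load_examples(training_csv)
print(names)
print(X, y)
```

The same function reads a scoring file, which has no CarType column; its label list then contains only `None` values, matching the idea that those are the values to be predicted.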
Add a Linear Discriminant Analysis operator to your Training stream.
Apply your LDA model to your scoring data and run your model. Evaluate and report your results. Did you get any confidence percentages? Do the predicted CarTypes seem reasonable and consistent with your training data? Why or why not?
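To see where LDA confidence percentages can come from, here is a minimal pure-Python sketch of textbook LDA: each class gets a mean vector, all classes share one pooled covariance matrix S, and a scoring row x gets the score delta_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + ln(prior_k). Normalizing exp(delta_k) across classes gives posterior confidences. The small `ridge` term added to the diagonal is a hypothetical stabilizer so that an attribute with no variation does not make S singular; RapidMiner's operator may compute things differently, so its numbers need not match this sketch.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_lda(X, y, ridge=0.01):
    """Estimate class priors, class means, and the inverse pooled covariance."""
    classes = sorted(set(y))
    n, d = len(X), len(X[0])
    priors, means = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / n
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
    cov = [[0.0] * d for _ in range(d)]
    for x, label in zip(X, y):
        diff = [xi - mi for xi, mi in zip(x, means[label])]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    denom = max(n - len(classes), 1)  # pooled within-class estimate
    for i in range(d):
        for j in range(d):
            cov[i][j] /= denom
        cov[i][i] += ridge  # hypothetical regularization
    return classes, priors, means, invert(cov)


def predict_lda(model, x):
    """Return the predicted class and a confidence for every class."""
    classes, priors, means, inv_cov = model
    d = len(x)
    scores = {}
    for c in classes:
        w = [sum(inv_cov[i][j] * means[c][j] for j in range(d)) for i in range(d)]  # S^-1 mu_c
        scores[c] = (sum(xi * wi for xi, wi in zip(x, w))
                     - 0.5 * sum(mi * wi for mi, wi in zip(means[c], w))
                     + math.log(priors[c]))
    top = max(scores.values())  # subtract the max before exp() for numerical safety
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    conf = {c: wt / total for c, wt in weights.items()}
    return max(conf, key=conf.get), conf


X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = ["Japanese", "Japanese", "Japanese", "American", "American", "American"]
model = fit_lda(X, y)
print(predict_lda(model, [1.5, 1.5]))
```

Because every class shares one covariance matrix, the decision boundaries are linear, which is one reason LDA's confidences can look very different from kNN's or Naïve Bayes's on the same small data set.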
Change your model operator to kNN, then to Naïve Bayes. Compare and contrast the outputs from the three modeling methodologies, and describe and discuss the differences.
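The two replacement models produce confidences in quite different ways, which helps explain why their outputs differ. In the pure-Python sketch below, kNN's confidence for a class is simply the share of the k nearest training rows (by Euclidean distance) that carry that label. Naïve Bayes multiplies a class prior by per-attribute likelihoods, here estimated from counts with Laplace smoothing (`alpha`). This sketch treats every coded attribute as categorical, which is an assumption: a tool may instead fit a distribution to numeric columns, so RapidMiner's numbers need not match.

```python
import math
from collections import Counter


def predict_knn(X, y, x, k=3):
    """Label x by majority vote of its k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # Confidence for each class = share of the k neighbours carrying that label.
    conf = {c: votes.get(c, 0) / len(nearest) for c in sorted(set(y))}
    return max(conf, key=conf.get), conf


def fit_naive_bayes(X, y, alpha=1.0):
    """Count attribute values per class, treating every coded attribute as categorical."""
    counts = Counter(y)
    values = [set(col) for col in zip(*X)]  # distinct values seen for each attribute
    table = Counter()  # (class, attribute index, value) -> frequency
    for x, label in zip(X, y):
        for i, v in enumerate(x):
            table[(label, i, v)] += 1
    return counts, values, table, alpha


def predict_naive_bayes(model, x):
    """Return the most probable class and normalized posterior confidences."""
    counts, values, table, alpha = model
    n = sum(counts.values())
    log_post = {}
    for c in sorted(counts):
        score = math.log(counts[c] / n)  # log prior
        for i, v in enumerate(x):
            # Laplace smoothing; reserve one extra slot if v never appeared in training.
            k = len(values[i]) + (v not in values[i])
            score += math.log((table[(c, i, v)] + alpha) / (counts[c] + alpha * k))
        log_post[c] = score
    top = max(log_post.values())
    weights = {c: math.exp(s - top) for c, s in log_post.items()}
    total = sum(weights.values())
    conf = {c: w / total for c, w in weights.items()}
    return max(conf, key=conf.get), conf


X = [[0, 1], [0, 1], [0, 0], [1, 0], [1, 0], [1, 1]]
y = ["Japanese", "Japanese", "Japanese", "American", "American", "American"]
print(predict_knn(X, y, [0, 1], k=3))
print(predict_naive_bayes(fit_naive_bayes(X, y), [0, 1]))
```

With a small k, kNN confidences can only take a few values (multiples of 1/k), whereas Naïve Bayes confidences vary smoothly; that contrast is one thing worth looking for when you compare the three outputs.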
Please turn in your training data, scoring data and answer document.
