

Question: For this chapter's exercise, you will compile your own data set based on people you know and the cars they drive, and then create

For this chapter's exercise, you will compile your own data set based on people you know and the cars they drive, and then create a linear discriminant analysis, k-NN, and Naïve Bayes of your data in order to predict categories for a scoring data set. Complete the following steps:

1) Open a new blank spreadsheet in OpenOffice Calc or another spreadsheet program of your choice. At the bottom of the spreadsheet there will be three default tabs labeled Sheet1, Sheet2, Sheet3. Rename the first one Training and the second one Scoring. You can rename the tabs by double clicking on their labels. You can delete or ignore the third default sheet.

2) On the Training sheet, starting in cell A1 and going across, create attribute labels for six attributes: Age, Gender, Marital_Status, Employment, Housing, and Car_Type.

3) Copy each of these attribute names except Car_Type into the Scoring sheet.

4) On the Training sheet, enter values for each of these attributes for several people that you know who have a car. These could be family members, friends and neighbors, coworkers or fellow students, etc. Try to do at least 20 observations; 30 or more would be better. Enter husband and wife couples as two separate observations, so long as each spouse has a different vehicle. Use the following to guide your data entry:

   a. For Age, you could put the person's actual age in years, or you could put them in buckets. For example, you could put 10 for people aged 10-19; 20 for people aged 20-29; etc.

   b. For Gender, enter 0 for female and 1 for male.

   c. For Marital_Status, use 0 for single, 1 for married, 2 for divorced, and 3 for widowed.

   d. For Employment, enter 0 for student, 1 for full-time, 2 for part-time, and 3 for retired.

   e. For Housing, use 0 for lives rent-free with someone else, 1 for rents housing, and 2 for owns housing.

   f. For Car_Type, you can record data in a number of ways. This will be your label, or the attribute you wish to predict. You could record each person's car by make (e.g., Toyota, Honda, Ford, etc.), or you could record it by body style (e.g., Car, Truck, SUV, etc.). Be consistent in assigning classifications, and note that depending on the size of the data set you create, you won't want to have too many possible classifications, or your predictions in the scoring data set will be spread out too much. With small data sets containing only 20-30 observations, the number of categories should be limited to three or four. You might even consider using Japanese, American, European as your Car_Type values.

5) Once you've compiled your Training data set, switch to the Scoring sheet in OpenOffice Calc. Repeat the data entry process for at least 20 people (more is better) that you know who do not have a car. You will use the training set to try to predict the type of car each of these people would drive if they had one.

6) Use the File > Save As menu option in OpenOffice Calc to save your Training and Scoring sheets as CSV files.

7) Either import your two CSV files into your RapidMiner repository, being sure to give them descriptive names, or read them into a new process using Read CSV.

8) If you have prepared your data well in OpenOffice Calc, you shouldn't have any missing or inconsistent data to contend with, so data preparation should be minimal. Rename the two Retrieve operators (or Read CSV operators) so you can tell the difference between your training and scoring data sets.

9) One necessary data preparation step is to add a Set Role operator and define the Car_Type attribute as your label.

10) Add a Linear Discriminant Analysis operator to your Training stream.

11) Apply your LDA model to your scoring data and run your model. Evaluate and report your results. Did you get any confidence percentages? Do the predicted Car_Types seem reasonable and consistent with your training data? Why or why not?

12) Change your model operator to k-NN, then to Naïve Bayes. Compare and contrast the outputs from the three modeling methodologies. Describe and discuss the differences.
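The coding scheme from step 4 and the CSV layout from steps 2, 3, and 6 can be sketched in a few lines of Python. This is only an illustration, not part of the exercise, which uses a spreadsheet. The helper names (`age_bucket`, `encode_person`, `to_csv`), the text keys such as "rent-free", and the two example people are all made up for this sketch.

```python
import csv
import io

# Numeric codes taken from step 4 of the exercise.
GENDER = {"female": 0, "male": 1}
MARITAL_STATUS = {"single": 0, "married": 1, "divorced": 2, "widowed": 3}
EMPLOYMENT = {"student": 0, "full-time": 1, "part-time": 2, "retired": 3}
HOUSING = {"rent-free": 0, "rents": 1, "owns": 2}

# Attribute labels from step 2; Car_Type is added only for Training data.
ATTRIBUTES = ["Age", "Gender", "Marital_Status", "Employment", "Housing"]


def age_bucket(age_in_years):
    """Optional bucketing from step 4a: 10 for ages 10-19, 20 for 20-29, etc."""
    return (age_in_years // 10) * 10


def encode_person(age, gender, marital_status, employment, housing, car_type=None):
    """Hypothetical helper: turn one person's answers into a coded row."""
    row = {
        "Age": age_bucket(age),
        "Gender": GENDER[gender],
        "Marital_Status": MARITAL_STATUS[marital_status],
        "Employment": EMPLOYMENT[employment],
        "Housing": HOUSING[housing],
    }
    if car_type is not None:  # only Training rows carry the label
        row["Car_Type"] = car_type
    return row


def to_csv(rows, with_label):
    """Return CSV text whose header matches the Training or Scoring sheet."""
    header = ATTRIBUTES + (["Car_Type"] if with_label else [])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up example people, for illustration only.
training_rows = [encode_person(34, "female", "married", "full-time", "owns", "SUV")]
scoring_rows = [encode_person(19, "male", "single", "student", "rent-free")]

training_csv = to_csv(training_rows, with_label=True)
scoring_csv = to_csv(scoring_rows, with_label=False)
print(training_csv)
print(scoring_csv)
```

Keeping the codes in dictionaries means a misspelled category raises a `KeyError` instead of creating a new, inconsistent value. This is the kind of inconsistency step 8 assumes you have already avoided.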

Please turn in your training data, scoring data and answer document.
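To build intuition for the comparison in step 12, here is a minimal plain-Python sketch of k-NN and a categorical Naïve Bayes that each return a predicted Car_Type and a confidence. It is not RapidMiner's implementation: the distance measure, the add-one smoothing, the treatment of Age buckets as categories, and the six training rows are all assumptions made for illustration. LDA is omitted because it needs matrix algebra beyond a short sketch.

```python
from collections import Counter, defaultdict

# Made-up training rows:
# (Age, Gender, Marital_Status, Employment, Housing) -> Car_Type
TRAINING = [
    ((20, 0, 0, 0, 0), "Car"),
    ((20, 1, 0, 0, 1), "Car"),
    ((30, 1, 1, 1, 2), "Truck"),
    ((40, 1, 1, 1, 2), "Truck"),
    ((30, 0, 1, 1, 2), "SUV"),
    ((40, 0, 2, 1, 1), "SUV"),
]


def knn_predict(row, training, k=3):
    """Majority vote among the k nearest rows (squared Euclidean distance)."""
    by_distance = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], row)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k  # confidence = share of the k votes


def naive_bayes_predict(row, training):
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""
    class_counts = Counter(label for _, label in training)
    # value_counts[(attribute index, label)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    distinct_values = defaultdict(set)
    for features, label in training:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            distinct_values[i].add(value)

    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(training)  # prior P(label)
        for i, value in enumerate(row):
            n_values = len(distinct_values[i] | {value})
            score *= (value_counts[(i, label)][value] + 1) / (n_label + n_values)
        scores[label] = score

    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total  # confidence = normalized posterior


scoring_row = (30, 1, 1, 1, 2)  # made-up person without a car
print(knn_predict(scoring_row, TRAINING))
print(naive_bayes_predict(scoring_row, TRAINING))
```

The two confidences mean different things: k-NN reports the share of neighbors that voted for the label, while Naïve Bayes reports a normalized probability. In this sketch Age is on a scale of tens while the other codes run from 0 to 3, so Age dominates the k-NN distance. Whether and how to normalize is a choice worth discussing in your answer.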
