Question:

Consider the seeds data set. The examined group comprised kernels belonging to three different varieties of wheat (Kama, Rosa and Canadian), 70 elements each, randomly selected for the experiment. High-quality visualization of the internal kernel structure was obtained using a soft X-ray technique, which is non-destructive and considerably cheaper than more sophisticated imaging techniques such as scanning microscopy or laser technology. The images were recorded on 13x18 cm X-ray KODAK plates. Studies were conducted using combine-harvested wheat grain originating from experimental fields explored at the Institute of Agrophysics of the Polish Academy of Sciences in Lublin.

To construct the data, seven geometric parameters of wheat kernels were measured:

area A,

perimeter P,

compactness C = 4*pi*A/P^2,

length of kernel,

width of kernel,

asymmetry coefficient,

length of kernel groove.

All of these parameters are real-valued and continuous. The last attribute in the data file represents the class label.
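As a starting point, the data file can be read into feature rows and class labels. The sketch below assumes a whitespace-separated text file with the seven measurements in the order listed above, followed by the class label; that layout, the file name in the comment, and the function names `parse_seeds` and `compactness` are assumptions made for illustration. It also recomputes compactness from area and perimeter, which is a cheap sanity check on the loaded columns.

```python
import math

def parse_seeds(text):
    """Parse whitespace-separated rows: seven floats, then a class label."""
    features, labels = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 8:          # skip blank or malformed lines
            continue
        features.append([float(v) for v in parts[:7]])
        labels.append(parts[7])      # label kept as a string, as written in the file
    return features, labels

def compactness(area, perimeter):
    """C = 4*pi*A / P^2, which is 1 for a perfect circle."""
    return 4 * math.pi * area / perimeter ** 2

# Hypothetical usage (the file name is an assumption):
# with open("seeds_dataset.txt") as fh:
#     X, y = parse_seeds(fh.read())
# worst = max(abs(compactness(r[0], r[1]) - r[2]) for r in X)
```

If the recomputed values differ noticeably from the stored compactness column, the column order assumed here is probably wrong for the file at hand.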

Partition the data into training and testing sets (75%, 25%) and create a decision tree classification model for the three varieties of wheat: Kama, Rosa and Canadian. Report the accuracy and misclassification matrix for both training and testing. Show a graphical representation of the decision tree model that you created. Which attributes appear to be the most important for classifying the wheat data?
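One way to approach this part is sketched below using only the Python standard library: a random 75/25 split, a small CART-style tree that chooses splits by Gini impurity, an accuracy and misclassification matrix report, and a text rendering of the tree. The question does not prescribe a tool, so the splitting criterion, the depth limit, and all function names here are assumptions; in practice a library implementation such as scikit-learn's `DecisionTreeClassifier` would likely be used instead.

```python
import random
from collections import Counter

def split_data(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out `test_fraction` of them as test data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (weighted Gini, feature index, threshold) of the best split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):      # midpoints between distinct values
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, max_depth=3, depth=0):
    """Grow the tree recursively; stop at pure nodes or at max_depth."""
    majority = Counter(y).most_common(1)[0][0]
    split = None if depth == max_depth or len(set(y)) == 1 else best_split(X, y)
    if split is None:
        return {"leaf": majority}
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
            "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1)}

def predict(tree, row):
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

def evaluate(tree, X, y):
    """Return accuracy and a matrix of counts indexed as matrix[actual][predicted]."""
    preds = [predict(tree, row) for row in X]
    classes = sorted(set(y) | set(preds))
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for actual, pred in zip(y, preds):
        matrix[actual][pred] += 1
    return sum(matrix[c][c] for c in classes) / len(y), matrix

def render(tree, names, indent=""):
    """Return an indented text drawing of the tree."""
    if "leaf" in tree:
        return f"{indent}-> class {tree['leaf']}\n"
    test = f"{names[tree['feature']]} <= {tree['threshold']:.3f}"
    return (f"{indent}if {test}:\n" + render(tree["left"], names, indent + "  ")
            + f"{indent}else:\n" + render(tree["right"], names, indent + "  "))
```

Calling `evaluate` once on the training rows and once on the test rows gives the two reports the question asks for. The attributes tested at the root and the shallow nodes of the rendered tree are a rough indication of which measurements matter most, since those splits separate the largest share of the kernels.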

On the same data, apply a K-nearest neighbor classifier. Normalize the data, partition it into training and testing sets (75%, 25%), and perform a K-nearest neighbor classification. Report the accuracy and misclassification matrix for values of k equal to 1, 3, and 5. After some exploration, which value of k offers the best results?
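The normalization and nearest-neighbor steps could look roughly like the following sketch. It assumes min-max scaling to the range 0 to 1 and Euclidean distance, neither of which is specified in the question, and the function names are illustrative. The scaling parameters are taken from the training rows only and then applied to both sets, so that no information from the test rows leaks into the model.

```python
import math
from collections import Counter

def fit_min_max(X_train):
    """Record each column's minimum and range from the training rows only."""
    columns = list(zip(*X_train))
    mins = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]   # guard against constant columns
    return mins, spans

def normalize(X, mins, spans):
    """Rescale every value to (value - min) / range, column by column."""
    return [[(v - m) / s for v, m, s in zip(row, mins, spans)] for row in X]

def knn_predict(X_train, y_train, row, k):
    """Majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], row))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def knn_report(X_train, y_train, X_test, y_test, k):
    """Return accuracy and a Counter of (actual, predicted) pairs for one k."""
    pairs = Counter((label, knn_predict(X_train, y_train, row, k))
                    for row, label in zip(X_test, y_test))
    accuracy = sum(n for (a, p), n in pairs.items() if a == p) / len(y_test)
    return accuracy, pairs

# Hypothetical usage, given already split X_train, y_train, X_test, y_test:
# mins, spans = fit_min_max(X_train)
# Xn_train, Xn_test = normalize(X_train, mins, spans), normalize(X_test, mins, spans)
# for k in (1, 3, 5):
#     print(k, knn_report(Xn_train, y_train, Xn_test, y_test, k))
```

Normalization matters here because area and perimeter span much larger numeric ranges than compactness, and without rescaling they would dominate the distance.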

Create a Naive Bayes classifier and report the accuracy on both the training (75%) and testing (25%) data, along with the misclassification matrix.
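Because all seven attributes are continuous, a Gaussian variant of Naive Bayes is a natural fit, although the question does not say which variant is intended; treat that choice as an assumption. The sketch below estimates a prior and a per-attribute mean and variance for each class, then predicts the class with the highest log posterior. The function names and the small variance floor are illustrative choices.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # The floor keeps the variance positive when a feature is constant in a class.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model

def predict_nb(model, row):
    """Pick the class maximizing log prior plus summed log Gaussian densities."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

def accuracy_nb(model, X, y):
    """Fraction of rows whose predicted class matches the true label."""
    return sum(predict_nb(model, row) == label for row, label in zip(X, y)) / len(y)
```

Fitting on the training rows and calling `accuracy_nb` on the training and test rows in turn gives the two accuracy figures; counting `(actual, predicted)` pairs from `predict_nb` yields the misclassification matrix.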

Compare the results of the three models and write a brief paragraph proposing one of the models for adoption in predicting the seed class.
