Question:
The goal of this project is to choose and evaluate clustering mechanisms. Use datasets from the UCI Machine Learning Repository. If you wish to use other datasets in place of these, please send me a pointer to, or a description of, the datasets and I'll let you know whether they are acceptable (and which column they would count as). What you need to do for this project is:
1. Choose two datasets.
2. Determine how you will measure the quality of the clusters produced.
3. Choose two algorithms to compare.
4. Set up and run a comparison experiment, obtaining the quality measures you determined above.
You may find that some algorithms cannot be meaningfully applied to some datasets. If so, you can explain why in lieu of running the experiment. However, saying "the data has continuous values and the algorithm only applies to nominal values" isn't good enough; you should instead discretize the continuous attributes. "Not applicable" is only valid if there is no reasonable way of preprocessing the data to make the algorithm apply. Each dataset and algorithm you choose must be used at least once.
5. Explain which algorithm you would use for what types of data and why.
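The discretization mentioned under step 4 can be sketched as follows. This is only a minimal illustration, assuming equal-width binning is acceptable for the attribute in question. The function name `equal_width_bins`, the bin count, and the sample values are all hypothetical and not part of the assignment; other schemes (equal-frequency or entropy-based binning, for example) may suit a given dataset better.

```python
def equal_width_bins(values, n_bins=3):
    """Map each continuous value to a bin index in 0..n_bins-1 (equal-width binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; put everything in one bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it back in.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up values standing in for one continuous attribute of a dataset.
attribute = [1.0, 2.0, 4.0, 5.5, 7.0, 3.5]
labels = equal_width_bins(attribute, n_bins=3)
print(labels)
```

The resulting bin indices can be treated as nominal values, so an algorithm that only accepts nominal attributes can still be run on the dataset.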
Project Report
The project report should contain the following:
1. Description of how you measured the cluster quality (this will include a brief overview of the datasets).
2. Discussion of each of the four experiments, consisting of:
How you prepared the data
Parameters chosen for the algorithm
Experimental result summary
For each, you should include a brief discussion of why you made the decisions you did.
3. Conclusions: General discussion of the appropriate conditions for use of each algorithm. You may instead want to frame this as a discussion of the appropriate type of algorithm for a general category of data (probably a more difficult task, but also more interesting).
You should also include the output from your sample runs.
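As one illustration of a cluster quality measure that could be described in item 1 of the report, the sketch below computes purity. It assumes the chosen dataset comes with class labels that are held out during clustering, which the assignment does not require. The function name `purity` and the toy assignments are hypothetical. Internal measures that need no class labels, such as the silhouette coefficient or within-cluster sum of squares, are equally valid choices.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of points whose class matches the majority class of their cluster."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)
    # For each cluster, count how many points belong to its most common class.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(class_labels)


# Toy example: cluster assignments from some algorithm and the true classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
score = purity(clusters, classes)
print(score)
```

Purity increases trivially as the number of clusters grows, so it is only a fair comparison when both algorithms produce the same number of clusters.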