Question:
70-510: Introduction to Data Mining and Analytics
Course Project

In this project, you will apply the knowledge you learn throughout the course to mine a dataset of your choosing. You will start by researching a dataset to work with and then examine this data for what could be mined. After that, you will use data mining software to run algorithms and analyze the results. You will then present your work in a report. All these steps are outlined below.

Step 1 - Pick a dataset

You can pick any dataset you wish to work with, but here are a few links to available sources:

- http://archive.ics.uci.edu/ml/datasets.html
- http://www.kdnuggets.com/datasets/index.html
- http://data.un.org/
- https://dreamtolearn.com/node/2HUWRTIROI1VEB4B2FHE4ELN6
- http://www2.census.gov/census_2000/datasets/

A couple of things to keep in mind when you pick a dataset:

- It should be free and available to use.
- Make sure it's in a format you can work with (csv, arff, xls, and tab files are usually ok).
- You should pick data that is well documented so you know what it contains, where it came from, what it was originally used for, etc.

After you pick a dataset, make sure to read its description and identify an objective: what type of information could you find in this data?

Step 2 - Prepare data

Look through the dataset and its description to identify any potential problems, such as missing or corrupt values. Decide how you will handle them; perhaps you will need to delete some rows or columns, or try to fill in or change the values.

Typically data will be in a tabular form and you may be able to use MS Excel or similar software to view it. MS Excel can open csv, tab, and other file formats. For others, you can just use a text editor, such as Notepad++. Better yet, use one of the following data mining software packages:

- Orange (http://orange.biolab.si/)
- Weka Data Mining Toolkit (http://www.cs.waikato.ac.nz/ml/weka/)

You may also have to convert the data from one format to another. This can be done either using MS Excel (for xls to csv or tab conversions) or the data mining software (e.g. Weka).
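The preparation steps in Step 2 can also be scripted rather than done by hand. The following is a minimal Python sketch, not part of the assignment, that counts missing values per column, drops incomplete rows, and converts tab-separated data to csv using only the standard library. The sample data, the column names, and the set of missing-value markers ("?", "NA", empty string) are all assumptions made for illustration; check your own dataset's documentation for the markers it actually uses.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded tab-separated dataset.
RAW_TSV = (
    "age\tincome\tcity\n"
    "34\t52000\tBoston\n"
    "?\t48000\tDenver\n"
    "29\t\tAustin\n"
    "41\t61000\tSeattle\n"
)

# Assumed missing-value markers; real datasets document their own.
MISSING_MARKERS = {"", "?", "NA"}


def read_rows(text, delimiter="\t"):
    """Parse delimited text into a header list and a list of row lists."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return header, [row for row in reader]


def count_missing(header, rows):
    """Count missing values in each column."""
    counts = {name: 0 for name in header}
    for row in rows:
        for name, value in zip(header, row):
            if value.strip() in MISSING_MARKERS:
                counts[name] += 1
    return counts


def drop_incomplete(rows):
    """Keep only rows without missing values (one simple handling strategy)."""
    return [
        row for row in rows
        if all(value.strip() not in MISSING_MARKERS for value in row)
    ]


def to_csv(header, rows):
    """Serialize the header and rows as comma-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


header, rows = read_rows(RAW_TSV)
missing = count_missing(header, rows)
clean_rows = drop_incomplete(rows)
csv_text = to_csv(header, clean_rows)

print(missing)
print(csv_text)
```

Dropping rows is only one option. If many rows are affected, filling in values (for example with a column mean or the most common value) or removing a mostly empty column may preserve more of the data. For a real file, replace the in-memory sample with open(path, newline="") when reading and writing.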
