Question: You are expected to identify any data repository and extract one secondary dataset. You are to provide a step-by-step procedure on how to preprocess the extracted dataset and use the procedure to preprocess the extracted data.

a) What is data?

[1 mark]

b) What is the difference between primary data and secondary data?

[2 marks]

c) What is the name of the data repository you identified? Provide the repository's URL.

[2 marks]

d) Write the step-by-step procedure you will consider for preprocessing the extracted dataset.

[5 marks]

e) Implement (d) on the extracted dataset. Upload both the original dataset and the preprocessed dataset.

[5 marks]

f) Write the three phases for preparing the data in a text file to be loaded in WEKA.

[1.5 marks]

g) What are the file extensions for data files to be loaded in MATLAB, R, WEKA, SPSS and RapidMiner?
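
For parts (d) to (f), a minimal sketch of what a preprocessing procedure might look like in Python is given here. It is only an illustration under stated assumptions, not the expected answer: the toy CSV, the column names, and the helper names `preprocess` and `to_arff` are all hypothetical, and the three steps shown (duplicate removal, mean imputation, min-max scaling) are one common choice among many. The output assumes that the "three phases" in (f) refer to the `@relation`, `@attribute` and `@data` parts of a WEKA ARFF file.

```python
import csv
import io
from statistics import mean

# Hypothetical toy data standing in for an extracted secondary dataset.
RAW_CSV = """age,income,label
25,50000,yes
25,50000,yes
40,,no
35,65000,yes
"""

def preprocess(csv_text, numeric_cols):
    """Drop duplicate rows, mean-impute missing numerics, min-max scale to [0, 1]."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Step 1: remove exact duplicate rows while preserving order.
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    for col in numeric_cols:
        # Step 2: replace empty cells with the mean of the observed values.
        observed = [float(r[col]) for r in unique if r[col] != ""]
        fill = mean(observed)
        values = [float(r[col]) if r[col] != "" else fill for r in unique]

        # Step 3: min-max normalisation; a constant column maps to 0.0.
        lo, hi = min(values), max(values)
        span = hi - lo
        for r, v in zip(unique, values):
            r[col] = (v - lo) / span if span else 0.0
    return unique

def to_arff(rows, relation, numeric_cols, nominal_cols):
    """Render rows as ARFF text: @relation, @attribute declarations, @data."""
    lines = [f"@relation {relation}", ""]
    for col in numeric_cols:
        lines.append(f"@attribute {col} numeric")
    for col in nominal_cols:
        levels = sorted({r[col] for r in rows})
        lines.append(f"@attribute {col} {{{','.join(levels)}}}")
    lines += ["", "@data"]
    for r in rows:
        cells = [f"{r[c]:.4f}" for c in numeric_cols] + [r[c] for c in nominal_cols]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"

cleaned = preprocess(RAW_CSV, ["age", "income"])
arff_text = to_arff(cleaned, "toy", ["age", "income"], ["label"])
```

Writing `arff_text` to a file ending in `.arff` would give a text file that WEKA can open. For a real dataset, the steps and their order should follow the procedure written for part (d), and both the original and the preprocessed files should be kept.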
