Question:

1.1.A. Preliminary data exploration, understanding and minor corrections in the Excel file

The starting point for this subtask is the dataset itself, including the data dictionary and the preparation tasks/notes in the second worksheet (Raw_data_dict). The next suggested step is to filter each column in the raw data to check for blanks (missing values) or obvious data entry errors; the latter should be corrected in the Excel spreadsheet before loading the data. The anomaly in the column Dependents should also be corrected in the Excel spreadsheet first, making reasonable assumptions. Finally, generate and add the variables Income_total and Loan_Income_ratio as explained in the data dictionary tab.

1.1.B. Data preparation for multiple analysis tasks

After the initial error removal in the source file (1.1.A), perform the following data preparation steps in this order: missing value handling/transformation; data transformation for statistical analysis (e.g. string/categorical to numeric, one to many); outlier identification and treatment.

For this subtask, you have essentially three options:

1) Use KNIME workflows, either one per analysis task or a single workflow for all tasks.
2) Use only Excel or any other tool and load the data for each analysis task.
3) Use a combination of the two options above, e.g. create a KNIME workflow to prepare the data, write or copy the data to Excel files after outlier correction, and then use these files for the analysis tasks in Excel and KNIME; or treat missing values and convert categorical variables in Excel and do only the outlier analysis in KNIME.
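
The preparation steps described in the question can be illustrated outside Excel and KNIME with a small Python sketch. This is only a hedged illustration, not the expected solution. The sample rows, the column names ApplicantIncome, CoapplicantIncome, LoanAmount and Gender, the form of the Dependents anomaly (a text code such as "3+"), and the formulas for Income_total and Loan_Income_ratio are all assumptions; the real definitions are in the Raw_data_dict worksheet. The imputation rules (median, most frequent value) and the 1.5 x IQR capping rule are likewise just one common choice.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical sample rows; None stands for a blank cell in the worksheet.
rows = [
    {"Gender": "Male", "Dependents": "0", "ApplicantIncome": 5000, "CoapplicantIncome": 0, "LoanAmount": 120},
    {"Gender": "Female", "Dependents": "3+", "ApplicantIncome": 3000, "CoapplicantIncome": 1500, "LoanAmount": 100},
    {"Gender": None, "Dependents": "1", "ApplicantIncome": 4000, "CoapplicantIncome": 1000, "LoanAmount": None},
    {"Gender": "Male", "Dependents": "2", "ApplicantIncome": 6000, "CoapplicantIncome": 0, "LoanAmount": 150},
    {"Gender": "Male", "Dependents": "0", "ApplicantIncome": 2500, "CoapplicantIncome": 2500, "LoanAmount": 900},
]

def is_blank(value):
    return value is None or value == ""

def count_blanks(rows):
    """Count blank cells per column (the column filter check in 1.1.A)."""
    return {col: sum(1 for r in rows if is_blank(r[col])) for col in rows[0]}

def fix_dependents(value):
    """Assumption: the anomaly is a text code like '3+', treated here as 3."""
    return int(str(value).rstrip("+"))

def add_derived(row):
    """Assumed formulas; the data dictionary may define them differently."""
    total = row["ApplicantIncome"] + row["CoapplicantIncome"]
    loan = row["LoanAmount"]
    row["Income_total"] = total
    # Leave the ratio blank when an input is missing, so step B can handle it.
    row["Loan_Income_ratio"] = None if is_blank(loan) or total == 0 else loan / total

def impute(rows, col, numeric):
    """Fill blanks with the median (numeric) or the most frequent value."""
    present = [r[col] for r in rows if not is_blank(r[col])]
    fill = median(present) if numeric else Counter(present).most_common(1)[0][0]
    for r in rows:
        if is_blank(r[col]):
            r[col] = fill

def one_hot(rows, col):
    """One to many: replace a category column by 0/1 indicator columns."""
    for cat in sorted({r[col] for r in rows}):
        for r in rows:
            r[f"{col}_{cat}"] = int(r[col] == cat)
    for r in rows:
        del r[col]

def cap_outliers(rows, col, k=1.5):
    """Flag values outside the k * IQR fences and cap them at the fences."""
    values = [r[col] for r in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    flagged = [v for v in values if v < low or v > high]
    for r in rows:
        r[col] = min(max(r[col], low), high)
    return flagged

# 1.1.A: explore blanks, fix the Dependents anomaly, add derived variables.
blanks = count_blanks(rows)
for r in rows:
    r["Dependents"] = fix_dependents(r["Dependents"])
    add_derived(r)

# 1.1.B: missing values, then category to numeric, then outliers.
impute(rows, "Gender", numeric=False)
impute(rows, "LoanAmount", numeric=True)
impute(rows, "Loan_Income_ratio", numeric=True)
one_hot(rows, "Gender")
flagged = cap_outliers(rows, "LoanAmount")
```

The order of the calls mirrors the order required in 1.1.B. Capping LoanAmount after the ratio has been computed leaves Loan_Income_ratio based on the uncapped value, so a real workflow may need to treat or recompute the derived column as well.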
