Question: Help with Excel and Orange tools using the steps below. Please post Excel and Orange screenshots.
Dr. Murad Moqbel, Lecture Notes, INFS 3390 Week 6 Data Analytics

Step 1: Define/understand the purpose: predict or classify?

Step 2: Obtain data (may involve random sampling)

Step 3: Explore, clean, pre-process data
  a. Check for missing values (Coronary Heart Disease data set)
     i. One way is to use a PivotTable: highlight the AgeOnset and AgeRange columns => Insert menu tab => PivotTable => OK => drag AgeRange to Rows => drag AgeOnset twice to Values (right-click and set one to Min and the other to Max). Notice the blank category; see the data dictionary for where it belongs.
     ii. Ctrl+A to highlight the whole data set => Ctrl+F => Find Next
  b. Replace/impute or delete missing values (make sure to consult the data dictionary) => freeze the top row
  c. Transform text values into numbers. Depending on the software you use (i.e., MS Excel for data analysis), you will need one of the following:
     i. An IF statement is one option
     ii. VLOOKUP or HLOOKUP for replacing text with numbers
        1. Let's use HLOOKUP for the easy ones, to save time, such as high, medium, low.
        2. Let's use VLOOKUP for present vs. absent (find and replace is practical here), age level, and weight level
     iii. Find and replace
  d. Preprocess data using the Orange Data Mining software
     i. Start Orange => close the welcome window
     ii. Drag the File widget onto the canvas => double-click it and browse for the Heart Disease Excel file
     iii. Drag the Data Table widget onto the canvas => connect File to Data Table => double-click Data Table to see the content
  e. Run descriptive analytics
     i. Run a correlation using the MS Excel Data Analysis add-in to reduce the number of variables used to predict/classify patients with Coronary Heart Disease
     ii. Drag a link from the File widget to a blank space on the canvas, drop it, and select Scatter Plot, Distributions, or any other descriptive analytics widget to explore the data
     iii. Remove all widgets except File
     iv. Double-click the File widget, then double-click the word "feature" where the Role column meets the coronary heart row => dropdown => select Target to make it the target variable

Step 4: Reduce the data; if supervised DM, partition it
  a. Insert the Rank widget into the canvas => connect the File widget to it. Double-click Rank and look at the top 5 variables selected by ReliefF. If you decide to use only these top five, go to the File widget and set the variables you do not want in the model to skip.

Steps 5, 6 & 7: Specify the task (classification, clustering, etc.); choose the techniques (e.g., Tree or Logistic Regression for classification; Linear Regression and Tree for prediction); iterative implementation and tuning
  a. Right-click in the canvas and type Tree => connect it to File
  b. Right-click in the canvas and type Tree Viewer => connect it to Tree
  c. Double-click Tree Viewer to see the tree => reduce the depth to 5 levels to see the whole tree
  d. Save the image so that you can see the full tree

Step 8: Assess results; compare models
  a. Delete all widgets except the original File, Tree, and Logistic Regression
  b. Add the following widgets to the canvas to test them as well: Random Forest, Neural Network, and kNN
  c. Right-click and search for the Test & Score widget => connect Tree, Logistic Regression, Random Forest, Neural Network, and kNN to it => double-click Test & Score to see the scoring results. Cross-validation divides the data set into 10 subsets; in each round, 9 subsets train the model and the remaining subset tests it.
  d. Look at the classification accuracy (CA) to compare the models' performance.

Step 9: Deploy the best model: testing the model with real data
  a. Add a new File widget to the canvas to read the new data to be predicted (the worksheet named ToBePredictedData)
  b. Right-click and search for the Predictions widget => connect it to the new File widget
  c. Connect the Tree widget to the Predictions widget => double-click Predictions to see the prediction results
  d. Right-click and search for the Logistic Regression widget => connect it to the original (training data) File widget => connect Logistic Regression to Predictions => double-click Predictions to compare results. The correct predictions are supposed to be 1=1, 2=0, 3=1, 4=0. Which model has better prediction results: Decision Tree or Logistic Regression?

Exercise: Clean the childhood-Respiratory-Disease data set and run one model to predict FEV (compare Orange with the MS Excel linear regression model) and another model to classify Smoking. Test your models with the new data in the ToBePredicted_Test Data FEV and ToBePredicted_Test Data Smoker sheets.
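For readers who want to see what the missing-value check and the text-to-number lookups in Step 3 amount to outside Excel, here is a minimal sketch in plain Python. It is not part of the lecture notes: the rows, the column names (`AgeOnset`, `Stress`, `FamilyHistory`), and the numeric codes are all hypothetical stand-ins for the Heart Disease sheet, and the dictionaries only imitate what a VLOOKUP/HLOOKUP table would do.

```python
# Hypothetical rows standing in for the Heart Disease sheet.
# Column names and values are assumptions made for illustration only.
rows = [
    {"AgeOnset": 45, "Stress": "high", "FamilyHistory": "present"},
    {"AgeOnset": None, "Stress": "low", "FamilyHistory": "absent"},
    {"AgeOnset": 52, "Stress": "medium", "FamilyHistory": "present"},
]

# Lookup tables play the role of the VLOOKUP/HLOOKUP range in Excel.
# The numeric codes are an assumed encoding, not one given in the notes.
LEVELS = {"low": 1, "medium": 2, "high": 3}
PRESENCE = {"absent": 0, "present": 1}


def count_missing(rows):
    """Return {column: number of missing (None or empty-string) cells}."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            if val is None or val == "":
                counts[col] = counts.get(col, 0) + 1
            else:
                counts.setdefault(col, 0)
    return counts


def encode(rows, column, table):
    """Return copies of the rows with the text in `column` replaced by numbers."""
    out = []
    for row in rows:
        new_row = dict(row)
        key = str(row[column]).strip().lower()
        new_row[column] = table[key]  # a KeyError flags an unexpected label
        out.append(new_row)
    return out


missing = count_missing(rows)
encoded = encode(encode(rows, "Stress", LEVELS), "FamilyHistory", PRESENCE)
print(missing)
print(encoded)
```

Letting an unknown label raise `KeyError` is a deliberate choice in this sketch: it surfaces typos such as "hgh" instead of silently encoding them, which is the kind of check the data dictionary is meant to support.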
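The 10-fold cross-validation and classification accuracy (CA) used when comparing models can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Orange's implementation: Orange may shuffle or stratify the folds, which this sketch does not do. The expected classes read "1=1, 2=0, 3=1, 4=0" as case number = class, which is an assumption, and both prediction lists are made-up placeholders rather than real Orange output.

```python
def k_fold_indices(n_rows, k=10):
    """Split row indices 0..n_rows-1 into k folds; yield (train, test) index lists."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training in this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def classification_accuracy(actual, predicted):
    """Fraction of rows where the predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Expected classes for the four ToBePredictedData cases, as stated in the notes
# (assuming "1=1, 2=0, 3=1, 4=0" means case number = class).
expected = [1, 0, 1, 0]

# HYPOTHETICAL model outputs, for illustration only (not real Orange results).
tree_predictions = [1, 0, 0, 0]
logreg_predictions = [1, 0, 1, 0]

tree_ca = classification_accuracy(expected, tree_predictions)
logreg_ca = classification_accuracy(expected, logreg_predictions)

# With 20 rows and k=10, each round trains on 18 rows and tests on 2.
splits = list(k_fold_indices(20, k=10))
print(tree_ca, logreg_ca, len(splits))
```

To answer the question in the notes, replace the two placeholder lists with the values shown in the Predictions widget for Tree and Logistic Regression and compare the two accuracies.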
