Question: Help with Excel and Orange tools using these step-by-step instructions. Post Excel and Orange tool screenshots!


Step 1: Define/understand purpose: predict or classify?

Step 2: Obtain data (may involve random sampling)

Step 3: Explore, clean, pre-process data
a. Check for missing values - Coronary Heart Disease
   i. One way is to use a PivotTable: highlight the AgeOnset and AgeRange columns => Insert menu tab => PivotTable => OK => drag AgeRange to Rows => drag AgeOnset to Values twice (right-click and set one to minimum and the other to maximum). Notice the Blank category; see the data dictionary to find where it belongs.
   ii. Ctrl+A to highlight the whole data set => Ctrl+F => Find Next
b. Replace/impute or delete missing values (make sure to consult the data dictionary) => freeze the top row
c. Transform text values into numbers. Depending on the software you use (e.g., MS Excel for Data Analysis), you will need one of the following:
   i. An IF statement is one option
   ii. VLOOKUP or HLOOKUP for replacing text with numbers
      1. Let's use HLOOKUP for the easy one to save time, such as high, medium, low.
      2. Let's use VLOOKUP for present vs. absent (find and replace is practical here), age level, and weight level.
   iii. Find and replace
d. Preprocess data using Orange Data Mining software
   i. Start Orange => close the welcome window
   ii. Drag the File widget onto the canvas => double-click it and browse for the Heart Disease Excel file
   iii. Drag the Data Table widget onto the canvas => connect File to Data Table => double-click Data Table to see the content
e. Run descriptive analytics
   i. Run correlation using the MS Excel Data Analysis add-in to reduce the number of variables used to predict/classify patients with Coronary Heart Disease
   ii. Drag a link from the File widget to a blank space, drop it, and select Scatter Plot, Distributions, or any other descriptive analytics widget to explore the data
   iii. Remove all widgets except File
   iv. Double-click the File widget, then double-click the word "feature" where the Role column meets the coronary heart row => dropdown => select Target to make it the target variable

Step 4: Reduce the data; if supervised DM, partition it
a. Insert the Rank widget into the canvas => connect the File widget to it. Double-click Rank and see the top 5 variables selected by ReliefF. If you decide to use only these top five, go back to the File widget and set each variable you do not want in the model to skip.

Dr. Murad Moqbel, Lecture Notes, INFS 3390, Week 6, Data Analytics

Step 5, 6 & 7: Specify task (classification, clustering, etc.); choose the techniques (e.g., Tree or Logistic Regression for classification; Linear Regression and Tree for prediction); iterative implementation and tuning
a. Right-click in the canvas and type Tree => connect it to File
b. Right-click in the canvas and type Tree Viewer => connect it to Tree
c. Double-click Tree Viewer to see the tree => reduce the depth to 5 levels to see the whole tree
d. Save the image so that you can see the full tree

Step 8: Assess results; compare models
a. Delete all widgets except for the original File, Tree, and Logistic Regression
b. Add the following widgets to the canvas to test them as well: Random Forest, Neural Network, and kNN
c. Right-click and search for the Test & Score widget => connect Tree, Logistic Regression, Random Forest, Neural Network, and kNN to it => double-click Test & Score to see the scoring results. Cross-validation divides the data set into 10 subsets; in each round, 9 subsets train the model and the remaining subset tests it.
d. Look at the classification accuracy (CA) to compare the models' performance.

Step 9: Deploy best model. Testing the model with real data
a. Create a new File widget in the canvas to read the new data to be predicted (worksheet named ToBePredictedData)
b. Right-click and search for the Predictions widget => connect it to the new File
c. Connect the Tree widget to the Predictions widget => double-click Predictions to see the prediction results
d. Right-click and search for the Logistic Regression widget => connect it to the original (training data) File => connect Logistic Regression to Predictions => double-click Predictions to compare results. The correct prediction is supposed to be 1=1, 2=0, 3=1, 4=0. Which model has better prediction results: Decision Tree or Logistic Regression?

Exercise: Clean the childhood-Respiratory-Disease data set and run one model to predict FEV (compare Orange with the MS Excel linear regression model) and another model to classify Smoking. Test your models with the new data in the ToBePredicted_Test Data FEV and ToBePredicted_Test Data Smoker sheets.
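
For readers who want to double-check the Excel cleaning steps outside the spreadsheet, here is a minimal Python sketch of Step 3 a-c: counting missing cells, the PivotTable-style minimum and maximum of AgeOnset per AgeRange, and a lookup-table replacement of text with numbers in the spirit of VLOOKUP/HLOOKUP. Only the AgeRange and AgeOnset column names come from the notes. The Level and Status columns, all sample values, and the numeric codes (low=1, medium=2, high=3; absent=0, present=1) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: AgeRange and AgeOnset are named in the notes,
# while Level, Status, and every value here are made up for illustration.
ROWS = [
    {"AgeRange": "40-49", "AgeOnset": 41, "Level": "high", "Status": "present"},
    {"AgeRange": "40-49", "AgeOnset": 48, "Level": "low", "Status": "absent"},
    {"AgeRange": "50-59", "AgeOnset": 53, "Level": "medium", "Status": "present"},
    {"AgeRange": "", "AgeOnset": 57, "Level": "low", "Status": "absent"},
]

# Assumed numeric codes; the notes do not say which numbers to use.
LEVEL_CODES = {"low": 1, "medium": 2, "high": 3}
STATUS_CODES = {"absent": 0, "present": 1}


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def count_missing(rows):
    """Count missing cells per column; columns with none are omitted."""
    counts = defaultdict(int)
    for row in rows:
        for column, value in row.items():
            if is_missing(value):
                counts[column] += 1
    return dict(counts)


def min_max_by_group(rows, group_col, value_col):
    """Mimic the PivotTable: min and max of value_col per group_col value."""
    groups = defaultdict(list)
    for row in rows:
        key = "(blank)" if is_missing(row[group_col]) else row[group_col]
        if not is_missing(row[value_col]):
            groups[key].append(row[value_col])
    return {key: (min(vals), max(vals)) for key, vals in groups.items()}


def encode(rows, column, codes):
    """Return new rows with text in `column` replaced via a lookup table."""
    return [{**row, column: codes[row[column].strip().lower()]} for row in rows]


if __name__ == "__main__":
    print(count_missing(ROWS))
    print(min_max_by_group(ROWS, "AgeRange", "AgeOnset"))
    encoded = encode(encode(ROWS, "Level", LEVEL_CODES), "Status", STATUS_CODES)
    print(encoded[0])
```

In this made-up sample, the "(blank)" group has an AgeOnset of 57, which hints at where the row might belong, but the data dictionary should still decide how the blank is filled.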

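The model-comparison steps rely on two ideas that are easy to see in a few lines of code: splitting the data into 10 subsets so that each one serves as the test set once, and scoring a model by classification accuracy (CA). The sketch below is a plain Python illustration of both, not a reproduction of what Orange's Test & Score does internally; it uses simple shuffled folds without stratification. The function names are hypothetical, and the two prediction lists are made up to show the arithmetic. Only the expected classes (1, 0, 1, 0) come from the notes.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_rows, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    folds = k_fold_indices(n_rows, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def classification_accuracy(actual, predicted):
    """Share of predictions that match the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


if __name__ == "__main__":
    # With 20 rows and 10 folds, each round trains on 18 rows and tests on 2.
    for train, test in train_test_splits(n_rows=20, k=10):
        print(len(train), len(test))

    # Expected classes from the notes: 1=1, 2=0, 3=1, 4=0.
    expected = [1, 0, 1, 0]
    # Hypothetical model outputs, made up to show the calculation.
    tree_predictions = [1, 0, 0, 0]
    logistic_predictions = [1, 0, 1, 0]
    print(classification_accuracy(expected, tree_predictions))
    print(classification_accuracy(expected, logistic_predictions))
```

To answer the "which model is better" question in the notes, replace the two hypothetical lists with the values shown in the Predictions widget for each model and compare the resulting accuracies.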