Question: Q. 2.4 2.11 (A + B) OVERVIEW OF THE DATA MINING PROCESS
Consider the sample from a database of credit applicants in Table [...]. Comment on the likelihood that it was sampled randomly, and whether it is likely to be a useful sample.

Consider the sample from a bank database shown in Table [...]; it was selected randomly from a larger database to be the training set. Personal Loan indicates whether a solicitation for a personal loan was accepted and is the response variable. A campaign is planned for a similar solicitation in the future, and the bank is looking for a model that will identify likely responders. Examine the data carefully and indicate what your next step would be.

Using the concept of overfitting, explain why, when a model is fit to training data, zero error with those data is not necessarily good.

In fitting a model to classify prospects as purchasers or nonpurchasers, a certain company drew the training data from internal data that include demographic and purchase information. Future data to be classified will be lists purchased from other sources, with demographic but not purchase data included. It was found that "refund issued" was a useful predictor in the training data. Why is this not an appropriate variable to include in the model?

A dataset has [...] records and [...] variables with [...] of the values missing, spread randomly throughout the records and variables. An analyst decides to remove records that have missing values. About how many records would you expect to be removed?

Normalize the data in Table [...], showing calculations. Confirm your results in JMP (create a JMP data table, then use the Formula Editor or the dynamic transformation feature).

Statistical distance between records can be measured in several ways. Consider Euclidean distance, measured as the square root of the sum of the squared differences. For the first two records in Table [...] it is [...]. Can normalizing the data change which two records are farthest from each other in terms of Euclidean distance?

Two models are applied to a dataset that has been partitioned. Model A is considerably more accurate than model B on the training data, but slightly less accurate than model B on the validation data. Which model are you more likely to consider for final deployment?

The dataset ToyotaCorolla.jmp contains data on used cars on sale during the late summer of [...] in the Netherlands. It has [...] records containing details on [...] attributes, including Price, Age, Kilometers, HP, and other specifications.

a. Explore the data using the data visualization capabilities of JMP (e.g., Graph > Scatterplot Matrix and Graph > Graph Builder). Which of the pairs among the variables seem to be correlated? Refer to the guides and videos at jmp.com/learn, under Graphical Displays and Summaries, for basic information on how to use these platforms.

b. We plan to analyze the data using various data mining techniques described in future chapters. Prepare the dataset for data mining techniques of supervised learning by creating partitions using the JMP Pro Make Validation Column utility from the Cols menu. Use the following partitioning percentages: training [...], validation [...], and test [...]. Describe the roles that these partitions will play in modeling.
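For the normalization and Euclidean distance exercises, a small check outside JMP may help build intuition. The sketch below is only an illustration: the three records (age, income) are made-up values, not the textbook table, and the helper names `zscore`, `euclidean`, and `farthest_pair` are hypothetical. It assumes "normalize" means z-scores computed with the sample standard deviation.

```python
import math
from itertools import combinations
from statistics import mean, stdev

# Hypothetical records (age in years, income in dollars).
# These are NOT the values from the textbook table.
records = [
    (20, 50000),
    (60, 50500),
    (21, 52000),
]

def zscore(rows):
    """Standardize each column: subtract its mean, divide by its sample std dev."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sds)) for row in rows]

def euclidean(a, b):
    """Square root of the sum of squared differences between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_pair(rows):
    """Return the index pair (i, j), i < j, with the largest Euclidean distance."""
    return max(
        combinations(range(len(rows)), 2),
        key=lambda ij: euclidean(rows[ij[0]], rows[ij[1]]),
    )

raw_pair = farthest_pair(records)           # income dominates the raw distance
norm_pair = farthest_pair(zscore(records))  # both columns now on the same scale
print("farthest (raw):", raw_pair)
print("farthest (normalized):", norm_pair)
```

With these made-up values the farthest pair differs before and after normalization, because the raw distance is driven almost entirely by the income column. The same check can be repeated on the textbook table once its values are entered.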
