Question: Q. 2.4, 2.11 (a + b)

OVERVIEW OF THE DATA MINING PROCESS

b. We plan to analyze the data using various data mining techniques described in future chapters. Prepare the dataset for data mining techniques of supervised learning by creating partitions using the JMP Pro Make Validation Column utility (from the Cols menu). Use the following partitioning percentages: training (50%), validation (30%), and test (20%). Describe the roles that these partitions will play in modeling.
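As a rough illustration of what a 50/30/20 partition amounts to, here is a minimal Python sketch. It is not JMP's Make Validation Column utility and makes no claim about how JMP assigns rows; the function name `assign_partitions`, the labels, and the seed are hypothetical choices for this example. The record count of 1436 is the one quoted for the ToyotaCorolla data in problem 2.11.

```python
import random

def assign_partitions(n_records, fractions=(0.5, 0.3, 0.2), seed=1):
    """Return one partition label per record (hypothetical helper, not a JMP API)."""
    labels = ("Training", "Validation", "Test")
    n_train = round(n_records * fractions[0])
    n_valid = round(n_records * fractions[1])
    n_test = n_records - n_train - n_valid  # remainder goes to the test set
    column = [labels[0]] * n_train + [labels[1]] * n_valid + [labels[2]] * n_test
    # Shuffle so that partition membership is unrelated to row order.
    random.Random(seed).shuffle(column)
    return column

column = assign_partitions(1436)
counts = {label: column.count(label) for label in ("Training", "Validation", "Test")}
print(counts)
```

In the usual reading of these roles, models are fit on the training rows, competing models are compared and tuned on the validation rows, and the test rows are held back for a final estimate of how the chosen model performs on new data.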


2.3 Consider the sample from a database of credit applicants in Table 2.5. Comment on the likelihood that it was sampled randomly, and whether it is likely to be a useful sample.

2.4 Consider the sample from a bank database shown in Table 2.6; it was selected randomly from a larger database to be the training set. Personal Loan indicates whether a solicitation for a personal loan was accepted and is the response variable. A campaign is planned for a similar solicitation in the future and the bank is looking for a model that will identify likely responders. Examine the data carefully and indicate what your next step would be.

2.5 Using the concept of overfitting, explain why when a model is fit to training data, zero error with those data is not necessarily good.

2.6 In fitting a model to classify prospects as purchasers or nonpurchasers, a certain company drew the training data from internal data that include demographic and purchase information. Future data to be classified will be lists purchased from other sources, with demographic (but not purchase) data included. It was found that "refund issued" was a useful predictor in the training data. Why is this not an appropriate variable to include in the model?

2.7 A dataset has 1000 records and 50 variables with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove records that have missing values. About how many records would you expect would be removed?
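The arithmetic behind this question can be sketched in a few lines of Python. The sketch assumes each of the 50 values in a record is missing independently with probability 0.05, which is one reading of "spread randomly"; the variable names are illustrative only.

```python
n_records = 1000
n_variables = 50
p_missing = 0.05

# Probability that a single record has no missing value in any variable,
# assuming missingness is independent across cells.
p_complete = (1 - p_missing) ** n_variables

expected_kept = n_records * p_complete
expected_removed = n_records - expected_kept

print(f"P(record is complete) = {p_complete:.4f}")
print(f"Expected records removed = {expected_removed:.0f}")
```

Under that assumption only a small minority of records is complete, so dropping every record with a missing value discards most of the dataset.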

2.8 Normalize the data in Table 2.7, showing calculations. Confirm your results in JMP (create a JMP data table, then use the Formula Editor or the dynamic transformation feature).

2.9 Statistical distance between records can be measured in several ways. Consider Euclidean distance, measured as the square root of the sum of the squared differences. For the first two records in Table 2.7, it is

√[(25 - 56)^2 + (49,000 - 156,000)^2]

Can normalizing the data change which two records are farthest from each other in terms of Euclidean distance?
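The effect of normalization on Euclidean distance can be explored with a small Python sketch. Table 2.7 is not reproduced here, so only the first two records come from the question text; the third record is HYPOTHETICAL and was chosen purely for illustration. Normalization is taken to mean z-scores (subtract the column mean, divide by the sample standard deviation), which is an assumption about the intended method.

```python
from itertools import combinations
from math import dist
from statistics import mean, stdev

records = [
    (25, 49_000),    # first record quoted in the question
    (56, 156_000),   # second record quoted in the question
    (85, 150_000),   # HYPOTHETICAL third record, not taken from Table 2.7
]

def zscore_columns(rows):
    """Standardize each column: (value - column mean) / column sample std dev."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, centers, spreads))
        for row in rows
    ]

def farthest_pair(rows):
    """Return the index pair (i, j) with the largest Euclidean distance."""
    pairs = combinations(range(len(rows)), 2)
    return max(pairs, key=lambda p: dist(rows[p[0]], rows[p[1]]))

raw_pair = farthest_pair(records)
norm_pair = farthest_pair(zscore_columns(records))

print("Raw distance, records 0 and 1:", dist(records[0], records[1]))
print("Farthest pair on raw data:", raw_pair)
print("Farthest pair after normalizing:", norm_pair)
```

On the raw scale the second measurement is so much larger that it dominates the distance; after standardizing, both measurements contribute comparably, so the farthest pair can differ.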

2.10 Two models are applied to a dataset that has been partitioned. Model A is considerably more accurate than model B on the training data, but slightly less accurate than model B on the validation data. Which model are you more likely to consider for final deployment?

2.11 The dataset ToyotaCorolla.jmp contains data on used cars on sale during the late summer of 2004 in the Netherlands. It has 1436 records containing details on 38 attributes, including Price, Age, Kilometers, HP, and other specifications.

a. Explore the data using the data visualization (e.g., Graph > Scatterplot Matrix and Graph > Graph Builder) capabilities of JMP. Which of the pairs among the variables seem to be correlated? (Refer to the guides and videos at jmp.com/learn, under Graphical Displays and Summaries, for basic information on how to use these platforms.)

