Question:

An automotive insurance company wants to predict which filed stolen vehicle claims are fraudulent, based on the number of claims submitted per year by the policy holder and whether the policy is new, that is, one year old or less (coded as 1 = yes, 0 = no). Data from a random sample of 98 automotive insurance claims are organized and stored in InsuranceFraud. (Data extracted from Gepp et al., "A Comparative Analysis of Decision Trees vis-à-vis Other Computational Data Mining Techniques in Automotive Insurance Fraud Detection," Journal of Data Science, 10 (2012), pp. 537-561.)

a. Using all the data as the training sample, develop a classification tree model to predict the probability of a fraudulent claim, based on the number of claims submitted per year by the policy holder and whether the policy is new.

b. What conclusions can you reach about the probability of a fraudulent claim?

c. Using half the data as the training sample and the other half of the data as the validation sample, develop a classification tree model to predict the probability of a fraudulent claim, based on the number of claims submitted per year by the policy holder and whether the policy is new.

d. What differences exist in the results of (a) and (c)? What conclusions can you reach about the models fit from the training samples in (a) and (c)?
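
The workflow in parts (a) and (c) could be sketched in Python roughly as follows. This is only a minimal illustration, not the textbook's software procedure. The InsuranceFraud file is not available here, so the 98 rows are made-up stand-in data. The helper names (`gini`, `best_split`, `grow`, `predict_proba`, `accuracy`), the Gini splitting rule, the depth limit of 2, and the minimum leaf size of 5 are all assumptions chosen for the example.

```python
import random

# Each row is (claims_per_year, new_policy, fraud); fraud is 1 or 0.

def gini(rows):
    """Gini impurity of the fraud labels in a non-empty list of rows."""
    p = sum(r[2] for r in rows) / len(rows)
    return 2 * p * (1 - p)

def best_split(rows, min_leaf):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(rows)
    for f in (0, 1):  # 0 = claims per year, 1 = new policy
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [r for r in rows if r[f] <= t]
            right = [r for r in rows if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # skip splits that would create a tiny leaf
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow(rows, depth=0, max_depth=2, min_leaf=5):
    """Grow a tree; each leaf stores the observed proportion of fraud."""
    split = best_split(rows, min_leaf) if depth < max_depth else None
    if split is None:
        return {"prob": sum(r[2] for r in rows) / len(rows), "n": len(rows)}
    f, t = split
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r in rows if r[f] <= t], depth + 1, max_depth, min_leaf),
        "right": grow([r for r in rows if r[f] > t], depth + 1, max_depth, min_leaf),
    }

def predict_proba(tree, claims_per_year, new_policy):
    """Walk down the tree and return the leaf's fraud proportion."""
    x = (claims_per_year, new_policy)
    while "prob" not in tree:
        go_left = x[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["prob"]

def accuracy(tree, rows, cutoff=0.5):
    """Share of rows classified correctly when fraud is predicted at prob >= cutoff."""
    hits = sum((predict_proba(tree, r[0], r[1]) >= cutoff) == bool(r[2]) for r in rows)
    return hits / len(rows)

# Made-up stand-in data (NOT the InsuranceFraud file): 98 synthetic claims.
rng = random.Random(0)
data = []
for _ in range(98):
    claims = rng.randint(0, 5)
    new = rng.randint(0, 1)
    p_fraud = 0.05 + 0.12 * claims + 0.25 * new  # assumed relationship
    data.append((claims, new, int(rng.random() < p_fraud)))

# Part (a): all 98 rows serve as the training sample.
tree_all = grow(data)

# Part (c): a random half for training, the other half for validation.
shuffled = data[:]
rng.shuffle(shuffled)
train, valid = shuffled[:49], shuffled[49:]
tree_half = grow(train)

print("P(fraud | 3 claims/year, new policy), full-data tree:", predict_proba(tree_all, 3, 1))
print("Training accuracy, half-data tree:", accuracy(tree_half, train))
print("Validation accuracy, half-data tree:", accuracy(tree_half, valid))
```

For part (d), comparing the splits chosen in `tree_all` and `tree_half`, and comparing training accuracy with validation accuracy, gives a rough sense of how stable the tree is and whether it overfits the training sample.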