The RIPPER algorithm (by Cohen [1]) is an extension of an earlier algorithm called IREP (by Fürnkranz and Widmer [3]). Both algorithms apply the reduced-error pruning method to determine whether a rule needs to be pruned. The reduced-error pruning method uses a validation set to estimate the generalization error of a classifier. Consider the following pair of rules:
R1: A → C
R2: A ∧ B → C
R2 is obtained by adding a new conjunct, B, to the left-hand side of R1. For this question, you will be asked to determine whether R2 is preferred over R1 from the perspectives of rule-growing and rule-pruning. To determine whether a rule should be pruned, IREP computes the following measure:

(a) Suppose R1 covers 350 positive examples and 150 negative examples, while R2 covers 300 positive examples and 50 negative examples. Compute FOIL's information gain for rule R2 with respect to R1.
(b) Consider a validation set that contains 500 positive examples and 500 negative examples. For R1, suppose the number of positive examples covered by the rule is 200 and the number of negative examples covered by the rule is 50. For R2, suppose the number of positive examples covered by the rule is 100 and the number of negative examples covered by the rule is 5. Compute v_IREP for both rules. Which rule does IREP prefer?
(c) Compute v_RIPPER for the rules in part (b). Which rule does RIPPER prefer?

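The IREP pruning measure is

$$v_{IREP} = \frac{p + (N - n)}{P + N}$$

where P and N are the numbers of positive and negative examples in the validation set, and p and n are the numbers of positive and negative validation examples covered by the rule.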
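The IREP measure can be checked numerically against the counts in part (b). The sketch below also evaluates part (c) under the assumption that the RIPPER measure is (p − n)/(p + n); this page does not state that definition, so treat it as an assumption. The function names `v_irep` and `v_ripper` are illustrative only.

```python
def v_irep(p, n, P, N):
    """IREP pruning measure: (p + (N - n)) / (P + N)."""
    return (p + (N - n)) / (P + N)


def v_ripper(p, n):
    """Assumed RIPPER pruning measure: (p - n) / (p + n).

    This definition is not given in the text above; it is an assumption.
    """
    return (p - n) / (p + n)


# Validation set from part (b): 500 positive and 500 negative examples.
P, N = 500, 500

# (p, n) = positive and negative validation examples covered by each rule.
rules = {"R1": (200, 50), "R2": (100, 5)}

for name, (p, n) in rules.items():
    print(name, round(v_irep(p, n, P, N), 3), round(v_ripper(p, n), 3))
```

Under these definitions, v_IREP is 0.65 for R1 and 0.595 for R2, so IREP would prefer R1. The assumed v_RIPPER is 0.6 for R1 and about 0.905 for R2, so RIPPER would prefer R2.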

Step by Step Solution


(a) For this problem, p0 = 350, n0 = 150, p1 = 300, and n1 = 50. Therefore the FOIL...
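Since the answer above is truncated, the computation for part (a) can be sketched as follows. This assumes the usual definition of FOIL's information gain, p1 × (log2(p1 / (p1 + n1)) − log2(p0 / (p0 + n0))), which is not spelled out on this page. The function name `foil_gain` is illustrative only.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain of the extended rule over the original rule.

    p0, n0: positive/negative examples covered by the original rule (R1).
    p1, n1: positive/negative examples covered by the extended rule (R2).
    Assumes the standard definition; it is not stated in the text above.
    """
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


# Counts from part (a): R1 covers 350+/150-, R2 covers 300+/50-.
gain = foil_gain(p0=350, n0=150, p1=300, n1=50)
print(round(gain, 2))
```

With these counts the gain comes out to about 87.65. It is positive, which suggests that adding the conjunct B is favorable from the rule-growing perspective.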

