Question: Need the highlighted part.

Example code given:

status = read.csv(file = "...")

entropy = function(x){
  # x is a sequence of frequencies, for example, x = c(5,9)
  x.probs = ...
  x.info = log(x.probs, base = 2)
  return(x.info)
}
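One possible way to fill in the elided `x.probs` step is sketched below. It assumes the function is meant to return the Shannon entropy (in bits) of the frequency vector; the skeleton as given returns `x.info`, the log-probabilities, so the last line here is a deliberate change and not necessarily what the instructor intended.

```r
# Hypothetical completion of the skeleton; an assumption, not the reference answer.
entropy = function(x) {
  # x is a sequence of frequencies, for example, x = c(5, 9)
  x = x[x > 0]                      # drop empty classes so 0 * log2(0) counts as 0
  x.probs = x / sum(x)              # frequencies -> probabilities
  x.info = log(x.probs, base = 2)   # log-probabilities, as in the skeleton
  return(-sum(x.probs * x.info))    # assumed intent: Shannon entropy in bits
}

print(entropy(c(5, 9)))   # two classes with 5 and 9 tuples
print(entropy(c(7, 7)))   # evenly split classes
print(entropy(c(4, 0)))   # a pure node
```

For `x = c(5, 9)` this gives about 0.940 bits; an even split gives 1 bit and a pure node gives 0.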

# split.Department = aggregate(Count, by = status[c('Department','Status')])

Info = function(count_col, by_col, target_col, data){
  # count_col is the column name for the frequency
  # by_col is the column name defining the group
  # target_col is the column name defining the target
  # data is a data.frame
  # for example, data = play, count_col = "count", by_col = "Outlook", target_col = "PlayTennis"
  Split = ...
  n = ...
  p = ...
  info = ...
  return(info)
}
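The four elided lines in `Info` could be completed along the following lines. This is a sketch under assumptions: it uses `tapply` instead of the `aggregate` call hinted at in the commented-out line, it treats `by_col = NULL` as "no split", and the small `play` data frame is hypothetical illustration data (the question does not include it).

```r
# Assumed entropy helper: Shannon entropy in bits of a frequency vector.
entropy = function(x) {
  x = x[x > 0]
  x.probs = x / sum(x)
  -sum(x.probs * log(x.probs, base = 2))
}

# Hypothetical completion of Info; not the instructor's reference solution.
Info = function(count_col, by_col, target_col, data) {
  # One group per value of by_col; a single group when by_col is NULL.
  group = if (is.null(by_col)) rep("all", nrow(data)) else data[[by_col]]
  # Split: groups in rows, target classes in columns, summed counts in cells.
  Split = tapply(data[[count_col]], list(group, data[[target_col]]), sum)
  Split[is.na(Split)] = 0            # combinations that never occur
  n = rowSums(Split)                 # tuples per group
  p = n / sum(n)                     # group weights
  info = sum(p * apply(Split, 1, entropy))   # weighted average entropy
  return(info)
}

# Hypothetical aggregated counts, for illustration only.
play = data.frame(
  Outlook    = c("sunny", "sunny", "overcast", "rain", "rain"),
  PlayTennis = c("yes", "no", "yes", "yes", "no"),
  count      = c(2, 3, 4, 3, 2),
  stringsAsFactors = FALSE
)

print(Info("count", NULL, "PlayTennis", play))       # entropy before splitting
print(Info("count", "Outlook", "PlayTennis", play))  # expected entropy after splitting
```

With these illustrative counts the unsplit information is about 0.940 and the information after splitting on `Outlook` is about 0.694.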

Info.gain = function(count_col, by_col, target_col, data){
  info.nosplit = Info(count_col, NULL, target_col, data)
  info.split = Info(count_col, by_col, target_col, data)
  gain = info.nosplit - info.split
  return(gain)
}

The following table consists of training data from an employee database. The data have been generalized. For example, "31...35" for age represents the age category with range of 31 to 35. For a given row entry, count represents the number of data tuples having the values for department, status, age, and salary given in that row.

department   status   age       salary      count
sales        senior   31...35   46K...50K   30
sales        junior   26...30   26K...30K   40
sales        junior   31...35   31K...35K   40
systems      junior   21...25   46K...50K   20
systems      senior   31...35   66K...70K   5
systems      junior   26...30   46K...50K   3
systems      senior   41...45   66K...70K   3
marketing    senior   36...40   46K...50K   10
marketing    junior   31...35   41K...45K   4
secretary    senior   46...50   36K...40K   4
secretary    junior   26...30   26K...30K   6

Let status be the class label attribute.

(a) Construct a decision tree from the given data using information gain. Use R to verify your result and show your code.
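A minimal sketch of how the verification in part (a) might look in R. Several things here are assumptions: the data frame is typed in by hand because the CSV path is elided, the column names (`Department`, `Status`, `Age`, `Salary`, `Count`) are guessed from the commented-out `aggregate` line, the marketing/senior row is read as 36...40, 46K...50K, count 10, and the bodies of `entropy` and `Info` are hypothetical completions of the skeleton.

```r
# Rows transcribed from the problem's table; column names are assumptions.
status = data.frame(
  Department = c("sales", "sales", "sales", "systems", "systems", "systems",
                 "systems", "marketing", "marketing", "secretary", "secretary"),
  Status = c("senior", "junior", "junior", "junior", "senior", "junior",
             "senior", "senior", "junior", "senior", "junior"),
  Age = c("31...35", "26...30", "31...35", "21...25", "31...35", "26...30",
          "41...45", "36...40", "31...35", "46...50", "26...30"),
  Salary = c("46K...50K", "26K...30K", "31K...35K", "46K...50K", "66K...70K",
             "46K...50K", "66K...70K", "46K...50K", "41K...45K", "36K...40K",
             "26K...30K"),
  Count = c(30, 40, 40, 20, 5, 3, 3, 10, 4, 4, 6),
  stringsAsFactors = FALSE
)

# Assumed helper: Shannon entropy in bits of a frequency vector.
entropy = function(x) {
  x = x[x > 0]
  x.probs = x / sum(x)
  -sum(x.probs * log(x.probs, base = 2))
}

# Hypothetical completion of Info: weighted entropy of the target per group.
Info = function(count_col, by_col, target_col, data) {
  group = if (is.null(by_col)) rep("all", nrow(data)) else data[[by_col]]
  Split = tapply(data[[count_col]], list(group, data[[target_col]]), sum)
  Split[is.na(Split)] = 0
  n = rowSums(Split)
  p = n / sum(n)
  sum(p * apply(Split, 1, entropy))
}

# Info.gain as given in the question.
Info.gain = function(count_col, by_col, target_col, data) {
  info.nosplit = Info(count_col, NULL, target_col, data)
  info.split = Info(count_col, by_col, target_col, data)
  info.nosplit - info.split
}

# Root: compare the gain of every candidate attribute.
attrs = c("Department", "Age", "Salary")
gains = sapply(attrs, function(a) Info.gain("Count", a, "Status", status))
print(round(gains, 4))
root = names(which.max(gains))
print(root)

# Second level: only the 46K...50K salary branch still mixes both classes.
mixed = status[status$Salary == "46K...50K", ]
gains2 = sapply(c("Department", "Age"),
                function(a) Info.gain("Count", a, "Status", mixed))
print(round(gains2, 4))
```

Under these assumptions the gains come out to roughly 0.0486 for department, 0.4247 for age, and 0.5375 for salary, so salary would be the root. Every salary range except 46K...50K is already pure; within that branch department and age each separate the classes completely (both gains are about 0.9468), so either could serve as the second split.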
