Question: PLEASE USE PYTHON 3 AND DO NOT USE ANY INBUILT LIBRARIES THAT GIVE YOU A DIRECT ANSWER THANK YOU Theory We discussed in class the

PLEASE USE PYTHON 3 AND DO NOT USE ANY INBUILT LIBRARIES THAT GIVE YOU A DIRECT ANSWER THANK YOU Theory We discussed in class the classic ID-3 approach for inferring decision trees. The algorithm keeps

PLEASE USE PYTHON 3

AND DO NOT USE ANY INBUILT LIBRARIES THAT GIVE YOU A DIRECT ANSWER

THANK YOU

Theory We discussed in class the classic ID-3 approach for inferring decision trees. The algorithm keeps splitting nodes as long as the nodes have nonzero entropy and features/attributes are available. This project's goal is to infer imperfect decision trees with a small number of nodes and with small entropy. Consider a tree that partitions the instances into k leaves: S={s1,s2,,sk}. The target attribute is T, and the purpose of the tree is to enable us to infer the target attribute. As in the case of ID-3 our goal is to minimize: Entropy(TS)=j=1kProb(sj)Entropy(Tsj) A greedy approach for inferring such a tree is to iteratively determine what node should be partitioned further. The contribution of a single leaf sj to the total uncertainty is: Ej=Prob(sj)Entropy(Tsj)=nsjEntropy(Tsj) where sj is the number of instances in sj and n is the total number of instances. Therefore, it makes sense to choose sj with a maximal Ej. A better criterion is obtained using lookahead. Let A1,Am be the attributes. Let Tj be the target attribute restricted to the subset sj. If we choose sj, it is to be partitioned using the attribute Ai that minimizes Entropy (TjAi) computed as: Entropy(TjAi)=lsjalEntropy(Tjal) The total change in the entropy of the tree is: Eji=Prob(sj)(Entropy(TjAi)Entropy(Tsj))=Prob(sj)Gain(sj,Ai)where:Gain(sj,Ai)=Entropy(sj)Entropy(sjAi) Therefore, we should choose sj that minimizes Eji, or, equivalently, the one that maximizes F=maxi,jnsjGain(sj,Ai) Example Starting with the single leaf {1,2,3,4,5,6,7,8,9,10} it must be chosen as the first leaf to split. Entropy(T)Entropy(TA1)Entropy(TA2)Entropy(TA3)=0.970951=0.924511=0.965148=0.95098Gain(T,A1)=0.0464393Gain(T,A2)=0.00580215Gain(T,A3)=0.0199731 Therefore, the best choice is A1. This produces the following partition into three leaves: s1={1,2,3,4},s2={5,6,7,8}s3={9,10} we have: Entropy(s1)=0.811278Entropy(s1A1)=0.811278Gain(s1,A1)=0Entropy(s1A2)=0.688722Gain(s1,A2)=0.122556Entropy(s1A3)=0.688722Gain(s1,A3)=0.122556F1=0.40.122556=0.0490224Entropy(s2)=1Entropy(s2A1)=1Gain(s2,A1)=0Entropy(s2A2)=1Gain(s2,A2)=0Entropy(s2A3)=1Gain(s2,A3)=0F2=0.40=0Entropy(s3)=1Entropy(s3A1)=1Gain(s3,A1)=0Entropy(s3A2)=1Gain(s3,A2)=0Entropy(s3A3)=0Gain(s3,A3)=1F3=0.21=0.2 Therefore, the next leaf to split is s3, and it is splitted according to A3. This produces the following partition into 4 leaves: s1={1,2,3,4},s2={5,6,7,8},s4={9},s5={10} Input/Output The dataset is described in a file containing a matrix of integers. Let m be the number of instances (rows) and let n be the number of features/attributes (columns). The last attribute is taken as the target attribute. Example: You may assume that an attribute has no more than 3 distinct values so that the numbers in each column can be 0,1 or 2 . The target attribute (the last column) is binary, so that its values are 0 or 1 . A partition is described in a text file. For a situation where there are m instances, a partition is described by a sequence of numbers in the range 0m1. Each partition is specified in a separate line of integer numbers. The first number in each line is the partition id. The only requirement is that the partition ids are unique. Continuing with the example above that has 10 instances, here is the file representing the partition {0,9},{1,2,3,4},{5,6,7,8}. The first partition has id =2, the second id =5, and the id of the third is 6 . (These numbers are arbitrary but must be unique.) 209 51234 6578 Your program should read as input one dataset file and one partition file. It should determine which partition to split, and according to what attribute. This information should be printed as a message on the console (standard output). The program should then produce as output the new partition file. The attributes are numbered 0n1. Example: Python id3variant.py dataset-1.txt partition-1.txt partition-2.txt Partition 6 was replaced with partitions 7,8 using Attribute 2 If the file partition-1.txt is as specified above, the file partition-2.txt might be

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Nike Inc Case Study Please use the Following Information in the exhibits and information from the Case Study above to answer the following questions in paragraph form in about a 4-5 page analysis -...

Python and most Python libraries are free to download or use, though many users use Python through a paid service. Paid services help IT organizations manage the risks associated with the use of...

Hi, I'm having an issue with this Simnet project and I was wondering if anyone could help me. There is only one issue: Step 3a, in the instruction pdf that I have attached. When my project is graded...

Please help me with this assignment, 100% human! Reference book George, J. M. (2024). Contemporary management (12th ed.). McGraw-Hill Education. keiser library Syahbinah, S., & Suhardianto, N....

Assignment please Read Article New Coke - Write a response to the following questions: What assumptions did Coca Cola Company make? What biases can you identify? What did competitors do before and...

Protecting Your Trademark ENHANCING YOUR RIGHTS THROUGH FEDERAL REGISTRATION Basic Facts About Trademarks United States Patent and Trademark Office MESSAGE FROM THE USPTO Thank you for requesting a...

MBA 736 Managing Conflict in Organizations I want from Hocker, Berry, Wilmot (2022) Interpersonal Conflict. 11ed. i will add a picture for the book in the end Here is the question information...

JBR-07575; No of Pages 12 Journal of Business Research xxx (2012) xxx-xxx Contents lists available at SciVerse ScienceDirect Journal of Business Research Organizational innovation as an enabler of...

Please answer me page 51 to page 56 on the attachment. is a multiple choice questions. Thank you FAC1502/101/3/2016 Tutorial letter 101/3/2016 Financial accounting concepts, principles and procedures...

Math 221 Discrete Mathematics UOPX Make sure all of the are fresh work with proper citation APA format, no used work will be accepted thanks so much. To be substantive, in addition to meeting the...

DIRECTIONS: To find the answer to the riddle, write the answer to each question in the space immediately below it. In the solution section, the word in parentheses next to the answer to the first...

Answer the following questions 1. Describe and define what you consider the definition of public policy to be in your own words. 2. Provide a basic definition of public policy. 3. Select a public...

All of the following are true when an auditor uses audit data analytics, EXCEPT: Knowledge of the audit client helps refine the auditor's ability to identify potential anomalies There may be...

Combe Corporation has two divisions: Alpha and Beta. Data from the most recent month appear below: Alpha Beta Sales $ 167,000 $ 362,300 Variable expenses $ 106,880 $ 166,300 Traceable fixed expenses...

12-4 How do different decision-making constituencies in an organization use business intelligence, and what is the role of information systems in helping people working in a group make decisions more...

11-18 How do enterprise content management systems help organizations manage structured and semistructured knowledge? What are two examples of each type of knowledge handled by these systems?

12-5 How will MIS help my career?