Question: A4, A5 and A6


Table 1: Data Description

Field         Description
STU ID        Student ID (index).
Honor Class   Output column. Honor class of the student at the end of his/her undergraduate studies. It takes the values 1, 2 and 3, which correspond to first class, second class and third class respectively.
High School   Graduated honor class from high school; can be 'First', 'Second' or 'Third'.
TOEFL score   TOEFL score of the student (out of 120) before joining the first year.
Math-101      Grade obtained in the first-year mathematics course (letter grade) in the first attempt.
Phys-101      Grade obtained in the first-year physics course (letter grade) in the first attempt.
Chem-101      Grade obtained in the first-year chemistry course (letter grade) in the first attempt.
Engl-101      Grade obtained in the first-year English course (letter grade) in the first attempt.
Prog-101      Grade obtained in the first-year programming course (letter grade) in the first attempt.

Do the following tasks using the data given in HW6DataA and Table 1:

A-1: Given Data.
- Read and display the data.
- Check if there are any missing values.
- Check if there is any type inconsistency.
- Handle missing values and resolve inconsistencies (if any).

A-2: Data Preparation. Update the dataframe as follows:
- Drop/remove column 'STU ID'.
- Encode all input columns into numeric type.

A-3: Decision Tree Analysis. The hypothesis is that the input columns can be helpful in predicting the output column. Do the following:
- Split the given data into train and test data sets, such that the train data set contains 70% of the given data.
- Fit and display the decision tree classifier using the train data set.
- Using the decision tree classifier, calculate and display the confusion matrix for the above test data.
- Using the confusion matrix, calculate accuracy.

A-4: Classification & Association Rules.
- Identify and write any three classification rules.
- Write an association rule between 'High School → Math-101' which has the maximum accuracy.
- Write an association rule between 'TOEFL & High School → Engl-101 & Phys-101' which has the maximum support.

A-5: Random Forest Analysis. Same hypothesis as in A-3. Do the following:
- Fit a random forest classifier using the train data set from Part A-3. Use 5 estimators.
- Display the above decision trees with a max depth of 3.
- Using the random forest classifier, calculate and display the confusion matrix for the test data from Part A-3.
- Using the confusion matrix, calculate accuracy.

A-6: Naive Bayes Classification. Same hypothesis as in A-3. Do the following:
- Fit a naive Bayes classifier using the train data set from Part A-3. Use Categorical NB.
- Using the classifier, display the confusion matrix for the above test data.
- Using the confusion matrix, calculate accuracy.

Note: Solve all the above questions using Python. Use the Pandas, Seaborn, Sklearn, etc. libraries for all the above analysis.
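
For the association rules in A-4, one possible approach is to count every combination of antecedent and consequent values and compute support and confidence for each. The sketch below is not the expected solution: the helper name `rule_stats` and the six-row toy table are hypothetical, and reading "maximum accuracy" as maximum confidence is an assumption.

```python
import pandas as pd

def rule_stats(df, antecedent, consequent):
    """Support and confidence for every rule (antecedent values -> consequent values).

    `antecedent` and `consequent` are lists of column names.
    """
    n = len(df)
    # Rows matching both sides of each candidate rule.
    both = df.groupby(antecedent + consequent).size().reset_index(name="count")
    # Rows matching the antecedent alone.
    ante = df.groupby(antecedent).size().reset_index(name="ante_count")
    out = both.merge(ante, on=antecedent)
    out["support"] = out["count"] / n
    out["confidence"] = out["count"] / out["ante_count"]
    return out.drop(columns=["count", "ante_count"])

# ASSUMPTION: toy rows for illustration only, not taken from HW6DataA.
toy = pd.DataFrame({
    "High School": ["First", "First", "First", "Second", "Second", "Third"],
    "Math-101":    ["A",     "A",     "B",     "B",      "B",      "C"],
})

rules = rule_stats(toy, ["High School"], ["Math-101"])
# Highest confidence first; ties are broken by higher support.
best = rules.sort_values(["confidence", "support"], ascending=False).iloc[0]
print(rules)
print(best)
```

The same helper accepts several columns on each side, for example `rule_stats(data, ["TOEFL band", "High School"], ["Engl-101", "Phys-101"])` sorted by `support`. Here `TOEFL band` is a hypothetical binned version of the TOEFL score, since raw scores out of 120 would produce very sparse rules.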

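Parts A-2, A-3, A-5 and A-6 share one workflow: encode the inputs, split 70/30, fit a classifier, and read accuracy off the confusion matrix. The sketch below shows that workflow under loud assumptions. HW6DataA is not available here, so the data is a random stand-in (accuracy will be near chance), the letter-grade scale A to F is assumed, and the TOEFL score is binned into three made-up bands so that `CategoricalNB` can use it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# ASSUMPTION: random stand-in for HW6DataA; the real file is not shown here.
rng = np.random.default_rng(0)
n = 200
grades = ["F", "D", "C", "B", "A"]      # assumed grade scale, worst to best
honors = ["Third", "Second", "First"]   # worst to best
courses = ["Math-101", "Phys-101", "Chem-101", "Engl-101", "Prog-101"]
df = pd.DataFrame({
    "STU ID": np.arange(n),
    "Honor Class": rng.integers(1, 4, size=n),
    "High School": rng.choice(honors, size=n),
    "TOEFL score": rng.integers(60, 121, size=n),
})
for course in courses:
    df[course] = rng.choice(grades, size=n)

# A-2: leave out 'STU ID' and encode every input column as integers.
cat_cols = ["High School"] + courses
encoder = OrdinalEncoder(categories=[honors] + [grades] * len(courses))
X = pd.DataFrame(encoder.fit_transform(df[cat_cols]).astype(int),
                 columns=cat_cols, index=df.index)
# ASSUMPTION: three TOEFL bands (0-80, 81-100, 101-120), coded 0, 1, 2.
X["TOEFL band"] = pd.cut(df["TOEFL score"], bins=[0, 80, 100, 120], labels=False)
y = df["Honor Class"]

# A-3: one 70/30 split, reused by every model below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=5, random_state=0),
    # min_categories guards against a category that is absent from the train set.
    "categorical NB": CategoricalNB(min_categories=[3, 5, 5, 5, 5, 5, 3]),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    cm = confusion_matrix(y_test, model.predict(X_test), labels=[1, 2, 3])
    accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
    results[name] = (cm, accuracy)
    print(name, cm, f"accuracy = {accuracy:.3f}", sep="\n")

# A-5: print each of the 5 trees, truncated to depth 3 for display.
for i, tree in enumerate(models["random forest"].estimators_):
    print(f"--- tree {i} ---")
    print(export_text(tree, feature_names=list(X.columns), max_depth=3))
```

The phrase "max depth of 3" is read here as a display limit. If it is meant as a limit on the fitted trees, passing `max_depth=3` to `RandomForestClassifier` would be the alternative. Calling `export_text` on the single decision tree prints root-to-leaf paths, which is one way to read off candidate classification rules for A-4.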