sample_input
2
B A C E D
A C
C B D
sample_output
3: C
2: A
2: A C
2: B
2: B C
2: B C D
2: B D
2: C D
2: D

3: C
2: A C
2: B C D

2: A C
2: B C D
test_input
4 B F G H I B C E F D B C D F G H I C D E H I A E A C H C F A B C E I A C D F H I C A C D H I B E G H C A B C E H I A C D F H A B C D E H I A B D E G H I E G I A C D E G H I B D E F G H I C H I .
.
.
Problem 5. Programming Problem for Frequent Pattern Mining (18 points)

Given a transaction dataset, implement the following algorithms.
- A frequent pattern mining algorithm (e.g., the Apriori algorithm or FP-Growth) to extract the frequent itemsets.
- A closed pattern mining algorithm to extract the closed itemsets. Hint: Use the frequent itemsets extracted in step 1 for identifying closed itemsets.
- A maximal pattern mining algorithm to extract the maximal itemsets. Hint: Use the frequent itemsets extracted in step 1 for identifying maximal itemsets.

You will not get credit if your code does not work. Please download "example_input.txt" and "example_output.txt" for the details of the input format and the output format.

Input Format. The input describes a transaction dataset. The first line of the input is the minimum support. Each following line of the input corresponds to one transaction. Items in each transaction are separated by a space. Please refer to the sample input for illustration. In sample input 0, the minimum support is 2. The dataset contains 3 transactions and 5 item types (A, B, C, D and E).

Output Format. The output must satisfy the following formatting requirements to pass the test cases.
- The output consists of three successive parts: the frequent itemsets (part 1), the closed itemsets (part 2), and the maximal itemsets (part 3). Consecutive parts must be separated by an empty line.
- For each part, the corresponding itemsets must appear along with their support, one per line. The format of each itemset must be 'support: itemset', where the itemset elements appear in alphabetical order, separated by a space.
- The itemsets must be ordered by support (from largest to smallest). Ties are resolved by ordering the itemsets alphabetically. Please refer to the sample output ("example_output.txt").

In the sample output, the first 9 itemsets are the frequent itemsets (part 1), the following 3 itemsets are the closed itemsets (part 2), and the last 2 itemsets are the maximal itemsets (part 3).

What you have to submit. Your code file (e.g., homework3.py) with a function freq-pattern-mining() which takes the input file ("test_input.txt"), and a clear README file with instructions for running your code:
(a) (4 points) What is the support value for the patterns "D" and "D F" in the given "test_input.txt"?
(b) (4 points) Is the pattern "B D E F" a closed itemset? Is the pattern "A E G H" a closed itemset? If yes, what is the support value of this pattern?
(c) (4 points) How many maximal itemsets are there in total? Is "A B C D E F G H I" a maximal itemset in your results?
(d) (6 points) We will check the correctness of your submitted programming file for the last 6 points with new test cases. So, you do not need to answer this question in the submitted PDF of Assignment 3.
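One possible way to approach the three parts is sketched below in Python. It is not an official solution. It assumes an Apriori-style level-wise search for part 1, then filters the frequent itemsets for parts 2 and 3 as the hints suggest. All function names are illustrative assumptions: freq_pattern_mining is a Python-legal spelling of the required freq-pattern-mining(), and it takes the file contents as a string rather than a file path so that the sketch stays self-contained.

```python
from itertools import combinations


def parse_dataset(text):
    """First non-empty line is the minimum support; each later line is a transaction."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    min_support = int(lines[0][0])
    transactions = [frozenset(items) for items in lines[1:]]
    return min_support, transactions


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search. Returns {frozenset itemset: support}."""
    def support(candidate):
        return sum(1 for t in transactions if candidate <= t)

    items = sorted({item for t in transactions for item in t})
    level = {}
    for item in items:
        count = support(frozenset([item]))
        if count >= min_support:
            level[frozenset([item])] = count

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in level for sub in combinations(cand, k - 1)):
                count = support(cand)
                if count >= min_support:
                    next_level[cand] = count
        level = next_level
    return frequent


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {x: s for x, s in frequent.items()
            if not any(x < y and s == sy for y, sy in frequent.items())}


def maximal_itemsets(frequent):
    """Keep itemsets that have no frequent proper superset."""
    return {x: s for x, s in frequent.items()
            if not any(x < y for y in frequent)}


def format_part(itemsets):
    """Sort by support (descending), then by the alphabetically ordered item list."""
    rows = sorted((-s, sorted(x)) for x, s in itemsets.items())
    return "\n".join(f"{-neg}: {' '.join(items)}" for neg, items in rows)


def freq_pattern_mining(text):
    """Return the three output parts, separated by empty lines."""
    min_support, transactions = parse_dataset(text)
    frequent = frequent_itemsets(transactions, min_support)
    parts = [frequent, closed_itemsets(frequent), maximal_itemsets(frequent)]
    return "\n\n".join(format_part(part) for part in parts)


if __name__ == "__main__":
    # The sample input from this problem: minimum support 2, three transactions.
    print(freq_pattern_mining("2\nB A C E D\nA C\nC B D\n"))
```

Checking closedness only against the frequent itemsets is sufficient here: any superset with the same support as a frequent itemset also meets the minimum support, so it is itself frequent. The pairwise superset checks are quadratic in the number of frequent itemsets, which is fine for small inputs but may need a smarter index on larger ones.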
