Question:

1. Conceptual question (30 points).

(a) (15 points) Consider mutual-information-based feature selection. Suppose we have the following tables (the entries indicate counts) for spam versus non-spam emails:

Columns: "prize" = 1, "prize" = 0
"spam" = 1: 10 [one count missing]
"spam" = 0: 15000 [one count missing]

Columns: "hello" = 1, "hello" = 0
"spam" = 1: 155, 5
"spam" = 0: 14000, 1000

Given the two tables above, calculate the mutual information for the two keywords, "prize" and "hello", respectively. Which keyword is more informative for deciding whether or not an email is spam?

(b) (15 points) Given two distributions, f0 = N(0, 1) and f1 = N(3, 1) (meaning that we are interested in detecting a mean shift of minimum size 3), derive the CUSUM statistic (i.e., write down the CUSUM detection statistic). Plot the CUSUM statistic for a sequence of randomly generated samples, where x_1, ..., x_100 are i.i.d. (independent and identically distributed) according to f0 and x_101, ..., x_200 are i.i.d. according to f1.
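For part (a), the mutual information between a keyword indicator and the spam label can be computed directly from a 2x2 table of counts as I(X; Y) = sum over (x, y) of p(x, y) * log[ p(x, y) / (p(x) p(y)) ]. The following is a minimal sketch, not a reference solution. The function name `mutual_information` is hypothetical, the logarithm is taken base 2 (bits) as an assumption, and the "hello" counts are read with rows "spam" = 1, 0 and columns "hello" = 1, 0, which is an assumption about the table layout.

```python
import math


def mutual_information(counts):
    """Mutual information (in bits) of a 2-D contingency table of counts."""
    total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n == 0:
                continue  # the term 0 * log(0) is taken to be 0
            p_xy = n / total
            # p(x, y) / (p(x) * p(y)) simplifies to n * total / (row * col)
            ratio = n * total / (row_totals[i] * col_totals[j])
            mi += p_xy * math.log2(ratio)
    return mi


# "hello" table as it appears in the question (layout assumed, see above)
hello_counts = [[155, 5], [14000, 1000]]
hello_mi = mutual_information(hello_counts)
print("I(hello; spam) in bits:", hello_mi)
```

The same function can be applied to the "prize" table once all four of its counts are known; the keyword with the larger mutual information is the more informative one.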
Step by Step Solution
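For part (b), with f0 = N(0, 1) and f1 = N(3, 1) the log-likelihood ratio of one sample is log[f1(x) / f0(x)] = 3x - 9/2, so the standard CUSUM recursion is W_t = max(W_{t-1} + 3 x_t - 4.5, 0) with W_0 = 0. The sketch below computes this statistic on simulated data. The helper names `cusum` and `plot_cusum`, the seed, and the use of matplotlib for the plot are assumptions made for illustration, not part of the original problem.

```python
import random


def cusum(samples, mu0=0.0, mu1=3.0, sigma=1.0):
    """CUSUM path W_t = max(W_{t-1} + log f1(x_t)/f0(x_t), 0), with W_0 = 0."""
    delta = mu1 - mu0
    w, path = 0.0, []
    for x in samples:
        # Gaussian log-likelihood ratio with common variance;
        # for mu0 = 0, mu1 = 3, sigma = 1 this equals 3x - 4.5
        llr = delta / sigma**2 * (x - (mu0 + mu1) / 2)
        w = max(w + llr, 0.0)
        path.append(w)
    return path


def plot_cusum(path, change_after=100):
    """Plot the CUSUM path; matplotlib is imported lazily (assumed installed)."""
    import matplotlib.pyplot as plt

    plt.plot(range(1, len(path) + 1), path)
    plt.axvline(change_after, linestyle="--")  # last pre-change sample
    plt.xlabel("t")
    plt.ylabel("W_t")
    plt.show()


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
xs = [rng.gauss(0, 1) for _ in range(100)] + [rng.gauss(3, 1) for _ in range(100)]
w_path = cusum(xs)
```

Calling `plot_cusum(w_path)` displays the path. Before the change the statistic has negative drift and stays near zero; after sample 100 it drifts upward at about 4.5 per step on average.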
