Question: Consider mutual-information-based feature selection. Suppose we have the following tables (the entries indicate counts) for spam versus non-spam emails:
| | "prize"=1 | "prize"=0 |
| "spam"=1 | 150 | 10 |
| "spam"=0 | 1000 | 15000 |

| | "hello"=1 | "hello"=0 |
| "spam"=1 | 155 | 5 |
| "spam"=0 | 14000 | 1000 |
Given the two tables above, calculate the mutual information between each keyword ("prize" and "hello") and the spam label. Which keyword is more informative for deciding whether or not an email is spam?
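One way to check the arithmetic is a short script. This is a minimal sketch, not a reference solution. It assumes the usual definition I(U;C) = sum over cells of (N_uc/N) * log2(N * N_uc / (N_u * N_c)), with logs in base 2 so the result is in bits. The function name `mutual_information` and the list-of-rows layout are illustrative choices, not something given in the question. Each table is normalized by its own total, since the two tables as stated do not sum to the same number of emails (16160 for "prize", 15160 for "hello").

```python
from math import log2

def mutual_information(table):
    """Mutual information (in bits) between two variables, from a count table.

    table[i][j] is the count for row value i and column value j.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue  # a zero cell contributes nothing (0 * log 0 := 0)
            mi += (count / n) * log2(n * count / (row_totals[i] * col_totals[j]))
    return mi

# Rows: spam=1, spam=0. Columns: keyword=1, keyword=0 (counts from the tables).
prize = [[150, 10], [1000, 15000]]
hello = [[155, 5], [14000, 1000]]

mi_prize = mutual_information(prize)
mi_hello = mutual_information(hello)
print(f"I(prize; spam) = {mi_prize:.5f} bits")
print(f"I(hello; spam) = {mi_hello:.5f} bits")
print("more informative:", "prize" if mi_prize > mi_hello else "hello")
```

Under these assumptions, "prize" comes out at roughly 0.033 bits and "hello" at roughly 0.0002 bits, so "prize" is the more informative keyword. That matches intuition: "hello" appears in over 90% of both spam and non-spam emails, so seeing it says almost nothing about the class.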
