Question: Can someone help with parts f, g, and h?


In this question, we are giving you the task of trying to prevent cybercrime by performing some data analytics. You are given the following sample of categorized emails, where the emails are categorized based on whether they have the word "Free" or "Sale" in their headers and whether they are actually spam ("Y" means yes, the email has the word or is spam; "N" means the email does not have the word or is not spam):

Email Subject Word ("Free", "Sale") | Spam | Frequency
Y Y Y Y Y 15 Y Y N N 19 IN Y N N N N
Total | 125

We wish to create a spam filter. As a first pass at this task, we wish to determine some association rules based on the sample set of email data given in the above table. Based on the above table (provide all answers to 3 significant digits):

a. (2 marks) What is the probability that an email has both the word "Free" in its subject line and is spam?
b. (2 marks) What is the probability that an email with the word "Free" in its subject line is spam?
c. (2 marks) What is the probability that an email without the word "Free" in its subject line is spam?
d. (2 marks) What is the probability that an email with the word "Sale" in its subject line is spam?
e. (2 marks) What is the probability that an email without the word "Sale" in its subject line is spam?
f. (1 mark) What single property (e.g. the word "Free" is in the subject line) is the most likely to indicate that an email is spam (hint: use association rules)?
g. (9 marks) What complete set of properties (i.e. assigning values over all properties of "Free" and "Sale") is most likely to indicate that an email is spam?
h. (1 mark) What property (i.e. over all possible single properties or combinations) is the most likely to indicate that an email is spam?
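For parts f to h, one way to read the hint is to treat each property (or combination of properties) as the left-hand side of a rule "property => spam" and compare the rule confidences, i.e. P(spam | property). Below is a minimal sketch of that comparison. The counts in it are HYPOTHETICAL placeholders and are not the frequencies from the table in the question, so the printed numbers are not the answers; the names `counts`, `confidence`, and `all_rules` are made up for this illustration.

```python
from itertools import product

# HYPOTHETICAL counts keyed by (free, sale, spam).
# These are NOT the frequencies from the question's table; substitute the real ones.
counts = {
    ("Y", "Y", "Y"): 20,
    ("Y", "Y", "N"): 5,
    ("Y", "N", "Y"): 15,
    ("Y", "N", "N"): 10,
    ("N", "Y", "Y"): 10,
    ("N", "Y", "N"): 10,
    ("N", "N", "Y"): 5,
    ("N", "N", "N"): 25,
}
FIELDS = ("free", "sale")


def confidence(condition):
    """Return P(spam | condition); condition maps 'free'/'sale' to 'Y'/'N'."""
    matching = 0
    spam = 0
    for (free, sale, is_spam), n in counts.items():
        row = {"free": free, "sale": sale}
        if all(row[key] == value for key, value in condition.items()):
            matching += n
            if is_spam == "Y":
                spam += n
    return spam / matching


def all_rules():
    """Confidence of every single-property and two-property rule."""
    rules = {}
    for values in product(("Y", "N", None), repeat=len(FIELDS)):
        # None means "this property is not part of the rule".
        condition = {k: v for k, v in zip(FIELDS, values) if v is not None}
        if condition:
            rules[tuple(sorted(condition.items()))] = confidence(condition)
    return rules


rules = all_rules()
single = {rule: c for rule, c in rules.items() if len(rule) == 1}    # part f
complete = {rule: c for rule, c in rules.items() if len(rule) == 2}  # part g
best_single = max(single, key=single.get)
best_complete = max(complete, key=complete.get)
best_overall = max(rules, key=rules.get)                             # part h

for rule, conf in sorted(rules.items(), key=lambda item: -item[1]):
    print(rule, format(conf, ".3g"))  # 3 significant digits, as the question asks
```

With the real frequencies substituted into `counts`, `best_single`, `best_complete`, and `best_overall` would correspond to parts f, g, and h respectively, under the assumption that "most likely to indicate spam" means highest rule confidence.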
