Question: write a function which takes in an email (a string type) and returns either that it is SPAM or HAM. The function in Python

write a function which takes in an email (a string type) and returns either that it is "SPAM" or "HAM". The function in Python may look something like this. 1 def classify (email: str): #Some Code Here 3 4 if some condition: return SPAM else: return HAM So how do we write the code to make the decision for us? In the past, people tried writing these classifiers with a set of rules that they came up themselves. For example, if it is over 1000 words, predict "SPAM". Or if it contains the word 'millionaire', predict that it is "SPAM". This leads to code which looks like a ton of if-else statements, and is also not very accurate. In machine learning, we come up with a model that learns a decision-making rule for us! 2.1 Prepossessing Handling text can be very messy. People misspell words, use slang that isn't in the vocabulary, have bad grammar, use tons of punctuation, and so on. When we process our emails, we will employ the following approach: 1. Ignore Duplicate Words. 2. Ignore Punctuation. 3. Ignore Casing. That is, we will reduce an email into a Set of lowercase words and nothing else! We'll see a potential drawback to this later, but despite these strong assumptions, the classifier still does a really good job! 2.2 Decision Rule For this section, we'll use the example of classifying the email "Wanna be millionaire!". The representation we have after processing is wanna, be, millionaire. Here's the approach of the Naive Bayes classifier. We will compute and compare the following two quantities (which must add to 1): P(spam [wanna, be, millionaire)) and P(ham {wanna, be, millionaire]) This is because, for a particular email, it is either spam or ham, and so the probabilities must sum to 1. In fact, because they both sum to 1, we can just compute one of them (let's say the rfist), and predict SPAM if F(spam (wanna, be, millionaire)) > 0.5 and HAM otherwise. Note that if it is exactly equal to 0.5, we will predict HAM (this is arbitrary - you can break ties however you want).
Step by Step Solution
3.38 Rating (157 Votes )
There are 3 Steps involved in it
Ans following is the example python code to detec... View full answer
Get step-by-step solutions from verified subject matter experts
