Question:
Write a function which takes in an email (a string) and returns either "SPAM" or "HAM". The function in Python may look something like this:

    def classify(email: str):
        # Some code here
        if some_condition:  # placeholder for the decision rule
            return "SPAM"
        else:
            return "HAM"

So how do we write the code to make the decision for us? In the past, people tried writing these classifiers with a set of rules that they came up with themselves. For example, if an email is over 1000 words, predict "SPAM". Or if it contains the word "millionaire", predict "SPAM". This leads to code that looks like a ton of if-else statements, and is also not very accurate. In machine learning, we come up with a model that learns a decision-making rule for us!

2.1 Preprocessing

Handling text can be very messy. People misspell words, use slang that isn't in the vocabulary, have bad grammar, use tons of punctuation, and so on. When we process our emails, we will employ the following approach:

1. Ignore duplicate words.
2. Ignore punctuation.
3. Ignore casing.

That is, we will reduce an email to a set of lowercase words and nothing else! We'll see a potential drawback to this later, but despite these strong assumptions, the classifier still does a really good job!

2.2 Decision Rule

For this section, we'll use the example of classifying the email "Wanna be millionaire!". The representation we have after processing is {wanna, be, millionaire}. Here's the approach of the Naive Bayes classifier. We will compute and compare the following two quantities (which must add to 1):

$P(\text{spam} \mid \{\text{wanna}, \text{be}, \text{millionaire}\})$ and $P(\text{ham} \mid \{\text{wanna}, \text{be}, \text{millionaire}\})$

They add to 1 because a particular email is either spam or ham. In fact, because they sum to 1, we can just compute one of them (let's say the first), and predict SPAM if

$P(\text{spam} \mid \{\text{wanna}, \text{be}, \text{millionaire}\}) > 0.5$

and HAM otherwise. Note that if it is exactly equal to 0.5, we will predict HAM (this is arbitrary; you can break ties however you want).
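The preprocessing steps from 2.1 and the 0.5 threshold from 2.2 can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the names `preprocess` and `decide` are hypothetical, punctuation is assumed to be ASCII and is deleted outright (so "don't" becomes "dont"), and the probability P(spam | words) is assumed to be computed elsewhere and passed in.

```python
import string
from typing import Set


def preprocess(email: str) -> Set[str]:
    """Reduce an email to a set of lowercase words with punctuation removed."""
    # Assumption: "ignore punctuation" means deleting ASCII punctuation characters.
    no_punct = email.translate(str.maketrans("", "", string.punctuation))
    # Lowercasing ignores casing; building a set ignores duplicate words.
    return set(no_punct.lower().split())


def decide(p_spam: float) -> str:
    """Predict SPAM only if P(spam | words) is strictly greater than 0.5."""
    # A tie at exactly 0.5 goes to HAM, matching the arbitrary choice in the text.
    return "SPAM" if p_spam > 0.5 else "HAM"
```

For the running example, `preprocess("Wanna be millionaire!")` yields the set {wanna, be, millionaire}, and `decide(0.5)` returns "HAM" because of the tie-breaking choice.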