Question: The spam filter in the lecture had 58 input variables to determine the output variable. Most of these record either the proportion of certain words in the email ("money" = 0.01 would mean that "money" made up 1% of the words) or the proportion of certain characters in the email (such as "!" = 0.01 meaning that exclamation points made up 1% of the characters in the email). The variable TOTCAPS, the total number of capital letters in the email, is quite different. When a PCA (principal components analysis) is done with the variables NOT being normalized, the first principal component is dominated by TOTCAPS and captures 92.7% of the variance. When the variables are normalized, the first component captures only 11.6% of the variation and is not dominated by any variable. Explain why this kind of result is to be expected.
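
The effect described in the question can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions, not the lecture's data or code: the nine proportion-like columns, the single count-like column standing in for TOTCAPS, the sample size, and the helper name `first_component` are all hypothetical. It relies on the fact that PCA looks for the direction of largest variance, and variance depends on the units of each variable. A count that ranges over hundreds has a variance many orders of magnitude larger than a proportion that stays near 0.01, so on the raw data the first component lines up almost exactly with the count column.

```python
import numpy as np

def first_component(X):
    """Return (share of total variance, loadings) for the first principal component."""
    Xc = X - X.mean(axis=0)                 # center each column
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the columns
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvals[-1] / eigvals.sum(), eigvecs[:, -1]

rng = np.random.default_rng(0)
n = 2000

# Hypothetical stand-ins for the spam variables (not the real data set):
# nine proportion-like columns with values in [0, 0.05] ...
props = rng.uniform(0.0, 0.05, size=(n, 9))
# ... and one count-like column on a much larger scale, playing the role of TOTCAPS.
totcaps = rng.exponential(scale=300.0, size=(n, 1))
X = np.hstack([props, totcaps])

# PCA on the raw variables: the large-scale column dominates.
ratio_raw, load_raw = first_component(X)

# PCA on normalized variables: every column has mean 0 and variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
ratio_std, load_std = first_component(Z)

print("raw:        share of variance =", round(ratio_raw, 4),
      "| loading on count column =", round(abs(load_raw[-1]), 4))
print("normalized: share of variance =", round(ratio_std, 4),
      "| largest loading =", round(np.abs(load_std).max(), 4))
```

After normalization each variable contributes exactly one unit of variance, so with 58 variables a single variable can account for only 1/58, about 1.7%, of the total. A first component that captures 11.6% therefore corresponds to roughly 0.116 × 58 ≈ 6.7 units of variance, which can only come from correlation shared across several variables rather than from the scale of any one of them.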
