Question: Problem 4 (5 Marks)
CountVectorizer is a great tool provided by the scikit-learn library in Python. It is used to transform a given text
into a vector on the basis of the frequency count of each word that occurs in the entire text. This is helpful when
we have multiple such texts and wish to convert each text into a vector of word counts for further text analysis.
Consider the following sample texts, which are reviews collected online. Show your work: explain how
CountVectorizer works and generate the corresponding vector matrix.
reviews = [
    "We like our university",
    "students are good",
    "Good students and faculties",
    "Staff was rude",
    "Rude staff and not good",
]
Note (CountVectorizer, plain and simple): it uses UTF-8 encoding; it performs tokenization, converting raw text
into smaller units of text; it uses word-level tokenization, meaning each word is treated as a separate token; and
it ignores single characters during tokenization, so say goodbye to words like "a" and "I".
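The tokenization rules in the note can be sketched with Python's standard `re` module. This is a minimal sketch, assuming scikit-learn's documented defaults for CountVectorizer (`lowercase=True` and `token_pattern=r"(?u)\b\w\w+\b"`, which only matches tokens of two or more word characters). CountVectorizer also lowercases by default, which is why "Good" and "good" end up as the same token. The `tokenize` helper and the sample sentence are hypothetical, for illustration only.

```python
import re

# Assumption: mirrors scikit-learn's default token_pattern for CountVectorizer,
# which keeps only tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    """Hypothetical helper: lowercase the text, then split it into
    word-level tokens, dropping single-character words like "a" and "I"."""
    return TOKEN_PATTERN.findall(text.lower())


print(tokenize("I like a good university"))
# ['like', 'good', 'university']
```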
Answer:
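A from-scratch sketch of what CountVectorizer does with the five reviews: tokenize each review, build an alphabetically sorted vocabulary (one column per distinct word), then count how often each vocabulary word appears in each review. It assumes the default settings (lowercasing, tokens of two or more word characters, no stop-word removal); `tokenize`, `vocabulary`, `index`, and `matrix` are illustrative names, not scikit-learn's API.

```python
import re

# The five sample reviews from the question.
reviews = [
    "We like our university",
    "students are good",
    "Good students and faculties",
    "Staff was rude",
    "Rude staff and not good",
]

# Assumption: same defaults as scikit-learn's CountVectorizer
# (lowercasing, tokens of two or more word characters).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())


# Step 1: tokenize every review.
tokenized = [tokenize(review) for review in reviews]

# Step 2: build the vocabulary. CountVectorizer sorts the distinct tokens
# alphabetically and assigns each one a column index.
vocabulary = sorted({tok for toks in tokenized for tok in toks})
index = {word: i for i, word in enumerate(vocabulary)}

# Step 3: count occurrences of each vocabulary word in each review.
matrix = []
for toks in tokenized:
    row = [0] * len(vocabulary)
    for tok in toks:
        row[index[tok]] += 1
    matrix.append(row)

print(vocabulary)
for review, row in zip(reviews, matrix):
    print(row, review)
```

Working it through by hand gives 13 distinct words, so the result is a 5-by-13 matrix (one row per review, one column per word):

Vocabulary (column order): and, are, faculties, good, like, not, our, rude, staff, students, university, was, we
"We like our university"       -> [0 0 0 0 1 0 1 0 0 0 1 0 1]
"students are good"            -> [0 1 0 1 0 0 0 0 0 1 0 0 0]
"Good students and faculties"  -> [1 0 1 1 0 0 0 0 0 1 0 0 0]
"Staff was rude"               -> [0 0 0 0 0 0 0 1 1 0 0 1 0]
"Rude staff and not good"      -> [1 0 0 1 0 1 0 1 1 0 0 0 0]

If scikit-learn is installed, `CountVectorizer().fit_transform(reviews).toarray()` should yield the same matrix, with `get_feature_names_out()` (scikit-learn 1.0 and later) returning the same column order.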
