Question:
There are fake comments created by computers in the Amazon review system. Prof.
Michael Luca from Harvard Business School argues [1] that there’s been some evidence
that fake reviews are sloppier in general: “Short, vague reviews are a pretty good marker,
[along with] poor punctuation and grammar.”
Here are some examples of probably fake comments (e.g., “GREAT”) and their
corresponding ratings (e.g., 5 Star) in our data set:
6^220^Five Stars^2016-01-09^false^ Quality product.^5.00
6^221^Five Stars^2016-01-09^false^ Great quality.^5.00
6^222^Five Stars^2015-11-25^false^ Excellent^5.00
6^223^Five Stars^2016-01-14^false^ GREAT^5.00
It looks like these fake reviews tend to be more common among the 5 star ratings than
the 1 star ratings. Let’s examine the average length (number of words) of the comments
for each rating and see whether this really holds.
Please design and implement a PySpark programme to examine the average length of
comments (column: ReviewContent) in each rating (column: ReviewRating). We have 5
levels of rating here, where the 1 star rating represents the worst experience and the 5
star rating represents the best experience. Hint: you can remove punctuation in each
comment with the following code:
import re
re.sub(r'\W+', ' ', mystring)
'\W+' is a regular expression that matches one or more non-alphanumeric characters.
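As a quick check of the hint, here is a minimal sketch that applies the substitution to the sample comments shown earlier and counts the remaining words. The helper name `word_count` is an assumption made for illustration; it is not part of the assignment.

```python
import re

def word_count(comment):
    """Count words after replacing runs of non-word characters with a space."""
    cleaned = re.sub(r"\W+", " ", comment)
    # split() with no argument drops leading/trailing and repeated whitespace
    return len(cleaned.split())

# Comment fields copied from the sample records (leading space included)
samples = [" Quality product.", " Great quality.", " Excellent", " GREAT"]
counts = [word_count(s) for s in samples]
print(counts)  # [2, 2, 1, 1]
```

Note that an apostrophe is also a non-word character, so a contraction such as "Don't" is counted as two words under this approach.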
What is expected:
You should turn in one Python file which prints out the average length of the comments
for each star rating:
$ spark-submit 1-length.py
1 star rating: average length of comments __
2 star rating: average length of comments __
3 star rating: average length of comments __
4 star rating: average length of comments __
5 star rating: average length of comments __
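One possible shape for such a programme is sketched below, using the RDD API. It rests on several assumptions that the question does not confirm: records are '^'-delimited with ReviewContent as the second-to-last field and ReviewRating as the last (as in the sample rows), comments contain no '^' character, and the input path is passed as a command-line argument because the data set's location is not given. The names `parse_review` and `main` are hypothetical.

```python
import re
import sys

def parse_review(line):
    """Return (rating, word_count) for one '^'-delimited record, or None if malformed."""
    fields = line.split("^")
    if len(fields) < 7:
        return None
    try:
        rating = int(float(fields[-1]))  # assumed: ReviewRating is the last field, e.g. "5.00"
    except ValueError:
        return None
    # assumed: ReviewContent is the second-to-last field
    words = re.sub(r"\W+", " ", fields[-2]).split()
    return rating, len(words)

def main(path):
    # Imported here so that parse_review can be exercised without a Spark installation
    from pyspark import SparkContext

    sc = SparkContext(appName="AverageReviewLength")
    averages = (
        sc.textFile(path)
        .map(parse_review)
        .filter(lambda pair: pair is not None)
        .mapValues(lambda n: (n, 1))                           # (rating, (words, 1))
        .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))  # sum words and counts per rating
        .mapValues(lambda t: t[0] / t[1])                      # average words per comment
        .sortByKey()
        .collect()
    )
    for rating, avg in averages:
        print(f"{rating} star rating: average length of comments {avg:.2f}")
    sc.stop()

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # hypothetical usage: spark-submit 1-length.py <path-to-reviews>
```

Carrying a (sum, count) pair through `reduceByKey` avoids collecting all lengths for a rating on one node; a DataFrame `groupBy` with `avg` would be an equally reasonable choice if the data is loaded with named columns.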