Question: I need help writing the Pig command for this question:
Question: Find the closest city for each tweet. For the dataset, assume you have two files: full_text_clean.txt: (userid, lat, lon, tweet, modified_lat, modified_lon) and cities_clean.txt: (city_name, lat, lon, modified_lat, modified_lon) [D2L -> Assignment 3 Pig -> cities_clean.txt].
Hint: Both files include modified lat and lon columns (the last two columns of each file) for this purpose. For each geo-tagged tweet, first map it to multiple nearby cities by matching on the last two columns of both files. Then, for each geo-tagged tweet, calculate the distance to each of those cities using the actual lat/lon values and pick the closest city.
Calculating Euclidean distance (Pig example): SQRT((lat_1 - lat_2) * (lat_1 - lat_2) + (lon_1 - lon_2) * (lon_1 - lon_2))
lat_1/lon_1 refer to lat/lon in full_text_clean.txt. lat_2/lon_2 refer to lat/lon in cities_clean.txt.
Only submit the command.
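One way the hint could be turned into Pig Latin is sketched below. It is not a verified answer to the assignment: the tab delimiter, the column types, and all relation names (tweets, cities, joined, dists, grouped, closest) are assumptions, since the question does not give them. The sketch joins on the modified columns, computes the distance with the formula above, and keeps the nearest city per tweet.

```pig
-- Assumption: both files are tab-delimited and the modified columns are numeric.
tweets = LOAD 'full_text_clean.txt' USING PigStorage('\t')
         AS (userid:chararray, lat:double, lon:double, tweet:chararray,
             modified_lat:double, modified_lon:double);
cities = LOAD 'cities_clean.txt' USING PigStorage('\t')
         AS (city_name:chararray, lat:double, lon:double,
             modified_lat:double, modified_lon:double);

-- Step 1: map each tweet to nearby cities using the last two columns of both files.
joined = JOIN tweets BY (modified_lat, modified_lon),
              cities BY (modified_lat, modified_lon);

-- Step 2: Euclidean distance on the actual lat/lon values.
dists = FOREACH joined GENERATE
          tweets::userid AS userid,
          tweets::lat    AS lat,
          tweets::lon    AS lon,
          tweets::tweet  AS tweet,
          cities::city_name AS city_name,
          SQRT((tweets::lat - cities::lat) * (tweets::lat - cities::lat)
             + (tweets::lon - cities::lon) * (tweets::lon - cities::lon)) AS dist;

-- Step 3: for each tweet, keep only the city with the smallest distance.
grouped = GROUP dists BY (userid, lat, lon, tweet);
closest = FOREACH grouped {
            ordered = ORDER dists BY dist ASC;
            nearest = LIMIT ordered 1;
            GENERATE FLATTEN(nearest);
          };

DUMP closest;
```

Because this uses an inner join, a tweet whose modified lat/lon matches no city drops out of the result. If the files are comma-delimited instead, PigStorage(',') would be the change to make, although commas inside the tweet text would then need separate handling.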
