Question: Pig command to find the closest city for each tweet

I need help writing the Pig command for this question:

Question: Find the closest city for each tweet. For the dataset, assume you have two files: full_text_clean.txt: (userid, lat, lon, tweet, modified_lat, modified_lon) and cities_clean.txt: (city_name, lat, lon, modified_lat, modified_lon) [D2L -> Assignment 3 Pig -> cities_clean.txt].

Hint: Both files include modified lat and lon columns (the last two columns of each file). Use these columns to map each geo-tagged tweet to multiple nearby cities. Then, for each geo-tagged tweet, calculate the distance to each of those cities using the actual lat/lon values and pick the closest city.

Calculating Euclidean Distance (Pig example): SQRT((lat_1 - lat_2) * (lat_1 - lat_2) + (lon_1 - lon_2) * (lon_1 - lon_2))

lat_1/lon_1 refer to lat/lon in full_text_clean.txt. lat_2/lon_2 refer to lat/lon in cities_clean.txt.

Submit only the command.
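
One possible approach, following the hint, is sketched below in Pig Latin. It is a sketch under stated assumptions, not a verified solution. It assumes both files are tab-delimited, that the modified_lat/modified_lon values match exactly between the two files (so they are loaded as chararray and used as join keys), and that (userid, lat, lon, tweet) is enough to identify a single tweet, since the schema has no tweet ID. The relation names and the output path are made up for illustration.

```pig
-- Assumption: tab-delimited input; change the PigStorage delimiter if the files differ.
tweets = LOAD 'full_text_clean.txt' USING PigStorage('\t')
         AS (userid:chararray, lat:double, lon:double, tweet:chararray,
             modified_lat:chararray, modified_lon:chararray);
cities = LOAD 'cities_clean.txt' USING PigStorage('\t')
         AS (city_name:chararray, lat:double, lon:double,
             modified_lat:chararray, modified_lon:chararray);

-- Map each tweet to its nearby candidate cities using the modified lat/lon columns.
candidates = JOIN tweets BY (modified_lat, modified_lon),
                  cities BY (modified_lat, modified_lon);

-- Euclidean distance on the actual lat/lon values (lat_1/lon_1 = tweet, lat_2/lon_2 = city).
with_dist = FOREACH candidates GENERATE
    tweets::userid    AS userid,
    tweets::lat       AS lat,
    tweets::lon       AS lon,
    tweets::tweet     AS tweet,
    cities::city_name AS city_name,
    SQRT((tweets::lat - cities::lat) * (tweets::lat - cities::lat)
       + (tweets::lon - cities::lon) * (tweets::lon - cities::lon)) AS distance;

-- Assumption: these four fields together identify one tweet.
by_tweet = GROUP with_dist BY (userid, lat, lon, tweet);

-- Within each tweet's candidates, keep only the city with the smallest distance.
closest = FOREACH by_tweet {
    ordered = ORDER with_dist BY distance ASC;
    nearest = LIMIT ordered 1;
    GENERATE FLATTEN(nearest);
};

-- Hypothetical output directory; DUMP closest; works for a quick check instead.
STORE closest INTO 'closest_city_per_tweet';
```

Because the JOIN is an inner join, tweets whose modified lat/lon pair matches no city are dropped from the result. If those tweets need to be kept, a LEFT OUTER join would be one option.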
