Question: Write a MapReduce job that processes the global weather dataset and returns the records of the country India. The output should contain 4 different files.

Write a MapReduce job that processes the global weather dataset and returns the records for the country "India". The output should consist of 4 different files, each containing the weather data of one entire century.

For example, the 4 part files should contain the data in the following pattern:

  • File1: Years 1700-1799 (all records of the 18th century are stored in File1)
  • File2: Years 1800-1899 (all records of the 19th century are stored in File2)
  • File3: Years 1900-1999 (all records of the 20th century are stored in File3)
  • File4: Years 2000-Present (all records of the 21st century are stored in File4)
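
The pattern above amounts to a mapping from a record's year to one of four partition numbers. Below is a minimal sketch of that mapping in plain Java, not a complete solution. The class and method names (CenturyPartition, partitionFor, yearOf) are hypothetical, the dt value is assumed to start with a four-digit year (as in yyyy-MM-dd), and sending years before 1700 to File1 is an assumption the question does not state.

```java
// Sketch only: maps a record's year to one of four partitions (0..3).
// Names are hypothetical; the dt format (yyyy-MM-dd) is an assumption.
final class CenturyPartition {
    private CenturyPartition() {}

    /** Returns 0 for 1700-1799, 1 for 1800-1899, 2 for 1900-1999, 3 for 2000 onward. */
    static int partitionFor(int year) {
        if (year < 1800) return 0; // assumption: any year before 1700 also lands in File1
        if (year < 1900) return 1;
        if (year < 2000) return 2;
        return 3;
    }

    /** Extracts the year from a dt value, assuming it starts with a four-digit year. */
    static int yearOf(String dt) {
        return Integer.parseInt(dt.trim().substring(0, 4));
    }
}
```

In a Hadoop job, a class extending org.apache.hadoop.mapreduce.Partitioner could return this value from getPartition(key, value, numPartitions), with the driver calling job.setPartitionerClass(...) and job.setNumReduceTasks(4) so that each partition is written to its own part file.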

Input Dataset: Refer to the path given below: (hdfs:///bigdatapgp/common_folder/assignment3/weather/weather1.csv)

Dataset Description:

COLUMN NAME                     | DESCRIPTION
dt                              | Date
AverageTemperature              | Average temperature of the city
AverageTemperatureUncertainity  | Uncertainty in the average temperature
City                            | Name of the city
Country                         | Name of the country that the city belongs to
Latitude                        | Latitude of the city
Longitude                       | Longitude of the city

Constraints:

  • Skip the header row while reading the file
  • Use a custom Partitioner
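
One way to handle the header and the country filter inside the map step is sketched below in plain Java. The names (IndiaRecordFilter, isHeader, isIndiaRecord) are hypothetical. The sketch assumes the columns appear in the order listed in the dataset description, that the header line begins with "dt,", and that no field contains an embedded comma.

```java
// Sketch only: per-line checks a mapper could apply before emitting a record.
// Assumes comma-separated fields in the order given in the dataset description.
final class IndiaRecordFilter {
    // Position of the Country column: dt, AverageTemperature,
    // AverageTemperatureUncertainity, City, Country, Latitude, Longitude.
    static final int COUNTRY_INDEX = 4;

    private IndiaRecordFilter() {}

    /** True when the line is the CSV header (assumed to start with "dt,"). */
    static boolean isHeader(String line) {
        return line.startsWith("dt,");
    }

    /** True when the line is a data row whose Country field is exactly "India". */
    static boolean isIndiaRecord(String line) {
        if (line.isEmpty() || isHeader(line)) {
            return false; // skip blank lines and the header row
        }
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        return fields.length > COUNTRY_INDEX
                && "India".equals(fields[COUNTRY_INDEX].trim());
    }
}
```

A mapper's map method could call isIndiaRecord on value.toString() and write the line to the context only when it returns true. Checking the line content rather than the input key keeps the header test independent of how the file is split.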

Expected Solution: Paste the MR code, the Hadoop commands, and the path of the final jar used to achieve this output.
