
Question: Write a MapReduce job that processes the global weather dataset and returns the records of the country India. The output should contain 4 different files.

Write a MapReduce job that processes the global weather dataset and returns the records for the country "India". The output should contain 4 different files. Each file should contain the weather data of one entire century.

For example, the 4 part files should contain the data in the following pattern:

File1: Year 1700-1799 (all the records of the 18th century will be stored in File1)
File2: Year 1800-1899 (all the records of the 19th century will be stored in File2)
File3: Year 1900-1999 (all the records of the 20th century will be stored in File3)
File4: Year 2000-Present (all the records of the 21st century will be stored in File4)
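The century-to-file mapping above can be sketched as a small plain-Java helper. This is only an illustration under stated assumptions, not the expected solution: the class and method names are hypothetical, the `dt` value is assumed to start with a four-digit year (for example `1849-01-01`), and any year before 1700 is assumed to fall into the first file.

```java
// Hypothetical helper (names are not from the assignment): maps a record's
// year to one of four partition indexes, one per century file.
final class CenturyPartition {
    // One partition per output file (File1..File4).
    static final int NUM_PARTITIONS = 4;

    // Assumption: dt begins with a four-digit year, e.g. "1849-01-01".
    static int yearOf(String dt) {
        return Integer.parseInt(dt.trim().substring(0, 4));
    }

    // 0 -> 1700-1799, 1 -> 1800-1899, 2 -> 1900-1999, 3 -> 2000 and later.
    // Assumption: years before 1700 are folded into partition 0.
    static int partitionForYear(int year) {
        if (year < 1800) {
            return 0;
        }
        if (year < 1900) {
            return 1;
        }
        if (year < 2000) {
            return 2;
        }
        return 3;
    }
}
```

In a Hadoop job, a custom subclass of `org.apache.hadoop.mapreduce.Partitioner` could return this index from `getPartition(key, value, numPartitions)`, with the driver calling `job.setPartitionerClass(...)` and `job.setNumReduceTasks(4)` so that each century ends up in its own part file.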

Input Dataset: Refer to the path given below: (hdfs:///bigdatapgp/common_folder/assignment3/weather/weather1.csv)

Dataset Description:

COLUMN NAME	DESCRIPTION
dt	Date
AverageTemperature	Average Temperature of that city
AverageTemperatureUncertainity	Uncertainty in the Average Temperature
City	Name of the city
Country	Name of the country that the city belongs to
Latitude	Latitude of the city
Longitude	Longitude of the city

Constraints:

Skip the header row while reading the file.
Use the concept of a Partitioner.
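The header-skipping constraint, together with the India filter, can be sketched as the check a mapper might run on each input line before emitting it. This is a minimal sketch, assuming the file is comma-separated with no quoted commas, that the columns appear in the order listed in the dataset description (so Country is the fifth field), and that the header row starts with `dt,`. The class and method names are hypothetical.

```java
// Hypothetical filter (names are not from the assignment): decides whether
// a raw CSV line is a data record for India.
final class WeatherRecordFilter {
    // Assumption: columns follow the order in the dataset description,
    // so Country is at index 4 and there are 7 columns in total.
    static final int COUNTRY_INDEX = 4;
    static final int COLUMN_COUNT = 7;

    static boolean isIndiaRecord(String line) {
        if (line == null || line.isEmpty()) {
            return false;
        }
        // Skip the header row, which begins with the first column name.
        if (line.startsWith("dt,")) {
            return false;
        }
        // Keep trailing empty fields so missing temperatures do not shift columns.
        String[] fields = line.split(",", -1);
        if (fields.length < COLUMN_COUNT) {
            return false;
        }
        return "India".equals(fields[COUNTRY_INDEX].trim());
    }
}
```

A mapper could call `isIndiaRecord` on each line and write only the matching lines to its output, leaving the choice of output file to the partitioner.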

Expected Solution: You need to paste the MR code, the Hadoop commands, and the path of the final jar used to achieve this output.

