Question: Apache Pig
A) Find the customer(s) with the fewest transactions, and output each customer's name and number of transactions.
B) Join Customers and Transactions using a broadcast (replicated) join. Report: CustomerID, Name, Salary, NumOfTransactions, TotalSum, MinItems, where NumOfTransactions is the total number of transactions done by the customer, TotalSum is the sum of the field TransTotal for that customer, and MinItems is the minimum number of items in any transaction done by the customer.
C) Report the Country Codes that have more than 5,000 or fewer than 2,000 customers.
D) Assume we want to design an analytics task on the data as follows: the Age attribute is divided into six groups, which are [10, 20), [20, 30), [30, 40), [40, 50), [50, 60), and [60, 70]. Within each of these age ranges, a further division is made by Gender, i.e., each of the 6 age groups is split into two groups. Each group reports: Age Range, Gender, MinTransTotal, MaxTransTotal, AvgTransTotal. Note: The bracket [ or ] means the bound of a range is included, whereas ) means the upper bound of a range is excluded.
Use the following datasets:
customers = LOAD 'customers.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, gender:chararray, CountryCode:int, salary:float);
transactions = LOAD 'transactions.txt' USING PigStorage(',') as (trans_id:int, id:int, age:int, total:float, num_items:int, description:chararray);
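For part A, one possible approach is to count transactions per customer, compute the minimum of those counts, and keep every customer whose count equals it. This is a minimal sketch, assuming the `id` field in `transactions` is the customer id (as the schema above suggests) and that the `customers` and `transactions` relations are already loaded as shown. The intermediate aliases (`trans_counts`, `min_count`, `fewest`, and so on) are names chosen here for illustration. Customers with no transactions at all never appear in `transactions`, so they are not reported by this version.

```pig
-- Count transactions per customer id.
trans_by_cust = GROUP transactions BY id;
trans_counts  = FOREACH trans_by_cust GENERATE group AS id, COUNT(transactions) AS num_trans;

-- Compute the smallest per-customer count (a single-row relation).
all_counts = GROUP trans_counts ALL;
min_count  = FOREACH all_counts GENERATE MIN(trans_counts.num_trans) AS min_trans;

-- Keep every customer whose count equals the minimum, so ties are retained.
-- min_count.min_trans is used as a scalar because min_count has exactly one row.
fewest = FILTER trans_counts BY num_trans == min_count.min_trans;

-- Attach the customer name and output name plus count.
fewest_named = JOIN fewest BY id, customers BY id;
result_a = FOREACH fewest_named GENERATE customers::name AS name, fewest::num_trans AS num_trans;
DUMP result_a;
```

Using `ORDER ... LIMIT 1` instead would be shorter, but it would report only one customer when several share the minimum.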
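For part B, a replicated join in Pig is requested with `USING 'replicated'`. The relations listed after the first one are loaded into memory, so the sketch below puts the larger `transactions` relation first and `customers` last. It assumes `customers` is small enough to fit in memory and that `total` and `num_items` correspond to TransTotal and the number of items; the alias names are illustrative.

```pig
-- Replicated (broadcast) join: transactions is streamed, customers is held in memory.
joined = JOIN transactions BY id, customers BY id USING 'replicated';

-- Group the joined rows by customer, carrying name and salary along in the key.
by_cust = GROUP joined BY (customers::id, customers::name, customers::salary);

-- Aggregate the transaction fields for each customer.
result_b = FOREACH by_cust GENERATE
    FLATTEN(group) AS (CustomerID, Name, Salary),
    COUNT(joined) AS NumOfTransactions,
    SUM(joined.transactions::total) AS TotalSum,
    MIN(joined.transactions::num_items) AS MinItems;
DUMP result_b;
```

Because this is an inner join, customers without any transactions are omitted from the report.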
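Part C needs only the `customers` relation: group by country code, count, and filter on the count. A minimal sketch, with illustrative alias names:

```pig
-- Count customers per country code.
by_country     = GROUP customers BY CountryCode;
country_counts = FOREACH by_country GENERATE group AS CountryCode, COUNT(customers) AS num_customers;

-- Keep country codes with more than 5,000 or fewer than 2,000 customers.
result_c = FILTER country_counts BY num_customers > 5000 OR num_customers < 2000;
DUMP result_c;
```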
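For part D, one way to proceed is to label each customer with an age range, join to `transactions`, and aggregate `total` per (age range, gender) pair. The sketch below assumes the age and gender come from `customers` (the `age` column in `transactions` is not used) and that customers outside [10, 70] should be dropped. The range labels and alias names are choices made here. Nested conditional expressions are used for the labeling; each one must be wrapped in parentheses.

```pig
-- Keep only customers whose age falls in one of the six ranges.
in_range = FILTER customers BY age >= 10 AND age <= 70;

-- Label each customer with an age range; the last range includes 70.
labeled = FOREACH in_range GENERATE id, gender,
    (age < 20 ? '[10, 20)' :
    (age < 30 ? '[20, 30)' :
    (age < 40 ? '[30, 40)' :
    (age < 50 ? '[40, 50)' :
    (age < 60 ? '[50, 60)' : '[60, 70]'))))) AS age_range;

-- Attach each transaction to its customer's age range and gender.
joined = JOIN transactions BY id, labeled BY id;

-- Aggregate the transaction totals per (age range, gender) group.
by_group = GROUP joined BY (labeled::age_range, labeled::gender);
result_d = FOREACH by_group GENERATE
    FLATTEN(group) AS (AgeRange, Gender),
    MIN(joined.transactions::total) AS MinTransTotal,
    MAX(joined.transactions::total) AS MaxTransTotal,
    AVG(joined.transactions::total) AS AvgTransTotal;
DUMP result_d;
```

Groups with no matching transactions produce no output row, so fewer than 12 rows may be reported.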
