Question: Implement, run, and time the following query using Hadoop Streaming with Python.

SELECT lo_quantity, MIN(lo_revenue)
FROM (SELECT lo_revenue, MAX(lo_quantity) AS lo_quantity, MAX(lo_discount) AS lo_discount
      FROM lineorder
      WHERE lo_orderpriority LIKE '%URGENT'
      GROUP BY lo_revenue)
WHERE lo_discount BETWEEN 6 AND 8
GROUP BY lo_quantity;

This requires running two MapReduce jobs. First, write a job that executes the subquery and writes its output to HDFS. Then write a second job that uses the output of the first job as its input.
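One possible way to structure the two jobs is sketched below. This is not a verified solution. It assumes lineorder is a '|'-delimited text file in the usual Star Schema Benchmark column order (lo_orderpriority at index 6, lo_quantity at 8, lo_discount at 11, lo_revenue at 12, counting from 0) and that those three numeric columns hold integers; adjust the constants if your file differs. The function names and the single-script layout are illustrative choices only.

```python
import sys
from itertools import groupby

# ASSUMPTION: '|'-delimited lineorder rows with these 0-based column positions.
SEP = "|"
COL_PRIORITY, COL_QUANTITY, COL_DISCOUNT, COL_REVENUE = 6, 8, 11, 12


def map1(lines):
    """Job 1 map: keep rows matching LIKE '%URGENT'; key on lo_revenue."""
    for line in lines:
        f = line.rstrip("\n").split(SEP)
        if len(f) > COL_REVENUE and f[COL_PRIORITY].strip().endswith("URGENT"):
            yield "\t".join((f[COL_REVENUE], f[COL_QUANTITY], f[COL_DISCOUNT]))


def _grouped(lines):
    """Group adjacent tab-separated rows that share the same first field.

    Hadoop Streaming hands a reducer its input sorted by key, so equal keys
    arrive next to each other.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    return groupby(rows, key=lambda r: r[0])


def reduce1(lines):
    """Job 1 reduce: per lo_revenue, MAX(lo_quantity) and MAX(lo_discount)."""
    for revenue, rows in _grouped(lines):
        rows = list(rows)
        max_quantity = max(int(r[1]) for r in rows)
        max_discount = max(int(r[2]) for r in rows)
        yield f"{revenue}\t{max_quantity}\t{max_discount}"


def map2(lines):
    """Job 2 map: apply BETWEEN 6 AND 8 to the aggregated discount,
    then re-key on lo_quantity."""
    for line in lines:
        if not line.strip():
            continue
        revenue, quantity, discount = line.rstrip("\n").split("\t")
        if 6 <= int(discount) <= 8:
            yield f"{quantity}\t{revenue}"


def reduce2(lines):
    """Job 2 reduce: per lo_quantity, MIN(lo_revenue)."""
    for quantity, rows in _grouped(lines):
        yield f"{quantity}\t{min(int(r[1]) for r in rows)}"


STAGES = {"map1": map1, "reduce1": reduce1, "map2": map2, "reduce2": reduce2}

# Run one stage as a streaming script: reads stdin, writes stdout.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in STAGES:
    for out in STAGES[sys.argv[1]](sys.stdin):
        print(out)
```

If this were saved under a hypothetical name such as ssb_query.py, each stage could be passed to the Hadoop Streaming jar through its -mapper and -reducer options (for example "python ssb_query.py map1"), with the second job's -input set to the first job's -output directory. Prefixing each hadoop command with the shell's time command is one simple way to measure the run. The discount filter is placed in the second job because the outer WHERE clause applies to MAX(lo_discount), which is only known after the first job's reduce step.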
