Question:

Here is a small example of the bank data that we will use to illustrate the subtasks below (we only list a subset of the attributes in this example; see the above table for the description of the attributes):

job            marital    education   balance   loan
management     Married    tertiary    2143      Yes
technician     Divorced   secondary   29        Yes
entrepreneur   Single     secondary   2         No
blue-collar    Married    unknown     1506      No
services       Divorced   secondary   829       Yes
technician     Married    tertiary    929       Yes
Management     Divorced   tertiary    22        No
technician     Married    primary     10        No

Please note that we specify whether you should use [Hive] or [Spark RDD] at the beginning of each subtask.

a) [Hive] Report the number of clients for each marital status who have a balance above 500 and have a loan. Write the results to Task_1a-out. For the above small example data set you would report the following (output order is not important for this question):

divorced, 1
married, 2

b) [Hive] Report the average yearly balance for all people in each job category in descending order of average yearly balance. Write the results to Task_1b-out. For the small example data set you would report the following:

blue-collar, 1506.0
management, 1082.5
services, 829.0
technician, 322.6666666666667
entrepreneur, 2.0

c) [Spark RDD] Group balance into the following three categories:
a. Low: -infinity to 500
b. Medium: 501 to 1500
c. High: 1501 to +infinity
Report the number of people in each of the above categories. Write the results to "Task_1c-out" in text file format. For the small example data set you should get the following results (output order is not important in this question):

High,2
Medium,2
Low,4

d) [Spark RDD] Output the following details for each person whose job category has an average balance above 500: education, balance, job, marital, loan. Make sure the output is in decreasing order of individual balance. Write the results to Task_1d-out in text file format (output to a single file). For the small example data set you would report the following:

tertiary, 2143.0, management, married, yes
unknown, 1506.0, blue-collar, married, no
secondary, 829.0, services, divorced, yes
tertiary, 22.0, management, divorced, no
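
The expected outputs for subtasks a) and b) can be cross-checked against the sample rows with a few lines of plain Python. This is only a sanity check of the grouping logic, not the Hive queries the question asks for. It assumes that job and marital values are matched case-insensitively, since the expected output counts "Management" together with "management".

```python
from collections import defaultdict

# Sample rows from the question: (job, marital, education, balance, loan)
rows = [
    ("management", "Married", "tertiary", 2143, "Yes"),
    ("technician", "Divorced", "secondary", 29, "Yes"),
    ("entrepreneur", "Single", "secondary", 2, "No"),
    ("blue-collar", "Married", "unknown", 1506, "No"),
    ("services", "Divorced", "secondary", 829, "Yes"),
    ("technician", "Married", "tertiary", 929, "Yes"),
    ("Management", "Divorced", "tertiary", 22, "No"),
    ("technician", "Married", "primary", 10, "No"),
]

# a) Count clients per marital status with balance > 500 and a loan.
marital_counts = defaultdict(int)
for job, marital, education, balance, loan in rows:
    if balance > 500 and loan.lower() == "yes":
        marital_counts[marital.lower()] += 1

# b) Average balance per job category, highest average first.
totals = defaultdict(lambda: [0, 0])  # job -> [sum of balances, count]
for job, marital, education, balance, loan in rows:
    entry = totals[job.lower()]
    entry[0] += balance
    entry[1] += 1

averages = sorted(
    ((job, total / count) for job, (total, count) in totals.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

print(dict(marital_counts))
for job, avg in averages:
    print(f"{job}, {avg}")
```

In Hive the same two results would come from a filtered GROUP BY on marital status and a GROUP BY on job with an average and a descending sort.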

Step by Step Solution
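
A minimal PySpark RDD sketch for subtasks c) and d) might look like the following. Several things here are assumptions, not part of the question: the input is taken to be header-less, comma-separated lines in the order job, marital, education, balance, loan (the real file may have more columns, a header row, or a different delimiter); the path "bank.csv" in the usage comments is hypothetical; and the helper names parse, balance_category, task_1c and task_1d are invented for illustration.

```python
def parse(line):
    """Parse one line into a dict.

    Assumption: fields are comma-separated in the order
    job, marital, education, balance, loan. Adjust to the real layout.
    Text fields are lower-cased so "Management" groups with "management".
    """
    job, marital, education, balance, loan = [
        field.strip().lower() for field in line.split(",")
    ]
    return {
        "job": job,
        "marital": marital,
        "education": education,
        "balance": float(balance),
        "loan": loan,
    }


def balance_category(balance):
    """Map a balance to Low (<= 500), Medium (501..1500) or High (>= 1501)."""
    if balance <= 500:
        return "Low"
    if balance <= 1500:
        return "Medium"
    return "High"


def task_1c(records):
    """records: RDD of parsed dicts -> RDD of 'Category,count' strings."""
    return (
        records.map(lambda r: (balance_category(r["balance"]), 1))
        .reduceByKey(lambda a, b: a + b)
        .map(lambda kv: f"{kv[0]},{kv[1]}")
    )


def task_1d(records):
    """People whose job category averages above 500, by decreasing balance."""
    # Average per job, computed from (sum, count) pairs.
    high_avg_jobs = (
        records.map(lambda r: (r["job"], (r["balance"], 1)))
        .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
        .mapValues(lambda t: t[0] / t[1])
        .filter(lambda kv: kv[1] > 500)
    )
    return (
        records.map(lambda r: (r["job"], r))
        .join(high_avg_jobs)          # keeps only jobs with average > 500
        .map(lambda kv: kv[1][0])     # drop the average, keep the record
        # numPartitions=1 so saveAsTextFile writes a single part file
        .sortBy(lambda r: r["balance"], ascending=False, numPartitions=1)
        .map(lambda r: ", ".join(
            [r["education"], str(r["balance"]), r["job"], r["marital"], r["loan"]]
        ))
    )


# Usage sketch (requires pyspark; "bank.csv" is a hypothetical path):
#   from pyspark import SparkContext
#   sc = SparkContext(appName="bank-tasks")
#   records = sc.textFile("bank.csv").map(parse)
#   task_1c(records).saveAsTextFile("Task_1c-out")
#   task_1d(records).saveAsTextFile("Task_1d-out")
```

The join in task_1d keeps the per-job averages distributed. When the number of job categories is small, collecting the qualifying jobs into a broadcast set and filtering on it would be a reasonable alternative.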

