Question:
Here is a small example of the bank data that we will use to illustrate the subtasks below (we only list a subset of the attributes in this example; see the table above for the description of the attributes):

job           marital   education  balance  loan
management    married   tertiary   2143     yes
technician    divorced  secondary  29       yes
entrepreneur  single    secondary  2        no
blue-collar   married   unknown    1506     no
services      divorced  secondary  829      yes
technician    married   tertiary   929      yes
management    divorced  tertiary   22       no
technician    married   primary    10       no

Please note that we specify whether you should use [Hive] or [Spark RDD] at the beginning of each subtask.

a) [Hive] Report the number of clients for each marital status who have a balance above 500 and have a loan. Write the results to Task_1a-out. For the small example data set above you would report the following (output order is not important for this question):
divorced,1
married,2

b) [Hive] Report the average yearly balance of all people in each job category, in descending order of average yearly balance. Write the results to Task_1b-out. For the small example data set you would report the following:
blue-collar,1506.0
management,1082.5
services,829.0
technician,322.6666666666667
entrepreneur,2.0

c) [Spark RDD] Group balance into the following three categories:
a. Low: -infinity to 500
b. Medium: 501 to 1500
c. High: 1501 to +infinity
Report the number of people in each of the above categories. Write the results to "Task_1c-out" in text file format. For the small example data set you should get the following results (output order is not important in this question):
High,2
Medium,2
Low,4

d) [Spark RDD] Output the following details for each person whose job category has an average balance above 500: education, balance, job, marital, loan. Make sure the output is in decreasing order of individual balance. Write the results to Task_1d-out in text file format (output to a single file). For the small example data set you would report the following:
tertiary, 2143.0, management, married, yes
unknown, 1506.0, blue-collar, married, no
secondary, 829.0, services, divorced, yes
tertiary, 22.0, management, divorced, no
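The expected outputs for the four subtasks can be checked with a short plain-Python sketch. It is not a Hive or Spark solution: it runs on a hypothetical in-memory copy of the eight sample rows (the ROWS list and every function name below are illustrative, not part of the assignment), but each function performs the same filter, group-and-aggregate, or sort step that the corresponding HiveQL query or RDD pipeline has to perform. It assumes integer balances, as in the sample, so the 500/501 and 1500/1501 band boundaries leave no gaps.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the sample rows: (job, marital, education, balance, loan).
# A real solution would read the full data set through Hive or Spark instead.
ROWS = [
    ("management", "married", "tertiary", 2143, "yes"),
    ("technician", "divorced", "secondary", 29, "yes"),
    ("entrepreneur", "single", "secondary", 2, "no"),
    ("blue-collar", "married", "unknown", 1506, "no"),
    ("services", "divorced", "secondary", 829, "yes"),
    ("technician", "married", "tertiary", 929, "yes"),
    ("management", "divorced", "tertiary", 22, "no"),
    ("technician", "married", "primary", 10, "no"),
]


def task_a(rows):
    """(a) Count clients per marital status with balance > 500 and a loan."""
    counts = defaultdict(int)
    for _, marital, _, balance, loan in rows:
        if balance > 500 and loan == "yes":
            counts[marital] += 1
    return dict(counts)


def job_averages(rows):
    """Average balance per job: accumulate (sum, count), then divide."""
    totals = defaultdict(lambda: [0, 0])
    for job, _, _, balance, _ in rows:
        totals[job][0] += balance
        totals[job][1] += 1
    return {job: s / n for job, (s, n) in totals.items()}


def task_b(rows):
    """(b) (job, average balance) pairs in descending order of the average."""
    return sorted(job_averages(rows).items(), key=lambda kv: kv[1], reverse=True)


def balance_band(balance):
    """Map an integer balance to Low (<= 500), Medium (501-1500) or High (>= 1501)."""
    if balance <= 500:
        return "Low"
    if balance <= 1500:
        return "Medium"
    return "High"


def task_c(rows):
    """(c) Count people in each balance band."""
    counts = defaultdict(int)
    for row in rows:
        counts[balance_band(row[3])] += 1
    return dict(counts)


def task_d(rows):
    """(d) People whose job average exceeds 500, sorted by balance, highest first."""
    avgs = job_averages(rows)
    selected = [r for r in rows if avgs[r[0]] > 500]
    selected.sort(key=lambda r: r[3], reverse=True)
    # Format as "education, balance, job, marital, loan" with a float balance.
    return [f"{edu}, {float(bal)}, {job}, {marital}, {loan}"
            for job, marital, edu, bal, loan in selected]


if __name__ == "__main__":
    # Print each result in the comma-separated style shown in the question.
    for marital, n in task_a(ROWS).items():
        print(f"{marital},{n}")
    for job, avg in task_b(ROWS):
        print(f"{job},{avg}")
    for band, n in task_c(ROWS).items():
        print(f"{band},{n}")
    print("\n".join(task_d(ROWS)))
```

In PySpark the same steps would map onto RDD transformations, for example `rdd.map(lambda r: (balance_band(r[3]), 1)).reduceByKey(lambda a, b: a + b)` for (c), and `sortBy(lambda r: r[3], ascending=False, numPartitions=1)` before `saveAsTextFile` for (d), so that the sorted result lands in a single output file. On the Hive side, (a) is a `WHERE balance > 500 AND loan = 'yes'` filter with `GROUP BY marital`, and (b) is `AVG(balance)` grouped by job with `ORDER BY ... DESC`. These are outlines under the same tuple-layout assumption, not tested queries.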