Determine what hours of the day most checkins occur. Create a variable hours_by_checkin_count. This should...
Question:
Transcribed Image Text:
Determine what hours of the day most checkins occur.
• Create a variable hours_by_checkin_count. This should be a PySpark DataFrame
• The DataFrame should be ordered by count and contain 24 rows
• The DataFrame should have these columns (in this order):
  ▪ hour (the hour of the day as an integer, the hour after midnight being 0)
  ▪ count (the number of checkins that occurred in that hour)
Note that the date column in the checkin data is a string with multiple date times in it. You'll need to split that string before parsing.

In [33]:
# YOUR CODE HERE
from pyspark.sql.functions import hour, split, col

# Split the date column in the checkin data and extract the hour from it.
checkin = checkin.withColumn("hour", split(col("date"), " ").getItem(1))
checkin = checkin.withColumn("hour", split(col("hour"), ":").getItem(0).cast("int"))

# Group by hour and count the number of checkins in each hour.
hours_by_checkin_count = checkin.groupBy("hour").count().orderBy("count", ascending=False)
hours_by_checkin_count.show()
#raise NotImplementedError()

[Stage 39:>                                                  (0 + 1) / 1]
+----+-----+
|hour|count|
+----+-----+
|  19|13481|
|  23|13207|
|  22|13191|
|  18|13177|
|  21|12960|
|  20|12553|
|  17|12304|
|   0|11577|
|  16|10416|
|   1| 9803|
|   2| 7258|
|  15| 7000|
|   3| 5225|
|  14| 4340|

In [31]:
assert type(hours_by_checkin_count) == pyspark.sql.dataframe.DataFrame, \
    "The hours_by_checkin_count variable should be a Spark DataFrame."
assert hours_by_checkin_count.columns == ["hour", "count"], \
    "The columns are not in the correct order."
submitted = AutograderHelper.parse_spark_dataframe(hours_by_checkin_count)

In [32]:
# Autograder cell. This cell is worth 1 point (out of 20). This cell does not contain hidden tests.
assert len(submitted) == 24, \
    "The hours_by_checkin_count DataFrame must have 24 rows."
assert submitted["hour"][0] == 1, \
    'The first row should have hour 1'

AssertionError                            Traceback (most recent call last)
Cell In [32], line 6
      1 # Autograder cell. This cell is worth 1 point (out of 20). This cell does not contain hidden tests.
      3 assert len(submitted) == 24, \
      4     "The hours_by_checkin_count DataFrame must have 24 rows."
----> 6 assert submitted["hour"][0] == 1, \
      7     'The first row should have hour 1'

AssertionError: The first row should have hour 1

In [18]:
# Autograder cell. This cell is worth 4 points (out of 20). This cell contains hidden tests.
Expert Answer:
It appears that you're working with PySpark and trying to analyze checkin data to determine the ...
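A likely reason the In [32] assertion fails is that the attempt in In [33] keeps only one item (getItem(1)) after splitting the date string, so each row contributes just its first time of day instead of every check-in. The sketch below counts every timestamp. It rests on assumptions not stated in the question: timestamps are comma-separated and formatted as yyyy-MM-dd HH:mm:ss. The names count_checkins_by_hour and hours_by_checkin_count_sketch are hypothetical; the plain-Python function is there only to check the logic on made-up sample strings without a Spark session.

```python
from collections import Counter
from datetime import datetime


def count_checkins_by_hour(date_strings):
    """Plain-Python reference: count check-ins per hour over ALL timestamps.

    Assumes each string holds comma-separated 'YYYY-MM-DD HH:MM:SS' values.
    """
    counts = Counter()
    for row in date_strings:
        for stamp in row.split(","):
            stamp = stamp.strip()
            if stamp:  # skip empty fragments such as a trailing comma
                parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
                counts[parsed.hour] += 1
    # Highest count first; ties broken by hour so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def hours_by_checkin_count_sketch(checkin):
    """PySpark version of the same steps (hypothetical helper).

    `checkin` is assumed to be a DataFrame with a string column `date`.
    """
    # Deferred import so the plain-Python function above runs without Spark.
    from pyspark.sql import functions as F

    # One row per timestamp: split on commas, then explode the array.
    exploded = checkin.select(
        F.explode(F.split(F.col("date"), r",\s*")).alias("ts")
    )
    return (
        exploded
        # Parse the assumed format and keep the hour as an integer.
        .select(F.hour(F.to_timestamp("ts", "yyyy-MM-dd HH:mm:ss")).alias("hour"))
        .groupBy("hour")
        .count()
        .orderBy(F.col("count").desc())
    )


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        "2016-04-26 19:49:16, 2016-08-30 01:36:57",
        "2018-03-03 01:15:00",
    ]
    print(count_checkins_by_hour(sample))
```

In the notebook, the result of hours_by_checkin_count_sketch(checkin) would be assigned to hours_by_checkin_count. If any timestamp fails to parse, the hour is null and would show up as an extra group, so the 24-row check is worth re-running after the change.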