
E-Commerce Analytics

Project 2

DESCRIPTION

Use Spark features for data analysis to derive valuable insights.

Problem Statement: You are working as a Big Data consultant for an E-commerce company. Your role is to analyze its sales data. The company has multiple stores across the globe and wants analytics on its sales transaction data. You need to provide insights into sales across cities and states on a daily and weekly basis, along with other insights from the product reviews.

Domain: E-Commerce

Analysis to be done: Exploratory analysis, to determine actionable insights.

Dataset File: olist_public_dataset.csv

Content:

Id
order_status
order_products_value
order_freight_value
order_items_qty
order_purchase_timestamp
order_aproved_at
order_delivered_customer_date
customer_city
customer_state
customer_zip_code_prefix
product_name_lenght
product_description_lenght
product_photos_qty
review_score
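
The column list above can be turned into a typed schema before any analysis. The following is a minimal plain-Python sketch (the project itself calls for Spark, so treat this only as an illustration of the idea). The timestamp layout `TS_FORMAT`, the helper names `parse_ts` and `read_orders`, and the sample row are assumptions, not part of the dataset description.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout; adjust if the real file differs.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_ts(text):
    """Return a datetime, or None for an empty field (e.g. an undelivered order)."""
    text = text.strip()
    return datetime.strptime(text, TS_FORMAT) if text else None

# Column name -> converter. Names are copied from the list above,
# including the dataset's own spellings ("aproved").
SCHEMA = {
    "order_status": str,
    "order_products_value": float,
    "order_freight_value": float,
    "order_items_qty": int,
    "order_purchase_timestamp": parse_ts,
    "order_aproved_at": parse_ts,
    "order_delivered_customer_date": parse_ts,
    "customer_city": str,
    "customer_state": str,
    "review_score": int,
}

def read_orders(handle):
    """Yield one typed dict per CSV row, keeping only the columns in SCHEMA."""
    for row in csv.DictReader(handle):
        yield {name: convert(row[name]) for name, convert in SCHEMA.items()}

# Made-up sample data, for illustration only.
sample = io.StringIO(
    "order_status,order_products_value,order_freight_value,order_items_qty,"
    "order_purchase_timestamp,order_aproved_at,order_delivered_customer_date,"
    "customer_city,customer_state,review_score\n"
    "delivered,100.0,15.5,1,2017-05-01 10:00:00,2017-05-01 10:30:00,"
    "2017-05-08 12:00:00,sao paulo,SP,5\n"
)
orders = list(read_orders(sample))
print(orders[0]["order_aproved_at"] - orders[0]["order_purchase_timestamp"])
```

Converting timestamps once at load time keeps the later date arithmetic (approval time, delivery time) free of string handling.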

Insights on Historical Data

Daily Insights
1. SALES
  - Total sales.
  - Total Sales in each Customer City.
  - Total sales in each Customer State.
2. ORDERS
  - Total number of orders sold.
  - City wise order distribution.
  - State wise order distribution.
  - Average Review score per Order.
  - Average Freight charges per order.
  - Average time taken to approve the orders (Order Approved - Order Purchased).
  - Average order delivery time.
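
A few of the daily insights above can be sketched in plain Python to make the grouping explicit. This assumes that "sales" means the sum of `order_products_value` and that the day is taken from `order_purchase_timestamp`; neither is stated in the project text. The sample orders and the function name `daily_insights` are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Made-up orders for illustration only; keys follow the dataset's column names.
orders = [
    {"order_products_value": 100.0, "customer_city": "sao paulo",
     "order_purchase_timestamp": datetime(2017, 5, 1, 10, 0),
     "order_aproved_at": datetime(2017, 5, 1, 10, 30)},
    {"order_products_value": 50.0, "customer_city": "sao paulo",
     "order_purchase_timestamp": datetime(2017, 5, 1, 18, 0),
     "order_aproved_at": datetime(2017, 5, 1, 19, 30)},
    {"order_products_value": 80.0, "customer_city": "curitiba",
     "order_purchase_timestamp": datetime(2017, 5, 2, 9, 0),
     "order_aproved_at": datetime(2017, 5, 2, 9, 15)},
]

def daily_insights(orders):
    """Aggregate per purchase day: total sales, sales per city, order count,
    and average approval time in minutes (approved minus purchased)."""
    sales = defaultdict(float)          # day -> total sales
    sales_by_city = defaultdict(float)  # (day, city) -> total sales
    order_count = defaultdict(int)      # day -> number of orders
    approval_minutes = defaultdict(list)
    for order in orders:
        purchased = order["order_purchase_timestamp"]
        day = purchased.date()
        sales[day] += order["order_products_value"]
        sales_by_city[(day, order["customer_city"])] += order["order_products_value"]
        order_count[day] += 1
        approved = order["order_aproved_at"]
        if approved is not None:  # skip orders that were never approved
            approval_minutes[day].append((approved - purchased).total_seconds() / 60)
    avg_approval = {day: sum(v) / len(v) for day, v in approval_minutes.items()}
    return dict(sales), dict(sales_by_city), dict(order_count), avg_approval

sales, sales_by_city, order_count, avg_approval = daily_insights(orders)
for day in sorted(sales):
    print(day, sales[day], order_count[day], avg_approval[day])
```

The state-wise totals and the review and freight averages follow the same pattern with a different key or value column.
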
Weekly Insights
1. SALES
  - Total sales.
  - Total Sales in each Customer City.
  - Total sales in each Customer State.
2. ORDERS
  - Total number of orders sold.
  - City wise order distribution.
  - State wise order distribution.
  - Average Review score per Order.
  - Average Freight charges per order.
  - Average time taken to approve the orders (Order Approved - Order Purchased).
  - Average order delivery time.
3. Total Freight charges.
4. Freight charges distribution in each Customer City
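
The weekly insights need a week key. One option, sketched below in plain Python, is the ISO year and week number from `datetime.isocalendar()`. Reading "freight charges distribution in each Customer City" as each city's share of the week's total freight is an assumption, as are the sample orders and the names `week_bucket` and `weekly_freight`.

```python
from collections import defaultdict
from datetime import datetime

# Made-up orders for illustration only; keys follow the dataset's column names.
orders = [
    {"order_freight_value": 10.0, "customer_city": "sao paulo",
     "order_purchase_timestamp": datetime(2017, 5, 1, 10, 0)},
    {"order_freight_value": 30.0, "customer_city": "curitiba",
     "order_purchase_timestamp": datetime(2017, 5, 3, 12, 0)},
    {"order_freight_value": 20.0, "customer_city": "sao paulo",
     "order_purchase_timestamp": datetime(2017, 5, 8, 9, 0)},
]

def week_bucket(ts):
    """Return (ISO year, ISO week number); ISO weeks start on Monday."""
    iso = ts.isocalendar()
    return (iso[0], iso[1])

def weekly_freight(orders):
    """Return total freight per week and each city's share of that total."""
    total = defaultdict(float)    # week -> total freight
    by_city = defaultdict(float)  # (week, city) -> freight
    for order in orders:
        week = week_bucket(order["order_purchase_timestamp"])
        total[week] += order["order_freight_value"]
        by_city[(week, order["customer_city"])] += order["order_freight_value"]
    share = {(week, city): value / total[week]
             for (week, city), value in by_city.items()}
    return dict(total), share

total_freight, city_share = weekly_freight(orders)
for key in sorted(city_share):
    print(key, city_share[key])
```

Using the ISO year together with the week number avoids mixing the last days of December with the first week of the following year.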

Approach

Tasks to perform:

Week 1: Approach Overview and Basic Configurations

Install Maven (3.6.2).
Set the Maven environment variable

a) Check that Maven is set up properly using mvn -version

Install Java 1.8 and Scala 2.11.7
Use IntelliJ to validate or modify the source code
Run mvn clean install to build the jar file
See README.md for detailed instructions and helper commands

Week 2: Data Ingestion

Upload the entire data into Hive from CSV
Copy the data from Hive into HDFS
Check the data in HDFS path

Week 3: Data Streaming

Create sample Maven Scala Project
Add necessary spark dependencies
Create Schema of CSV files
Create Spark Session

a) Add S3 details

b) Add all variables to your environment as they have sensitive data

Read CSV file and convert into dataset
Create Map of City and Country
Convert Date to Hour, Month, Year, Daily, and Day Bucket using UDF
Iterate through all metrics for each column
For each type of segment, calculate stats of different cities. Stats include max, min, average, and total records
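
The date-bucketing and per-city statistics steps above can be illustrated without Spark. In the project this logic would sit in a Spark UDF and a grouped aggregation; the plain-Python sketch below only shows the shape of the computation. The four-way split used for "Day Bucket" is an assumption (the task list does not define it), and `date_buckets`, `city_stats`, and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def date_buckets(ts):
    """Break a timestamp into the segments named in the task list."""
    # Assumed definition of the day bucket; the source does not specify one.
    if ts.hour < 6:
        day_bucket = "night"
    elif ts.hour < 12:
        day_bucket = "morning"
    elif ts.hour < 18:
        day_bucket = "afternoon"
    else:
        day_bucket = "evening"
    return {
        "hour": ts.hour,
        "month": ts.month,
        "year": ts.year,
        "daily": ts.date().isoformat(),
        "day_bucket": day_bucket,
    }

def city_stats(rows, segment, metric):
    """Group rows by (segment value, city) and compute max, min, average, count."""
    groups = defaultdict(list)
    for row in rows:
        bucket = date_buckets(row["order_purchase_timestamp"])[segment]
        groups[(bucket, row["customer_city"])].append(row[metric])
    return {
        key: {"max": max(v), "min": min(v), "avg": sum(v) / len(v), "count": len(v)}
        for key, v in groups.items()
    }

# Made-up rows for illustration only.
rows = [
    {"customer_city": "sao paulo", "order_products_value": 100.0,
     "order_purchase_timestamp": datetime(2017, 5, 1, 10, 0)},
    {"customer_city": "sao paulo", "order_products_value": 50.0,
     "order_purchase_timestamp": datetime(2017, 5, 1, 11, 0)},
    {"customer_city": "curitiba", "order_products_value": 80.0,
     "order_purchase_timestamp": datetime(2017, 5, 1, 20, 0)},
]

# Iterate over every segment for one metric, as the task list describes.
for segment in ("hour", "month", "year", "daily", "day_bucket"):
    stats = city_stats(rows, segment, "order_products_value")
    print(segment, len(stats))
```

Looping over segments and metrics with one generic function avoids writing a separate aggregation for each combination.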

Week 4: Data Analysis and Visualization

Write the results into the HDFS
Save final dataset into Amazon S3
Create an Amazon DocumentDB cluster
Save insights in DocumentDB and provide APIs to view aggregate data
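
Before insights are saved to a document store, each aggregate has to become a JSON-like document. The sketch below shows one possible layout and a small in-memory lookup standing in for a "view aggregate data" API. It does not connect to DocumentDB; the field names, the sample statistics, and the functions `to_documents` and `find` are all hypothetical.

```python
import json

# Made-up aggregates keyed by (bucket, city), for illustration only.
stats = {
    ("2017-05-01", "sao paulo"): {"max": 100.0, "min": 50.0, "avg": 75.0, "count": 2},
    ("2017-05-01", "curitiba"): {"max": 80.0, "min": 80.0, "avg": 80.0, "count": 1},
}

def to_documents(stats, segment, metric):
    """Flatten (bucket, city) -> stats into a list of JSON-serializable documents."""
    docs = []
    for (bucket, city), values in sorted(stats.items()):
        doc = {"segment": segment, "bucket": bucket, "city": city, "metric": metric}
        doc.update(values)
        docs.append(doc)
    return docs

def find(docs, **filters):
    """Return the documents whose fields equal every given filter value."""
    return [d for d in docs if all(d.get(k) == v for k, v in filters.items())]

docs = to_documents(stats, "daily", "order_products_value")
print(json.dumps(find(docs, city="curitiba"), indent=2))
```

Storing the segment and metric names inside each document lets one collection hold every combination, so a single query filter can serve all views.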
