E-Commerce Analytics

Project 2

DESCRIPTION

Use Spark features for data analysis to derive valuable insights.

Problem Statement: You are working as a Big Data consultant for an E-commerce company that has multiple stores across the globe. Your role is to analyze the company's sales transaction data and provide insights into its sales across cities and states on a daily and weekly basis. Also provide other insights based on the product reviews.

Domain: E-Commerce

Analysis to be done: Exploratory analysis, to determine actionable insights.

Dataset File: olist_public_dataset.csv

Content:

  1. Id

  2. order_status

  3. order_products_value

  4. order_freight_value

  5. order_items_qty

  6. order_purchase_timestamp

  7. order_aproved_at

  8. order_delivered_customer_date

  9. customer_city

  10. customer_state

  11. customer_zip_code_prefix

  12. product_name_lenght

  13. product_description_lenght

  14. product_photos_qty

  15. review_score
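The column list above could be expressed as an explicit Spark schema. This is a minimal sketch: the data types are assumptions (the brief lists only column names), and the object name OlistSchema is hypothetical. The misspelled names (order_aproved_at, product_name_lenght, product_description_lenght) are kept exactly as listed.

```scala
import org.apache.spark.sql.types._

object OlistSchema {
  // Column names follow the list above; every data type is an assumption.
  val schema: StructType = StructType(Seq(
    StructField("id", StringType),
    StructField("order_status", StringType),
    StructField("order_products_value", DoubleType),
    StructField("order_freight_value", DoubleType),
    StructField("order_items_qty", IntegerType),
    StructField("order_purchase_timestamp", TimestampType),
    StructField("order_aproved_at", TimestampType),
    StructField("order_delivered_customer_date", TimestampType),
    StructField("customer_city", StringType),
    StructField("customer_state", StringType),
    // Kept as a string so that leading zeros, if any, are preserved.
    StructField("customer_zip_code_prefix", StringType),
    StructField("product_name_lenght", IntegerType),
    StructField("product_description_lenght", IntegerType),
    StructField("product_photos_qty", IntegerType),
    StructField("review_score", IntegerType)
  ))
}
```

If the timestamps in the file do not parse with Spark's default format, the CSV reader's timestampFormat option would need to be set when this schema is used.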

Insights on Historical Data

  1. Daily Insights

    1. SALES

      • Total sales.

      • Total Sales in each Customer City.

      • Total sales in each Customer State.

    2. ORDERS

      • Total number of orders sold.

      • City wise order distribution.

      • State wise order distribution.

      • Average Review score per Order.

      • Average Freight charges per order.

      • Average time taken to approve the orders (Order Approved - Order Purchased).

      • Average order delivery time.

  2. Weekly Insights

    1. SALES

      • Total sales.

      • Total Sales in each Customer City.

      • Total sales in each Customer State.

    2. ORDERS

      • Total number of orders sold.

      • City wise order distribution.

      • State wise order distribution.

      • Average Review score per Order.

      • Average Freight charges per order.

      • Average time taken to approve the orders (Order Approved - Order Purchased).

      • Average order delivery time.

    3. Total Freight charges.

    4. Freight charges distribution in each Customer City
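The daily and weekly insights listed above differ only in the time bucket they group by, so one aggregation can serve both. The sketch below rests on several assumptions: "total sales" is taken to be the sum of order_products_value, each row is treated as one order, and weeks start on Monday. The names SalesInsights, summarize, dayBucket, and weekBucket are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

object SalesInsights {
  // Calendar day of the purchase.
  val dayBucket: Column = to_date(col("order_purchase_timestamp"))

  // Monday on or before the purchase date (assumes weeks start on Monday).
  val weekBucket: Column = date_sub(next_day(dayBucket, "Mon"), 7)

  // Aggregates orders per time bucket, optionally split by extra keys
  // such as "customer_city" or "customer_state".
  def summarize(orders: DataFrame, bucket: Column, extraKeys: String*): DataFrame = {
    val purchased = unix_timestamp(col("order_purchase_timestamp"))
    val approvalSecs = unix_timestamp(col("order_aproved_at")) - purchased
    val deliverySecs = unix_timestamp(col("order_delivered_customer_date")) - purchased

    orders
      .groupBy(bucket.as("period") +: extraKeys.map(col): _*)
      .agg(
        sum("order_products_value").as("total_sales"),   // assumption: sales = product value
        count(lit(1)).as("total_orders"),                // assumption: one row per order
        avg("review_score").as("avg_review_score"),
        avg("order_freight_value").as("avg_freight"),
        sum("order_freight_value").as("total_freight"),
        avg(approvalSecs / 3600).as("avg_approval_hours"),
        avg(deliverySecs / 86400).as("avg_delivery_days")
      )
  }
}
```

For example, SalesInsights.summarize(orders, SalesInsights.dayBucket, "customer_city") would give the per-city daily figures, and passing SalesInsights.weekBucket with "customer_state" would give the per-state weekly ones.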

Approach

Tasks to perform:

Week 1: Approach Overview and Basic Configurations

  1. Install Maven (3.6.2).

  2. Set the Maven environment variable

    a) Check that Maven is set up properly using mvn -version

  3. Install Java 1.8 and Scala 2.11.7

  4. Use IntelliJ to validate or modify the source code

  5. Run mvn clean install to build the jar file

  6. Use README.md for detailed instructions and helper commands

Week 2: Data Ingestion

  1. Upload the entire data into Hive from CSV

  2. Copy the data from Hive into HDFS

  3. Check the data in HDFS path
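The ingestion steps above are often done with the Hive CLI and hdfs dfs commands; as an alternative, they can be sketched from Spark itself. This is only one possible approach, not necessarily the one the project intends. The object and method names, the table name, and both paths are hypothetical, and the session is assumed to have been built with enableHiveSupport() so that saveAsTable writes to the Hive metastore.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object Ingest {
  // Loads a CSV into a table, exports that table to a directory,
  // and returns the number of rows found in the exported copy.
  def csvToHiveThenHdfs(spark: SparkSession,
                        csvPath: String,   // e.g. a path to olist_public_dataset.csv
                        table: String,     // hypothetical table name
                        hdfsDir: String    // hypothetical HDFS output directory
                       ): Long = {
    // 1. Upload the CSV into a table (a Hive table when Hive support is enabled).
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(csvPath)
    raw.write.mode(SaveMode.Overwrite).saveAsTable(table)

    // 2. Copy the table contents to the target directory.
    spark.table(table)
      .write.mode(SaveMode.Overwrite)
      .option("header", "true")
      .csv(hdfsDir)

    // 3. Check the data at the target path by reading it back.
    spark.read.option("header", "true").csv(hdfsDir).count()
  }
}
```

Comparing the returned count with the row count of the source file is a quick sanity check that nothing was lost on the way.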

Week 3: Data Streaming

  1. Create sample Maven Scala Project

  2. Add necessary spark dependencies

  3. Create Schema of CSV files

  4. Create Spark Session

    a) Add S3 details

    b) Add all variables to your environment, as they contain sensitive data

  5. Read the CSV file and convert it into a dataset

  6. Create a Map of City and Country

  7. Convert Date to Hour, Month, Year, Daily, and Day Bucket using a UDF

  8. Iterate through all metrics for each column

  9. For each type of segment, calculate stats for the different cities. Stats include max, min, average, and total records
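Several of the Week 3 steps can be sketched together. Everything named here is an assumption rather than the project's actual code: the object Week3Sketch, the environment variable names, the "day bucket" boundaries (the brief does not define them), and the choice of metric columns. The sketch also uses Spark's built-in hour, month, year, and to_date functions and reserves the UDF for the day bucket, which is a simplification of the "using UDF" step.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object Week3Sketch {
  // Builds a session; S3 credentials come from the environment, not the code.
  // The variable names are assumptions; the fs.s3a.* keys are Hadoop S3A settings.
  def session(appName: String): SparkSession = {
    val builder = SparkSession.builder().appName(appName)
    for {
      key    <- sys.env.get("AWS_ACCESS_KEY_ID")
      secret <- sys.env.get("AWS_SECRET_ACCESS_KEY")
    } {
      builder.config("spark.hadoop.fs.s3a.access.key", key)
      builder.config("spark.hadoop.fs.s3a.secret.key", secret)
    }
    builder.getOrCreate()
  }

  def readOrders(spark: SparkSession, path: String): DataFrame =
    spark.read.option("header", "true").option("inferSchema", "true").csv(path)

  // Hypothetical bucket boundaries: the brief does not say what a "day bucket" is.
  def dayBucket(ts: Timestamp): String =
    if (ts == null) null
    else ts.toLocalDateTime.getHour match {
      case h if h < 6  => "night"
      case h if h < 12 => "morning"
      case h if h < 18 => "afternoon"
      case _           => "evening"
    }

  val dayBucketUdf = udf(dayBucket _)

  // Adds the time segments derived from the purchase timestamp.
  def withBuckets(orders: DataFrame): DataFrame = {
    val ts = col("order_purchase_timestamp").cast("timestamp")
    orders
      .withColumn("hour", hour(ts))
      .withColumn("month", month(ts))
      .withColumn("year", year(ts))
      .withColumn("daily", to_date(ts))
      .withColumn("day_bucket", dayBucketUdf(ts))
  }

  // Max, min, average, and record count of one metric per city and segment value.
  def cityStats(orders: DataFrame, metric: String, segment: String): DataFrame =
    orders.groupBy(col("customer_city"), col(segment))
      .agg(
        max(metric).as("max"),
        min(metric).as("min"),
        avg(metric).as("average"),
        count(lit(1)).as("total_records")
      )

  // Assumed metric and segment columns; adjust to the columns of interest.
  val metrics  = Seq("order_products_value", "order_freight_value", "review_score")
  val segments = Seq("hour", "daily", "month", "year", "day_bucket")

  // Iterates over every metric and segment combination.
  def allStats(bucketed: DataFrame): Map[(String, String), DataFrame] =
    (for (m <- metrics; s <- segments) yield (m, s) -> cityStats(bucketed, m, s)).toMap
}
```

A typical call chain would be readOrders, then withBuckets, then allStats, with each resulting DataFrame written out separately.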

Week 4: Data Analysis and Visualization

  1. Write the results into HDFS

  2. Save the final dataset into Amazon S3

  3. Create an Amazon DocumentDB cluster

  4. Save the insights in DocumentDB and provide APIs to view the aggregate data
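Writing the results to HDFS and to S3 can use the same Spark writer, since only the target URI changes. The sketch below is a minimal illustration under assumptions: the object name ResultWriter is hypothetical, Parquet is an arbitrary choice of output format, and s3a:// targets additionally require the hadoop-aws module and credentials to be configured. The DocumentDB steps are not covered here.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object ResultWriter {
  // Writes the same insights DataFrame to every target URI,
  // replacing any previous output at that location.
  def save(insights: DataFrame, targets: Seq[String]): Unit =
    targets.foreach { uri =>
      insights.write.mode(SaveMode.Overwrite).parquet(uri)
    }
}
```

A call might look like ResultWriter.save(weekly, Seq("hdfs:///insights/weekly", "s3a://example-bucket/insights/weekly")), where both locations are placeholders.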
