Question: Implementation of a system for sales data analytics using Cassandra NoSQL database Overview & background: An e - Commerce company in Europe is formulating sales

Implementation of a system for sales data analytics using Cassandra NoSQL database
Overview & background:
An e-Commerce company in Europe is formulating sales strategy for the year 2011 based on the analysis of the sales data of 2009-2010. The name of the sales data file is "Assignment1_online_retail_dataset.csv". This file contains 525454 records in which the first record is header. You need to use Cassandra NoSQL for the analysis. The column family design, cqlsh command used to create the columnfamily, command used to copy the data from the file into the columnfamily, queries used for analysis and the results of the analysis queries developed by you should be submitted.
Description:
1. METADATA
File name is Assignment1_online_retail_dataset.csv. The schema of the data set in this file is (RecordNo, Invoice, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, Country). Some of the fields in some records may be blank. The analysis should be carried out by loading the data into Cassandra NoSQL database. The first record contains the schema definition and appropriate option should be set in the COPY command to skip the header record. No other modifications are allowed on the contents of the file. Negative values for quantity and/or unitprice indicate returned items.
2. Column Family (Table) design
You need to design a suitable column family structure for maximum query performance. For reference, a whitepaper titled Data-Modeling-in-Apache-Cassandra.pdf is included. Select the partition key properly. While developing the queries:
1. You are allowed to create only one columnfamily and use the same columnfamily for all queries.
2. You may use allow filtering in queries if required.
3. You may create materialzed views from the table if required to execute some queries.
2
3. ANALYTICS:
Find answers for the following questions by developing appropriate analytical queries in Cassandra query language (cqlsh):
1. Find total number of transactions in the given transaction file.
2. Find total value of sale happened in the year 2009-2010.
(Sale Price = Quantity*UnitPrice).
3. Find total value of sale happened in USA.
4. List of countries from which purchases were made on the online shop.
5. Find country from which sale with the maximum value happened.
6. Find the countries from which the quantity in one sale of an item was
more than 8000
Sample data
RecordNo Invoice StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country 14894348504815CM CHRISTMAS GLASS BALL 20 LIGHTS 1212/1/20097:456.9513085 United Kingdom 248943479323P PINK CHERRY LIGHTS 1212/1/20097:456.7513085 United Kingdom 348943479323W WHITE CHERRY LIGHTS 1212/1/20097:456.7513085 United Kingdom 448943422041 RECORD FRAME 7" SINGLE SIZE4812/1/20097:452.113085 United Kingdom 548943421232 STRAWBERRY CERAMIC TRINKET BOX 2412/1/20097:451.2513085 United Kingdom 648943422064 PINK DOUGHNUT TRINKET POT2412/1/20097:451.6513085 United Kingdom 748943421871 SAVE THE PLANET MUG 2412/1/20097:451.2513085 United Kingdom 848943421523 FANCY FONT HOME SWEET HOME DOORMAT 1012/1/20097:455.9513085 United Kingdom 948943522350 CAT BOWL1212/1/20097:462.5513085 United Kingdom 1048943522349 DOG BOWL , CHASING BALL DESIGN 1212/1/20097:463.7513085 United Kingdom 1148943522195 HEART MEASURING SPOONS LARGE 2412/1/20097:461.6513085 United Kingdom

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!