Question: Segmenting the market based on shopping patterns using k-means
Description:
Students will apply unsupervised learning techniques to segment the market based on
customer shopping habits using the k-means algorithm. Students will use a file named
sales.csv. The file contains the sales details of a variety of tops from several retail
clothing stores, where each data point is defined by a set of attributes. To complete this
problem, students can refer to the section on the k-means algorithm and the last section
of the chapter in the textbook, as well as the handout.
Algorithm
The main steps:
1. Import the required packages: csv, numpy, matplotlib.pyplot, KMeans from
   sklearn.cluster, and metrics from sklearn.
2. Load the data from the input file sales.csv. Since it is a CSV file, you can use the
   csv reader in Python to read the data from this file and convert it into a NumPy array X.
3. Define the number of clusters, num_clusters, before applying the KMeans algorithm.
   Here, you can assume that the data lies in different clusters.
4. Create the KMeans object kmeans using the initialization parameters. Be careful to use
   the right parameters.
5. Train the kmeans model with the input data.
6. Extract and print the centers of the clusters, cluster_centers.
7. Since you are dealing with six-dimensional data, in order to visualize it you can take
   two-dimensional data formed from the second and third dimensions. Create two subsets,
   one from the data and one from the centers, containing only the second and third
   attributes: X_2d and cluster_centers_2d.
8. To visualize the boundaries, you need to create a grid of points and evaluate the model
   on all those points. Start by defining the step size of the mesh, step_size.
9. Define the grid of points and ensure that you are covering all the values in the input
   data.
10. Predict the outputs for all the points on the grid using the trained KMeans model.
11. Plot all output values and color each region.
12. Overlay the input data points on top of these colored regions.
13. Plot the centers of the clusters obtained using the KMeans algorithm.
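The loading, training, and center-extraction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the expected solution: sales.csv is not available here, so the sketch builds a hypothetical in-memory CSV with three well-separated groups, and the helper name load_csv, the header names, and the value num_clusters = 3 are all made up for this toy data. The KMeans arguments shown (init, n_init, random_state) are one reasonable choice, not necessarily the parameters the assignment has in mind.

```python
import csv
import io

import numpy as np
from sklearn.cluster import KMeans


def load_csv(file_obj):
    """Read a CSV with one header row and numeric columns into (names, X)."""
    reader = csv.reader(file_obj, delimiter=",")
    names = next(reader)  # header row holds the attribute names
    X = np.array([[float(v) for v in row] for row in reader if row])
    return names, X


# Hypothetical stand-in for sales.csv: 3 groups of 20 rows, 6 attributes each.
rng = np.random.default_rng(0)
group_means = np.array([[10.0] * 6, [50.0] * 6, [90.0] * 6])
rows = np.vstack([m + rng.normal(scale=2.0, size=(20, 6)) for m in group_means])
csv_text = "a,b,c,d,e,f\n" + "\n".join(
    ",".join(f"{v:.3f}" for v in r) for r in rows
)

names, X = load_csv(io.StringIO(csv_text))

# Assumed value that matches the toy data only; pick it to suit the real file.
num_clusters = 3
kmeans = KMeans(n_clusters=num_clusters, init="k-means++", n_init=10,
                random_state=0)
kmeans.fit(X)

# One row per cluster, one column per attribute
cluster_centers = kmeans.cluster_centers_
print("Centers of clusters:")
print(cluster_centers)
```

With the real file, the same function could be called as load_csv(open('sales.csv', 'r', newline='')), provided every column is numeric; a leading identifier column would have to be dropped first.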
Sample Output:
For sample output from a closely related example, students can refer to the section on
the k-means algorithm and the last section, of the same title, in the chapter of the
textbook and the handout.
Implementation to be completed:
# Import the required packages
# Load the data from the input file and convert it to numpy array
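input_file = 'sales.csv'
file_reader = csv.reader(open(input_file, 'r'), delimiter=',')
X = []
for count, row in enumerate(file_reader):
    if not count:
        # First row is the header: keep the attribute names and skip it.
        # Set the slice start to skip any leading non-attribute column.
        names = row[:]
        continue
    X.append([float(x) for x in row[:]])
X = np.array(X)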
# Define the number of clusters, create the KMeans object kmeans using the initialization
# parameters, and train the kmeans model with the input data.
num_clusters = ...  # set the number of clusters here
# Extract and print the centers of the clusters.
# Create two subsets from data and centers of only second and third attributes.
X_2d = X[:, 1:3]
cluster_centers_2d = cluster_centers[:, 1:3]
# Step size of the mesh
step_size = ...  # set the mesh step size here
# Define the grid of points using the subdataset Xd
# Predict the outputs for all the points on the grid using the trained kmeans model.
# Plot all output values and color each region.
output = output.reshape(x_vals.shape)
plt.figure()
plt.clf()
plt.imshow(output, interpolation='nearest',
           extent=(x_vals.min(), x_vals.max(),
                   y_vals.min(), y_vals.max()),
           cmap=plt.cm.Paired, aspect='auto', origin='lower')
# Overlay input data points on top of these colored regions.
# Plot the centers of the clusters obtained using the KMeans algorithm.
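One detail worth noting before filling in the grid and plotting steps: a KMeans model trained on six attributes expects six features in predict, so it cannot be evaluated directly on a two-dimensional grid. The sketch below shows one possible workaround, which is an assumption and not necessarily what the textbook or handout intends: each grid point is labeled by its nearest projected center (the second and third coordinates of each center). Refitting a separate KMeans on the two-dimensional subset would be another option. The helper names label_grid and plot_regions, the toy arrays, the padding of 1 around the data, and step_size = 0.5 are all hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def label_grid(points_2d, centers_2d, step_size):
    """Build a mesh covering points_2d; label each mesh point by nearest center."""
    # Pad by 1 on each side (assumed margin) so every data point is covered
    x_min, x_max = points_2d[:, 0].min() - 1, points_2d[:, 0].max() + 1
    y_min, y_max = points_2d[:, 1].min() - 1, points_2d[:, 1].max() + 1
    x_vals, y_vals = np.meshgrid(np.arange(x_min, x_max, step_size),
                                 np.arange(y_min, y_max, step_size))
    grid = np.c_[x_vals.ravel(), y_vals.ravel()]
    # Squared Euclidean distance from every grid point to every 2-D center
    dists = ((grid[:, None, :] - centers_2d[None, :, :]) ** 2).sum(axis=2)
    output = dists.argmin(axis=1).reshape(x_vals.shape)
    return x_vals, y_vals, output


def plot_regions(points_2d, centers_2d, x_vals, y_vals, output):
    """Color the regions, overlay the data points, and mark the centers."""
    fig, ax = plt.subplots()
    ax.imshow(output, interpolation="nearest",
              extent=(x_vals.min(), x_vals.max(), y_vals.min(), y_vals.max()),
              cmap=plt.cm.Paired, aspect="auto", origin="lower")
    ax.scatter(points_2d[:, 0], points_2d[:, 1], marker="o",
               facecolors="none", edgecolors="black", s=40)
    ax.scatter(centers_2d[:, 0], centers_2d[:, 1], marker="x",
               color="black", s=150, linewidths=3, zorder=10)
    ax.set_title("Cluster regions (second vs. third attribute)")
    return fig


# Toy stand-ins (hypothetical values) for the 6-D data and the fitted centers
X = np.array([[0, 1, 1, 0, 0, 0],
              [0, 2, 1, 0, 0, 0],
              [0, 8, 9, 0, 0, 0],
              [0, 9, 8, 0, 0, 0]], dtype=float)
cluster_centers = np.array([[0, 1.5, 1.0, 0, 0, 0],
                            [0, 8.5, 8.5, 0, 0, 0]])

# Keep only the second and third attributes
X_2d = X[:, 1:3]
cluster_centers_2d = cluster_centers[:, 1:3]

step_size = 0.5  # assumed; a smaller step gives smoother boundaries
x_vals, y_vals, output = label_grid(X_2d, cluster_centers_2d, step_size)
fig = plot_regions(X_2d, cluster_centers_2d, x_vals, y_vals, output)
```

Calling plt.show() afterwards displays the figure. The regions drawn this way reflect distances in the two plotted dimensions only, so they may not match the six-dimensional cluster assignments exactly. The distance array holds one entry per grid point per center, so a very small step size over a wide range can use a lot of memory.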
