Segmenting the market based on shopping patterns using k-means
1.0 Description:
Students will apply unsupervised learning techniques to segment the market based on
customer shopping habits using the k-means algorithm. Students will use a file named
sales.csv. The file contains the sales details of a variety of tops from several retail
clothing stores, where each data point is defined by 6 attributes. To complete this
problem, students can refer to the section on the k-means algorithm and the last section of
chapter 7 of the textbook and the handout.
2.0 Algorithm (The main steps):
1- Import the required packages: csv, numpy, matplotlib.pyplot, sklearn.cluster
(KMeans), sklearn (metrics).
2- Load the data from the input file (sales.csv). Since it's a CSV file, you can use the csv reader
in Python to read the data from this file and convert it into a NumPy array (X).
3- Define the number of clusters (num_clusters) before applying the K-Means algorithm.
Here, you can assume that the data lies in 9 different clusters.
4- Create the KMeans object (kmeans) using the initialization parameters. Be careful to use the
right parameters.
5- Train the kmeans model with the input data.
6- Extract and print the centers of the 9 clusters (cluster_centers).
7- Since you are dealing with six-dimensional data, you can visualize it by taking a
two-dimensional slice formed from the second and third dimensions. So, create two
subsets, one from the data and one from the centers, containing only the second and third
attributes (X_2d & cluster_centers_2d).
8- To visualize the boundaries, you need to create a grid of points and evaluate the model on
all those points. You can start by using a step size of 0.01 (step_size = 0.01).
9- Define the grid of points and ensure that you are covering all the values in the input data.
10- Predict the outputs for all the points on the grid using the trained K-Means model.
11- Plot all output values and color each region.
12- Overlay input data points on top of these colored regions.
13- Plot the centers of the clusters obtained using the K-Means algorithm.
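Steps 3 to 7 above can be sketched as follows. This is a minimal illustration, not the expected solution: the array X here is synthetic stand-in data (the real X comes from sales.csv), and the init, n_init, and random_state values are illustrative assumptions rather than parameters prescribed by the assignment.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the array loaded from sales.csv:
# 90 points with 6 attributes, scattered around 9 random centers.
rng = np.random.default_rng(0)
true_centers = rng.uniform(0, 100, size=(9, 6))
X = np.vstack([c + rng.normal(scale=1.0, size=(10, 6)) for c in true_centers])

# Step 3: the number of clusters is fixed before fitting.
num_clusters = 9

# Step 4: create the KMeans object. The parameter values below are
# illustrative assumptions, not values given in the assignment.
kmeans = KMeans(n_clusters=num_clusters, init='k-means++',
                n_init=10, random_state=0)

# Step 5: train the model on the full six-dimensional data.
kmeans.fit(X)

# Step 6: extract and print the cluster centers (one row per cluster).
cluster_centers = kmeans.cluster_centers_
print(cluster_centers)

# Step 7: keep only the second and third attributes (columns 1 and 2).
X_2d = X[:, 1:3]
cluster_centers_2d = cluster_centers[:, 1:3]
```

Note that the slice 1:3 selects columns 1 and 2 because NumPy indexing is zero-based and excludes the end index.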
3.0 Sample Output:
To see sample output for a closely related example, students can refer to the section on the
k-means algorithm and the last section, with the same title, of chapter 7 of the textbook and
the handout.
4.0 Implementation to be completed:
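# Import the required packages
import csv
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import metrics

# Load the data from the input file and convert it to numpy array
input_file = 'sales.csv'
X = []
with open(input_file, 'r', newline='') as f:
    file_reader = csv.reader(f, delimiter=',')
    for count, row in enumerate(file_reader):
        # The first row is the header: keep the column names and skip it
        if not count:
            names = row[1:]
            continue
        X.append([float(x) for x in row[1:]])
X = np.array(X)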
# Define the number of clusters, create the KMeans object (kmeans) using the initialization
# parameters, and train the kmeans model with the input data.
num_clusters = 9
# Extract and print the centers of the 9 clusters.
# Create two subsets from data and centers of only second and third attributes.
X_2d = X[:, 1:3]
cluster_centers_2d = cluster_centers[:, 1:3]
# Step size of the mesh
step_size = 0.01
# Define the grid of points using the sub-dataset X_2d.
# Predict the outputs for all the points on the grid using the trained kmeans model.
# Plot all output values and color each region.
output = output.reshape(x_vals.shape)
plt.figure()
plt.clf()
plt.imshow(output, interpolation='nearest',
           extent=(x_vals.min(), x_vals.max(), y_vals.min(), y_vals.max()),
           cmap=plt.cm.Paired, aspect='auto', origin='lower')
# Overlay input data points on top of these colored regions.
# Plot the centers of the clusters obtained using the K-Means algorithm.
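
The grid, prediction, and overlay steps left blank in the skeleton could look roughly like the sketch below. It rests on several assumptions that are not part of the assignment: the data is synthetic and kept in a small range so that a step size of 0.01 yields a manageable grid, and, because a model fitted on six attributes cannot score two-column grid points directly, the other four attributes are held at their column means. Refitting a separate model on X_2d would be another reasonable choice. The output file name and marker styles are likewise illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hypothetical stand-in for the sales.csv array (90 points, 6 attributes).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 4.0, size=(90, 6))
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(X)
X_2d = X[:, 1:3]
cluster_centers_2d = kmeans.cluster_centers_[:, 1:3]

# Define a grid that covers all input values, with a margin of 1.
step_size = 0.01
x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
x_vals, y_vals = np.meshgrid(np.arange(x_min, x_max, step_size),
                             np.arange(y_min, y_max, step_size))

# Assumption: hold the four unplotted attributes at their column means so
# the six-dimensional model can be evaluated on the two-dimensional grid.
grid = np.tile(X.mean(axis=0), (x_vals.size, 1))
grid[:, 1] = x_vals.ravel()
grid[:, 2] = y_vals.ravel()

# Predict a cluster label for every grid point, then restore the grid shape.
output = kmeans.predict(grid).reshape(x_vals.shape)

# Color each region by its predicted cluster.
plt.figure()
plt.imshow(output, interpolation='nearest',
           extent=(x_vals.min(), x_vals.max(), y_vals.min(), y_vals.max()),
           cmap=plt.cm.Paired, aspect='auto', origin='lower')

# Overlay the input points, then the cluster centers.
plt.scatter(X_2d[:, 0], X_2d[:, 1], marker='o', facecolors='none',
            edgecolors='black', s=30)
plt.scatter(cluster_centers_2d[:, 0], cluster_centers_2d[:, 1], marker='x',
            s=150, linewidths=3, color='black', zorder=10)
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.savefig('kmeans_regions.png')  # hypothetical output file name
```

With real sales data the value range may be much larger than in this sketch, in which case a step size of 0.01 can produce a very large grid; a coarser step may be needed.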
