Question:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

# tsne_results, train_images and subset_size come from earlier cells of the notebook
np.random.seed(0)
plt.figure(figsize=(40, 40))

# Scatter plot to help with positioning images
plt.scatter(tsne_results[:, 0], tsne_results[:, 1], alpha=0.5)  # alpha assumed; original value lost

# Loop over each image in the subset and plot it at the corresponding t-SNE position
for i in range(subset_size):
    # Get the image corresponding to this t-SNE point (32x32 RGB shape assumed; original lost)
    image = train_images[i].reshape(32, 32, 3)
    # Create an OffsetImage object (zoom assumed; original value lost)
    imagebox = OffsetImage(image, zoom=1.0)
    # Create the annotation box with the image, anchored at the t-SNE coordinates
    ab = AnnotationBbox(imagebox, (tsne_results[i, 0], tsne_results[i, 1]), frameon=False)
    # Add it to the plot
    plt.gca().add_artist(ab)

# Title and labels
plt.title('t-SNE Visualization with Images')
plt.xlabel('t-SNE Component 1')
plt.ylabel('t-SNE Component 2')

# Show the plot
plt.show()

Q: Based on the last part, how many clusters do you think would be good for k-means? Why?
T: Set up k-means with a reasonable k based on the previous part (no worries: there is no single best answer).
Tip: To speed things up, import and use MiniBatchKMeans instead of KMeans. Browse the documentation to see how it differs.
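A minimal setup sketch with scikit-learn, for illustration only. The value `k = 10` is a hypothetical placeholder (read yours off the t-SNE plot), and `batch_size`, `max_iter` and `n_init` are assumed values, not ones given in the exercise. Both estimators are built side by side to make the difference mentioned in the tip concrete.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans

# Hypothetical number of clusters; choose yours from the t-SNE plot
k = 10

# Standard k-means: every iteration uses all samples
kmeans_exact = KMeans(n_clusters=k, n_init=3, random_state=0)

# Mini-batch k-means: each step updates the centers from a random batch of
# batch_size samples, trading a little clustering quality for speed
kmeans = MiniBatchKMeans(n_clusters=k, batch_size=1024, max_iter=100,
                         n_init=3, random_state=0)
```

Passing `n_init` explicitly avoids a default-change warning that some scikit-learn versions emit, and `random_state` only makes reruns repeatable.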
T: Fit k-means to your data.
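A sketch of the fit step, assuming the images are stored as an array of shape (n, 32, 32, 3) like `train_images` in the plotting code. The random array below is a hypothetical stand-in so the snippet runs on its own; use the same image subset you passed to t-SNE so that the labels line up with `tsne_results`.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in for the image subset used for t-SNE:
# 500 random 32x32 RGB images with values in [0, 1]
rng = np.random.default_rng(0)
train_images = rng.random((500, 32, 32, 3))

# k-means works on feature vectors, so flatten each image into one row
X = train_images.reshape(len(train_images), -1)

kmeans = MiniBatchKMeans(n_clusters=10, batch_size=256, n_init=3, random_state=0)
kmeans.fit(X)

# labels_ holds the cluster index of every sample in X, in the same order
labels = kmeans.labels_
print(labels.shape, kmeans.cluster_centers_.shape)  # (500,) (10, 3072)
```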
Plot the t-SNE embedding using the clusters' labels
Let's see to what extent the k-means clusters resemble the structure of the t-SNE output.
T: Plot the t-SNE embedding again, but this time assign colors corresponding to the k-means cluster of each image.
Q: Can you see significant groups of points with the same color (label)? If not, something is wrong. How many do you see, roughly?
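One way to do this with `plt.scatter`, sketched under the assumption that the t-SNE coordinates and the cluster labels (the `labels_` attribute of your fitted model) cover the same images in the same order. Both arrays here are random hypothetical stand-ins, and the `tab10` colormap is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins: a 2-D t-SNE embedding and one k-means label per point,
# in the same order (replace with your tsne_results and your model's labels_)
rng = np.random.default_rng(0)
tsne_results = rng.normal(size=(500, 2))
labels = rng.integers(0, 10, size=500)

fig, ax = plt.subplots(figsize=(10, 10))
# Integer labels passed as c= are mapped to colors through the colormap
points = ax.scatter(tsne_results[:, 0], tsne_results[:, 1], c=labels,
                    cmap="tab10", s=10)
fig.colorbar(points, ax=ax, label="k-means cluster")
ax.set_title("t-SNE embedding colored by k-means cluster")
ax.set_xlabel("t-SNE Component 1")
ax.set_ylabel("t-SNE Component 2")
plt.show()
```

`tab10` has only ten colors, so with more than ten clusters several clusters will share a color; `tab20` or a continuous colormap are alternatives.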
T: Repeat the plot above, but define the color of each point as the mean color of the images in the cluster to which the image belongs.
Hint: You should see some blue and orange parts, and also some almost-white and quite dark parts. If so, good.
If you don't see them, something is likely wrong. Maybe too few iterations? If everything is gray, something is very wrong (maybe far too few clusters k?). Tune the parameters until you are happy. Do you see why we wanted to use the faster, approximate version of k-means? Data analysis is often done iteratively and interactively, so efficient algorithms save you time.
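A sketch of the mean-color computation, assuming images shaped (n, height, width, 3) with RGB values in [0, 1] (divide by 255 first if yours are uint8). The helper name `cluster_mean_colors` is hypothetical. Because every image has the same number of pixels, averaging the per-image mean colors within a cluster gives the same result as averaging all pixels in that cluster. The random stand-in data averages to gray, so only real images will show the blue and orange regions the hint mentions.

```python
import numpy as np
import matplotlib.pyplot as plt


def cluster_mean_colors(images, labels, k):
    """Return (per-point colors, per-cluster mean colors) as RGB rows.

    images: float array of shape (n, height, width, 3), values in [0, 1]
    labels: int array of shape (n,) with cluster indices in [0, k)
    """
    n = len(images)
    # Mean RGB color of each individual image, shape (n, 3)
    per_image = images.reshape(n, -1, 3).mean(axis=1)
    # Number of images in each cluster, shape (k,)
    counts = np.bincount(labels, minlength=k)
    # Per-cluster sum of the per-image colors, one bincount per channel, shape (k, 3)
    sums = np.stack([np.bincount(labels, weights=per_image[:, ch], minlength=k)
                     for ch in range(3)], axis=1)
    # Divide by the counts; np.maximum avoids division by zero for empty clusters
    cluster_colors = sums / np.maximum(counts, 1)[:, None]
    # Each point gets the mean color of its own cluster
    return cluster_colors[labels], cluster_colors


# Hypothetical stand-ins; use your image subset, t-SNE output and k-means labels
rng = np.random.default_rng(0)
images = rng.random((500, 32, 32, 3))
tsne_results = rng.normal(size=(500, 2))
labels = rng.integers(0, 10, size=500)

point_colors, _ = cluster_mean_colors(images, labels, k=10)

plt.figure(figsize=(10, 10))
# An (n, 3) array passed as c= is read as one RGB color per point
plt.scatter(tsne_results[:, 0], tsne_results[:, 1], c=point_colors, s=10)
plt.title("t-SNE embedding colored by mean cluster color")
plt.xlabel("t-SNE Component 1")
plt.ylabel("t-SNE Component 2")
plt.show()
```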
If you're not satisfied with the quality, you can tune the parameters some more.
T: If everything looks acceptable, rerun k-means on the full dataset (something we couldn't realistically do with t-SNE!).
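Refitting on everything is the same call on a bigger array. In this sketch the 2,000 random images are a hypothetical stand-in for the full training set, and the hyperparameters are placeholders for whatever you settled on while tuning on the subset.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in for the full training set (replace with all of your images)
rng = np.random.default_rng(1)
full_images = rng.random((2000, 32, 32, 3), dtype=np.float32)
X_full = full_images.reshape(len(full_images), -1)

# Placeholder hyperparameters; reuse the values tuned on the subset
kmeans_full = MiniBatchKMeans(n_clusters=10, batch_size=1024, n_init=3,
                              random_state=0)
full_labels = kmeans_full.fit_predict(X_full)

# Cluster sizes: a quick sanity check that no cluster is nearly empty
print(np.bincount(full_labels, minlength=10))
```

If the full array does not fit in memory, `MiniBatchKMeans` also provides `partial_fit`, which updates the centers from one chunk at a time; the labels can then be computed afterwards with `predict`.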
Let's verify whether the clusters we got on the entire dataset are reasonable.
T: For each cluster center, plot a few of the images closest to it in the Euclidean metric.
Q: Does it look good? Or do you see something suspicious?
For example, if any cluster center looks like a single image from the dataset, you likely chose too many clusters!
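A sketch for finding the images nearest each center, using scikit-learn's `pairwise_distances` with the Euclidean metric. The helper `closest_to_centers`, the stand-in data, `n_clusters=5` and `n_closest = 8` are all hypothetical; with your own data, pass the flattened full dataset and the `cluster_centers_` of your fitted model. The distance matrix has one row per center, so it stays small even for a large dataset.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import pairwise_distances


def closest_to_centers(X, centers, n_closest):
    """Indices of the n_closest rows of X to each center, nearest first.

    Returns an int array of shape (n_centers, n_closest).
    """
    # dists[c, i] is the Euclidean distance from center c to sample i
    dists = pairwise_distances(centers, X, metric="euclidean")
    return np.argsort(dists, axis=1)[:, :n_closest]


# Hypothetical stand-in data and model; replace with your full dataset and fit
rng = np.random.default_rng(2)
images = rng.random((300, 32, 32, 3))
X = images.reshape(len(images), -1)
model = MiniBatchKMeans(n_clusters=5, batch_size=256, n_init=3,
                        random_state=0).fit(X)

n_closest = 8  # assumed number of images to show per center
idx = closest_to_centers(X, model.cluster_centers_, n_closest)

# One row per cluster center, nearest image on the left
fig, axes = plt.subplots(len(idx), n_closest, figsize=(n_closest, len(idx)),
                         squeeze=False)
for c, row in enumerate(idx):
    for j, i in enumerate(row):
        axes[c, j].imshow(images[i])
        axes[c, j].axis("off")
plt.show()
```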
