K-means clustering is a form of unsupervised machine learning. It can be used to separate unlabeled data into groups. A marketing team could use K-means on customer data to perform customer segmentation. However, in this example I use K-means clustering to perform image compression: I group pixels in an image by their similarity in color in order to reduce the total number of colors within that image.
You can download my code at github to follow along. This guide is written with Python in mind, but I also provide a brief R walkthrough in the final section of this post.
I will be performing image compression on this image:
First, I load all relevant packages:
from skimage import io
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as image
img = io.imread('bird_small.png')
img_r = (img / 255.0).reshape(-1,3)
I reshape the image into a 16384x3 array so that each row represents a pixel and the three columns represent the Red, Green, and Blue values.
#Fit K-means on resized image. n_clusters is the desired number of colors
k_colors = KMeans(n_clusters=128).fit(img_r)
#Assign colors to pixels based on their cluster center
#Each row in k_colors.cluster_centers_ represents the RGB value of a cluster centroid
#k_colors.labels_ contains the cluster that a pixel is assigned to
#The following assigns every pixel the color of the centroid it is assigned to
img128=k_colors.cluster_centers_[k_colors.labels_]
#Reshape the image back to 128x128x3 to save
img128=np.reshape(img128, (img.shape))
#Save image
image.imsave('img128.png',img128)
The following is a shiny web app I created in R to visualize how the image changes as the number of clusters fed to the K-means algorithm changes:
The following code performs the same K-means image compression in R. It only differs from the Python code in syntax:
library(png)
library(reshape)
library(stats)
img <- readPNG("bird_small.png")
origdim <- dim(img)
dim(img) <- c(dim(img)[1]*dim(img)[2],3)
k_colors = kmeans(img,centers=128,iter.max=100)
img128 <- k_colors$centers[k_colors$cluster,]
dim(img128) <- origdim
writePNG(img128,"img128.png")