- The internet holds huge amounts of image data, so image compression techniques are important for reducing storage space.
- In this post, we will implement and test the highly effective and simple 2D image compression algorithm developed by Jordi Warmenhoven.
- It is based on K-means clustering, one of the simplest and most popular unsupervised Machine Learning (ML) algorithms, which groups an unlabeled dataset into clusters.
- Here K is the number of clusters to create, chosen in advance; for example, K=2 yields two clusters.
- In image compression, K represents the number of colors.
- K-means lets us cluster the pixels of a 2D image into segments, discovering the groups in the unlabeled data on its own, without any labeled training data.
- It is a centroid-based algorithm: each cluster is associated with a centroid, and the algorithm aims to minimize the sum of squared distances between the data points and the centroids of their assigned clusters.
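To make the centroid idea concrete, here is a minimal NumPy sketch of the two alternating steps (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name `kmeans_sketch`, the fixed iteration count and the random initialization are assumptions made for illustration; scikit-learn's `KMeans`, used in the rest of this post, is more careful about initialization and convergence.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=20, seed=0):
    """Illustrative K-means loop: assign points, then update centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    # Final assignment and the quantity K-means tries to make small.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(points)), labels].sum()
    return centroids, labels, inertia

# Two well-separated synthetic blobs (made-up data, not the post's dataset).
rng = np.random.default_rng(1)
demo = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centroids, labels, inertia = kmeans_sketch(demo, k=2)
print(centroids.round(1), inertia)
```

The returned `inertia` is the sum of squared distances from each point to its assigned centroid, which is the objective described above.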
Performance Test
Let’s set the working directory YOUR PATH and import the key Python libraries
import os
os.chdir('YOUR PATH')
os.getcwd()
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.io import loadmat
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from scipy import linalg
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)
pd.set_option('display.max_seq_items', None)
%matplotlib inline
import seaborn as sns
sns.set_context('notebook')
sns.set_style('white')
Let’s load the test synthetic data
data1 = loadmat('ex7data2.mat')
X1 = data1['X']
print('X1:', X1.shape)
X1: (300, 2)
Let’s add uncorrelated Gaussian noise with signal/noise=0.1, i.e. noise whose standard deviation is ten times that of the data
x0mean=X1[:,0].mean()
x1mean=X1[:,1].mean()
x0std=X1[:,0].std()
x1std=X1[:,1].std()
dim=300
scale=0.1
noise0 = np.random.normal(x0mean,x0std,dim)
noise1 = np.random.normal(x1mean,x1std,dim)
X1[:,0]=X1[:,0]+noise0/scale
X1[:,1]=X1[:,1]+noise1/scale
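To check what signal/noise=0.1 means in this recipe, the sketch below repeats it on made-up stand-in data (the `signal` array and the seed are assumptions, not the contents of ex7data2.mat). It uses a seeded generator so the run is repeatable, and it writes the result to a new array instead of overwriting the input.

```python
import numpy as np

rng = np.random.default_rng(42)                  # seeded so the run is repeatable
signal = rng.normal(3.0, 1.5, size=(300, 2))     # hypothetical stand-in for X1

scale = 0.1
# Same recipe as above: noise drawn with each column's mean and std, then divided by scale.
noise = rng.normal(signal.mean(axis=0), signal.std(axis=0), size=signal.shape) / scale
noisy = signal + noise                           # new array; `signal` is left untouched

# Ratio of the spread of the data to the spread of the added noise, per column.
snr = signal.std(axis=0) / noise.std(axis=0)
print(snr)
```

Both printed values should come out close to 0.1, up to sampling error from the 300 draws.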
Let’s call KMeans with K=3
km1 = KMeans(3)
km1.fit(X1)
KMeans(n_clusters=3)
Let’s plot the output
plt.scatter(X1[:,0], X1[:,1], s=40, c=km1.labels_, cmap=plt.cm.prism)
plt.title('K-Means Clustering Results with K=3')
plt.scatter(km1.cluster_centers_[:,0], km1.cluster_centers_[:,1], marker='+', s=100, c='k', linewidth=2);

Image Compression
Let’s load the image
img = plt.imread('youtubewatcher.png')
img_shape = img.shape
img_shape
(1440, 2560, 4)
and rescale the pixel values and flatten the image into a 2D array with one row per pixel and one column per channel
A = img/255
AA = A.reshape(img_shape[0]*img_shape[1], img_shape[2])
AA.shape
(3686400, 4)
Let’s apply K-means with K=64
km2 = KMeans(64)
km2.fit(AA)
KMeans(n_clusters=64)
B = km2.cluster_centers_[km2.labels_].reshape(img_shape[0], img_shape[1], img_shape[2])
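The line that builds B relies on NumPy integer-array indexing: indexing the array of centroids with the array of labels returns one centroid row per pixel. A tiny sketch with a made-up 3-color palette and a 2x2 image (both hypothetical) shows the mechanics:

```python
import numpy as np

# Hypothetical palette: each row is one cluster centroid (an RGB color).
palette = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0]])
# One cluster label per pixel of a 2x2 image, in row-major order.
labels = np.array([0, 1, 2, 1])

# Indexing the palette with the labels yields one centroid row per pixel...
flat = palette[labels]              # shape (4, 3)
# ...and reshaping restores the (height, width, channels) layout.
quantized = flat.reshape(2, 2, 3)
print(quantized)
```

In the post's code, `km2.cluster_centers_` plays the role of `palette` and `km2.labels_` the role of `labels`.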
Let’s plot the outcome
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(13,9))
ax1.imshow(img)
ax1.set_title('Original')
ax2.imshow(B*255)
ax2.set_title('Compressed, with 64 colors')
for ax in fig.axes:
    ax.axis('off')

Let’s load another image
img = plt.imread('bird_small.png')
and repeat the same sequence as above.
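Since the same steps are applied to every image, they can be wrapped in one function. The sketch below is one possible way to do it; the name `quantize_colors`, the `seed` argument and the random stand-in image are assumptions for illustration, and it skips the divide-by-255 / multiply-by-255 round trip used above.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(img, n_colors, seed=0):
    """Replace every pixel of img (height x width x channels) by its cluster centroid."""
    h, w, c = img.shape
    pixels = img.reshape(h * w, c)                 # one row per pixel
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=seed).fit(pixels)
    return km.cluster_centers_[km.labels_].reshape(h, w, c)

# Random stand-in image so the example runs without any image file.
rng = np.random.default_rng(0)
fake_img = rng.random((32, 32, 3))
compressed = quantize_colors(fake_img, n_colors=8)

# Count the distinct colors that remain after quantization.
n_unique = len(np.unique(compressed.reshape(-1, 3), axis=0))
print(compressed.shape, n_unique)
```

With a real image, `fake_img` would be replaced by the array returned by `plt.imread`, and `n_colors` by the desired K.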
The outcome is

Summary
- We have looked at image compression using the K-means clustering algorithm, which is an unsupervised ML algorithm.
- In this case study, both images were compressed with K=64 colors.
- For the YouTube image, the compression ratio is 5,712/131 = 43.60.
- For the Parrot image, the compression ratio is 33/4 = 8.25.
- Results show that the K-means algorithm works well and can be used to compress 2D images while preserving their resolution; the compression is lossy, since each image is reduced to K colors.
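The ratios above are plain divisions of the original size by the compressed size, taken as given in the post. The sketch below reproduces them and adds a rough, idealized estimate of what K colors buy in principle: storing one palette index per pixel plus the palette itself, versus 8 bits per channel per pixel. That second part is an assumption for illustration only; it ignores the PNG format's own lossless compression, so it is not expected to match the measured file-size ratios.

```python
import math

def ratio(original_size, compressed_size):
    """Compression ratio as original size divided by compressed size."""
    return original_size / compressed_size

youtube_ratio = ratio(5712, 131)
parrot_ratio = ratio(33, 4)
print(round(youtube_ratio, 2), parrot_ratio)

def indexed_ratio(height, width, channels, n_colors, bits_per_channel=8):
    """Idealized ratio: raw pixels versus palette indices plus the palette."""
    raw_bits = height * width * channels * bits_per_channel
    index_bits = math.ceil(math.log2(n_colors))    # bits needed per palette index
    palette_bits = n_colors * channels * bits_per_channel
    return raw_bits / (height * width * index_bits + palette_bits)

# Shape of the first image in this post, (1440, 2560, 4), with K=64.
ideal = indexed_ratio(1440, 2560, 4, 64)
print(round(ideal, 2))
```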
Explore More
ML/AI Breast Cancer Diagnosis with 98% Confidence
K-means clustering algorithm (unsupervised learning) for image compression
Image Compression with K-means Clustering
One response to “Effective 2D Image Compression with K-means Clustering”
Love This !! my thoughts on this ….
The internet has a vast amount of image data that needs compression to reduce storage space. To achieve this, image compression techniques are important, and one such effective and simple algorithm is the 2D image compression algorithm developed by Jordi Warmenhoven. The algorithm is based on K-means clustering, which is a popular unsupervised Machine Learning algorithm that groups an unlabeled dataset into clusters. In the case of image compression, K means the number of colors used in the image. The K-means algorithm clusters the 2D image into different segments, reducing the number of colors in each segment, resulting in a compressed image. The algorithm is implemented by first converting the image to a 2D array, and then applying the K-means clustering algorithm to the array. The compressed image is obtained by applying the reduced color palette to the original image. The algorithm is tested on various images, and the results show that it is highly effective and simple.
Thanks – PomKing
http://www.pomeranianpuppies.uk