Effective 2D Image Compression with K-means Clustering

  • The internet is filled with huge amounts of data in the form of images, so image compression techniques are important for reducing the storage space they require.
  • In this post, we will implement and test the highly effective and simple 2D image compression algorithm developed by Jordi Warmenhoven.
  • It is based on K-means clustering, one of the simplest and most popular unsupervised Machine Learning (ML) algorithms, which groups an unlabeled dataset into clusters.
  • Here K is the number of clusters to create; for example, with K=2 there will be two clusters.
  • In image compression, K represents the number of colors.
  • The K-means algorithm lets us cluster the pixels of a 2D image into segments, and it discovers these groups in an unlabeled dataset on its own, without any labeled training data.
  • It is a centroid-based algorithm: each cluster is associated with a centroid, and the algorithm aims to minimize the sum of squared distances between each data point and the centroid of its cluster.
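The centroid-based procedure in the list above is commonly implemented with Lloyd's algorithm, which alternates an assignment step (each point goes to its nearest centroid) and an update step (each centroid moves to the mean of its points) until the centroids stop moving; neither step increases the objective J = Σᵢ ‖xᵢ − μ_c(i)‖². Below is a minimal NumPy sketch for illustration only. The helper names `kmeans`, `_assign` and `inertia` are our own (hypothetical), and this is not the scikit-learn implementation used later in the post, which additionally uses k-means++ initialisation by default.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm K-means; returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)  # assignment step
        # Update step: each centroid moves to the mean of its points (empty clusters stay put)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, _assign(X, centroids)


def inertia(X, centroids, labels):
    """The K-means objective: sum of squared distances from each point to its centroid."""
    return float(((np.asarray(X, dtype=float) - centroids[labels]) ** 2).sum())


# Two obvious groups of two points each
X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
centroids, labels = kmeans(X, k=2)
print(inertia(X, centroids, labels))  # 1.0
```

In image compression the rows of X are pixels and the k centroids become the palette of k colors.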

Performance Test

Let’s set the working directory (replace YOUR PATH with your own folder) and import the key Python libraries:

import os
os.chdir('YOUR PATH')
os.getcwd()

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

from scipy.io import loadmat
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from scipy import linalg

pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)
pd.set_option('display.max_seq_items', None)

%matplotlib inline

import seaborn as sns

Let’s load the synthetic test data:

data1 = loadmat('ex7data2.mat')

X1 = data1['X']
print('X1:', X1.shape)

X1: (300, 2)

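Let’s add uncorrelated Gaussian noise with signal/noise=0.1:

dim = X1.shape[0]  # one noise sample per data point
x0std, x1std = 0.1 * X1.std(axis=0)  # assumed reading of the 0.1 ratio: noise std = 0.1 x each feature's std
noise0, noise1 = np.random.normal(0.0, x0std, dim), np.random.normal(0.0, x1std, dim)  # zero-mean
X1 = X1 + np.column_stack((noise0, noise1))  # independent (uncorrelated) noise per feature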

Let’s call KMeans with K=3 and fit it to the data:

km1 = KMeans(n_clusters=3, n_init=10, random_state=0)
km1.fit(X1)

Let’s plot the output

plt.scatter(X1[:,0], X1[:,1], s=40, c=km1.labels_, cmap=plt.cm.prism)
plt.title('K-Means Clustering Results with K=3')
plt.scatter(km1.cluster_centers_[:,0], km1.cluster_centers_[:,1], marker='+', s=100, c='k', linewidth=2);

K-means clustering results with K=3
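If ex7data2.mat is not at hand, the same performance test can be sketched on synthetic data. The sketch below assumes three well-separated Gaussian blobs generated with scikit-learn's `make_blobs` as a stand-in for the original data, reads the 0.1 ratio as noise std = 0.1 × feature std (an assumption), and scores the result with the adjusted Rand index (`adjusted_rand_score`), where 1.0 means the clusters match the generating blobs exactly, up to relabelling.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Hypothetical stand-in for ex7data2.mat: 300 points in three well-separated 2D blobs
X, y_true = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [10, 0]],
                       cluster_std=0.5, random_state=0)

# Independent zero-mean Gaussian noise per feature (assumed: noise std = 0.1 x feature std)
rng = np.random.default_rng(0)
X_noisy = X + rng.normal(0.0, 0.1 * X.std(axis=0), size=X.shape)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_noisy)

# Compare the found clusters with the blobs that generated the points
ari = adjusted_rand_score(y_true, km.labels_)
print(f"ARI = {ari:.3f}")
```

Because the true groups are known for synthetic data, a score like this turns the visual check of the scatter plot into a number.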

Image Compression

Let’s load the image

img = plt.imread('youtubewatcher.png')
img_shape = img.shape

(1440, 2560, 4)

and apply the following transformations (scaling, then reshaping the image to one row per pixel):

A = img/255

AA = A.reshape(img_shape[0]*img_shape[1], img_shape[2])

(3686400, 4)

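Let’s apply K-means with K=64:

km2 = KMeans(n_clusters=64, n_init=10, random_state=0)
km2.fit(AA)  # fitting 3,686,400 pixels with K=64 can be slow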

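# Replace each pixel by its cluster centre, restore the image shape, and undo the /255 scaling
B = km2.cluster_centers_[km2.labels_].reshape(img_shape[0], img_shape[1], img_shape[2]) * 255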
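The expression `km2.cluster_centers_[km2.labels_]` is a palette lookup: `cluster_centers_` holds one color per cluster and `labels_` holds one cluster index per pixel, so NumPy integer-array indexing swaps every index for its color. The toy palette and labels below are made up purely to show the mechanics.

```python
import numpy as np

# Toy palette of 3 colours (RGB in [0, 1]), playing the role of cluster_centers_
centers = np.array([[1.0, 0.0, 0.0],   # red
                    [0.0, 1.0, 0.0],   # green
                    [0.0, 0.0, 1.0]])  # blue
# One palette index per pixel of a 2 x 2 image, flattened row by row like labels_
labels = np.array([0, 2, 2, 1])

# Integer-array indexing picks a colour per pixel; reshape restores (H, W, C)
B_toy = centers[labels].reshape(2, 2, 3)
print(B_toy[0, 1])  # [0. 0. 1.]
```

Storing only the small palette plus one index per pixel is what makes the representation compact.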

Let’s plot the outcome

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 9))
ax1.imshow(img)
ax1.set_title('Original')
ax2.imshow(B)
ax2.set_title('Compressed, with 64 colors')
for ax in fig.axes:
    ax.axis('off')  # hide ticks and frames

Image "YouTube Video Watchers": original vs compressed, with 64 colors.

Let’s load another image

img = plt.imread('bird_small.png')

and repeat the same sequence as above.

The outcome is:

Image "Parrot": original vs compressed, with 64 colors.
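Since the same sequence is repeated for each image, it can be wrapped in a function. The sketch below is our own hypothetical helper `compress_image`, not code from the original post; it assumes an (H, W, C) array and skips the /255 scaling, because multiplying every pixel by the same constant only rescales the centroids. For very large images, scikit-learn's `MiniBatchKMeans` is a faster alternative worth considering.

```python
import numpy as np
from sklearn.cluster import KMeans


def compress_image(img, k=64, random_state=0):
    """Quantise an (H, W, C) image array to k colours with K-means; returns (compressed, km)."""
    h, w, c = img.shape
    pixels = img.reshape(h * w, c).astype(np.float64)  # one row per pixel
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(pixels)
    # Replace every pixel by its cluster centre (its palette colour) and restore the shape
    compressed = km.cluster_centers_[km.labels_].reshape(h, w, c)
    if np.issubdtype(img.dtype, np.integer):
        compressed = np.rint(compressed)  # integer input (e.g. uint8): round to valid levels
    return compressed.astype(img.dtype), km


# Usage on a small random image, a stand-in for plt.imread('bird_small.png')
rng = np.random.default_rng(0)
toy = rng.random((16, 16, 3)).astype(np.float32)
toy_compressed, toy_km = compress_image(toy, k=8)
```

With a real file, `compress_image(plt.imread('bird_small.png'), k=64)` would follow the same steps as the code above.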


  • We have looked at image compression using K-means clustering, an unsupervised ML algorithm.
  • In this case study, both images were compressed with K=64 colors.
  • For the YouTube image, the compression ratio is 5,712/131 ≈ 43.60.
  • For the Parrot image, the compression ratio is 33/4 = 8.25.
  • The results suggest that K-means works well for compressing 2D images: the resolution (pixel dimensions) is unchanged, and the loss is confined to reducing the palette to K colors.
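The post does not state how the ratios above were measured; if they are file-size ratios, they also depend on how the PNG encoder compresses each file. A complementary, format-independent estimate counts bits: raw storage of every channel versus one palette index per pixel plus the K palette colors. The sketch below assumes 8 bits per channel and ignores any file-format compression; `palette_compression_ratio` is a hypothetical helper.

```python
import math


def palette_compression_ratio(height, width, channels, k, bits_per_channel=8):
    """Theoretical ratio of raw storage to palette storage (indices + k-colour palette)."""
    n_pixels = height * width
    raw_bits = n_pixels * channels * bits_per_channel
    index_bits = math.ceil(math.log2(k))            # bits needed for one palette index
    palette_bits = k * channels * bits_per_channel  # the k colours themselves
    return raw_bits / (n_pixels * index_bits + palette_bits)


# Shape reported above for the YouTube image (1440 x 2560, 4 channels), K = 64
print(round(palette_compression_ratio(1440, 2560, 4, 64), 2))  # 5.33
```

With K=64 each pixel needs 6 bits instead of 32, so the palette overhead is negligible for an image of this size.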





