Effective 2D Image Compression with K-means Clustering

• The internet is filled with huge amounts of image data, so image compression techniques are important for reducing storage space.
• In this post, we will implement and test the highly effective and simple 2D image compression algorithm developed by Jordi Warmenhoven.
• It is based upon K-means clustering, one of the simplest and most popular unsupervised Machine Learning (ML) algorithms, which groups an unlabeled dataset into clusters.
• Here K is the number of clusters to be created, chosen in advance; for example, if K=2, there will be two clusters.
• In image compression, K represents the number of colors.
• The K-means algorithm clusters the pixels of a 2D image into segments and discovers these groups in the unlabeled data on its own, without the need for any labeled training data.
• It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of squared distances between the data points and the centroids of their corresponding clusters.
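To make the centroid-based idea concrete, here is a minimal NumPy sketch of the two alternating steps (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name `kmeans_sketch`, its parameters, and the demo data are illustrative assumptions; this is not the scikit-learn implementation used later in the post.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=20, seed=0):
    """Plain K-means iterations for illustration only (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Final assignment and objective: sum of squared distances to own centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, inertia

# Two well-separated synthetic blobs (made-up demo data)
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centroids, labels, inertia = kmeans_sketch(X_demo, k=2)
```

With K=2 on this demo data, each blob should end up in its own cluster, with one centroid near (0, 0) and the other near (5, 5).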

Performance Test

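Let's set the working directory to YOUR PATH and import the key Python libraries

import os
os.chdir('YOUR PATH')  # replace YOUR PATH with your own working directory
os.getcwd()            # confirm the current working directory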

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from scipy import linalg

pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)
pd.set_option('display.max_seq_items', None)

%matplotlib inline

import seaborn as sns
sns.set_context('notebook')
sns.set_style('white')

Let's load the synthetic test data

# data1 is assumed to be already loaded as a dict-like object with the samples under the key 'X'
X1 = data1['X']
print('X1:', X1.shape)

`X1: (300, 2)`
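The step that creates `data1` is not shown above. To run the remaining steps end to end, one possible stand-in is a synthetic dataset of the same shape built with scikit-learn's `make_blobs`. The names `data1_demo` and `X1_demo` and the choice of three blob centers are assumptions for illustration; this is not the dataset used in the post.

```python
from sklearn.datasets import make_blobs

# Hypothetical stand-in: 300 two-dimensional samples drawn around 3 centers
X_blobs, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

# Mimic the dict-like layout used above, with the samples under the key 'X'
data1_demo = {'X': X_blobs}
X1_demo = data1_demo['X']
print('X1_demo:', X1_demo.shape)
```

Any array of shape (300, 2) would work equally well for following the clustering steps.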

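Let's add uncorrelated Gaussian noise with signal/noise=0.1

# per-feature mean and standard deviation of the data
x0mean = X1[:,0].mean()
x1mean = X1[:,1].mean()
x0std = X1[:,0].std()
x1std = X1[:,1].std()
dim = 300      # number of samples in X1
scale = 0.1    # signal-to-noise ratio
# independent noise for each feature, drawn with that feature's mean and std
noise0 = np.random.normal(x0mean, x0std, dim)
noise1 = np.random.normal(x1mean, x1std, dim)
# dividing by scale amplifies the noise tenfold; note that X1 is modified in place
X1[:,0] = X1[:,0] + noise0/scale
X1[:,1] = X1[:,1] + noise1/scale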

Let's run K-means with K=3

km1 = KMeans(3)
km1.fit(X1)

`KMeans(n_clusters=3)`

Let’s plot the output

plt.scatter(X1[:,0], X1[:,1], s=40, c=km1.labels_, cmap=plt.cm.prism)
plt.title('K-Means Clustering Results with K=3')
plt.scatter(km1.cluster_centers_[:,0], km1.cluster_centers_[:,1], marker='+', s=100, c='k', linewidth=2);
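Besides plotting, the fitted model can be inspected numerically. The sketch below, on made-up blob data rather than the post's dataset, reads the `labels_`, `cluster_centers_` and `inertia_` attributes of a fitted `KMeans` object and checks that `inertia_` matches the sum of squared distances of the samples to their assigned centroids. The variable names and the `n_init` and `random_state` settings are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Made-up demo data: 300 two-dimensional samples around 3 centers
X_demo, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

km_demo = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo)

# Centroid assigned to each sample, looked up via the cluster labels
assigned = km_demo.cluster_centers_[km_demo.labels_]

# Sum of squared distances between samples and their assigned centroids
manual_inertia = ((X_demo - assigned) ** 2).sum()
print(manual_inertia, km_demo.inertia_)

# Assign a new, unseen point to the nearest centroid
new_labels = km_demo.predict(np.array([[0.0, 0.0]]))
```

The two printed numbers should agree up to floating-point error, which is the quantity K-means tries to minimize.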

Image Compression

# img is assumed to hold the input image as a NumPy array; the loading step is not shown here
img_shape = img.shape
img_shape

`(1440, 2560, 4)`

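Next, we perform the following transformations

# scale the pixel values by 1/255
A = img/255

# flatten the image to one row per pixel and one column per channel
AA = A.reshape(img_shape[0]*img_shape[1], img_shape[2])
AA.shape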

`(3686400, 4)`

Let’s apply K-means with K=64

km2 = KMeans(64)
km2.fit(AA)

`KMeans(n_clusters=64)`

# replace every pixel with the centroid (color) of its cluster and restore the image shape
B = km2.cluster_centers_[km2.labels_].reshape(img_shape[0], img_shape[1], img_shape[2])
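The indexing `cluster_centers_[labels_]` is a palette lookup: the centroids act as a color table and the labels are per-pixel indices into it. Here is a small sketch on a hypothetical 4x6 random RGB image (not one of the post's images) with K=4, so the shapes are easy to follow. All variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
tiny = rng.random((4, 6, 3))        # hypothetical 4x6 RGB image with values in [0, 1]
pixels = tiny.reshape(-1, 3)        # one row per pixel: shape (24, 3)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)
palette = km.cluster_centers_       # color table: shape (4, 3)
indices = km.labels_                # palette index of each pixel: shape (24,)

# Fancy indexing replaces each index by its palette color, then we restore the image shape
quantized = palette[indices].reshape(tiny.shape)

# Count the distinct colors left in the quantized image
n_colors = len(np.unique(quantized.reshape(-1, 3), axis=0))
print(quantized.shape, n_colors)
```

The quantized image keeps the original height and width but contains at most K distinct colors.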

Let’s plot the outcome

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(13,9))
ax1.imshow(img)
ax1.set_title('Original')
ax2.imshow(B*255)
ax2.set_title('Compressed, with 64 colors')

for ax in fig.axes:
    ax.axis('off')

We then repeat the same sequence as above for the second test image.

The outcome is again a side-by-side plot of the original and the compressed image.

Summary

• We have looked at image compression using K-means clustering, an unsupervised ML algorithm.
• In this case study, the images were compressed with K=64 colors.
• For the YouTube image, the compression ratio is 5,712/131 ≈ 43.60.
• For the Parrot image, the compression ratio is 33/4 = 8.25.
• Results show that the K-means algorithm works well and can be used to compress 2D images while preserving the resolution and with little visible loss of quality.
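As a rough cross-check on where the savings come from, the sketch below estimates the raw storage of a palette-based image: each pixel stores only an index into the K-color table, plus the table itself. The helper `palette_size_estimate` and the assumption of 8 bits per channel are illustrative. The ratios quoted above are different quantities and will generally not match this raw estimate.

```python
import math

def palette_size_estimate(height, width, channels, k, bits_per_channel=8):
    """Raw size in bits of the full image vs. a K-color palette image (hypothetical helper)."""
    # Uncompressed image: every pixel stores all of its channels
    raw_bits = height * width * channels * bits_per_channel
    # Palette image: every pixel stores one index of ceil(log2(k)) bits
    index_bits = height * width * math.ceil(math.log2(k))
    # The palette itself: k colors with all channels
    palette_bits = k * channels * bits_per_channel
    return raw_bits, index_bits + palette_bits

# Image shape from the post: (1440, 2560, 4), compressed to K=64 colors
raw, packed = palette_size_estimate(1440, 2560, 4, 64)
ratio = raw / packed
print(f"raw estimate: {ratio:.2f}x smaller")
```

For this image shape and K=64, each pixel goes from 32 bits to a 6-bit index, so the raw estimate is a reduction of roughly 5.3x before any file-format compression is applied.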

