Effective 2D Image Compression with K-means Clustering

  • The internet holds huge amounts of data in the form of images, which makes image compression techniques important for reducing storage space.
  • In this post, we will implement and test the highly effective and simple 2D image compression algorithm developed by Jordi Warmenhoven.
  • It is based upon K-means clustering, one of the simplest and most popular unsupervised Machine Learning (ML) algorithms, which groups an unlabeled dataset into clusters.
  • Here K is the number of clusters to create, chosen in advance; for example, K=2 yields two clusters.
  • In image compression, K represents the number of colors.
  • The K-means algorithm lets us cluster the pixels of a 2D image into segments. It discovers these groups in the unlabeled data on its own, without any labeled training data.
  • It is a centroid-based algorithm: each cluster is associated with a centroid, and the algorithm aims to minimize the sum of squared distances between the data points and the centroids of their clusters.
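To make the centroid idea concrete, here is a minimal NumPy sketch of the usual assign-and-update loop (Lloyd's iterations) that minimizes the standard K-means objective, the sum of squared distances to the nearest centroid. It is an illustration only, not the scikit-learn implementation used later in this post; the function name `kmeans_sketch`, the random initialization, and the toy data are assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=20, seed=0):
    """Plain Lloyd iterations: assign points, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Final assignment and value of the objective.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, inertia

# Toy data: two blobs, one around (0, 0) and one around (5, 5).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centroids_demo, labels_demo, inertia_demo = kmeans_sketch(X_demo, k=2)
```

The returned `inertia` plays the same role as the `inertia_` attribute of scikit-learn's `KMeans`: the lower it is, the tighter the clusters.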

Performance Test

Let’s set the working directory (replace YOUR PATH with your own path) and import the key Python libraries

import os
os.chdir('YOUR PATH')
os.getcwd()

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

from scipy.io import loadmat
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from scipy import linalg

pd.set_option('display.notebook_repr_html', False)
pd.set_option(‘display.max_columns’, None)
pd.set_option(‘display.max_rows’, 150)
pd.set_option(‘display.max_seq_items’, None)

%matplotlib inline

import seaborn as sns
sns.set_context('notebook')
sns.set_style('white')

Let’s load the test synthetic data

data1 = loadmat('ex7data2.mat')

X1 = data1['X']
print('X1:', X1.shape)

X1: (300, 2)

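Let’s add uncorrelated Gaussian noise with a signal-to-noise ratio of 0.1

x0mean=X1[:,0].mean()
x1mean=X1[:,1].mean()
x0std=X1[:,0].std()
x1std=X1[:,1].std()
dim=X1.shape[0]  # number of samples (300 here)
scale=0.1  # target signal/noise ratio
# Draw noise with the same mean and standard deviation as each column
noise0 = np.random.normal(x0mean,x0std,dim)
noise1 = np.random.normal(x1mean,x1std,dim)
# Dividing by scale makes the noise 10 times wider than the data
X1[:,0]=X1[:,0]+noise0/scale
X1[:,1]=X1[:,1]+noise1/scale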
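As a quick sanity check of what dividing by `scale` does, the sketch below repeats the same recipe on a stand-in column and measures the ratio of standard deviations. The synthetic `signal` array and the fixed seed are assumptions made for this illustration; they are not the `ex7data2.mat` data.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(3.0, 1.0, 10_000)   # stand-in for one column of X1
scale = 0.1

# Same recipe as above: noise with the signal's mean and std, divided by scale.
noise = rng.normal(signal.mean(), signal.std(), signal.size) / scale

ratio = signal.std() / noise.std()      # close to scale, up to sampling error
shift = noise.mean()                    # the noise is not zero-mean
```

Note that because the noise is drawn around the column mean rather than around zero, adding it also shifts the data by roughly mean/scale. This moves all points together, so it does not change which points K-means groups together.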

Let’s call KMeans with K=3

km1 = KMeans(3)
km1.fit(X1)

KMeans(n_clusters=3)

Let’s plot the output

plt.scatter(X1[:,0], X1[:,1], s=40, c=km1.labels_, cmap=plt.cm.prism)
plt.title('K-Means Clustering Results with K=3')
plt.scatter(km1.cluster_centers_[:,0], km1.cluster_centers_[:,1], marker='+', s=100, c='k', linewidth=2);

K-means clustering results with K=3

Image Compression

Let’s load the image

img = plt.imread('youtubewatcher.png')
img_shape = img.shape
img_shape

(1440, 2560, 4)

and perform the following transformations

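# Note: plt.imread already returns PNG data as floats in [0, 1];
# this extra scaling is undone by B*255 when plotting
A = img/255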

AA = A.reshape(img_shape[0]*img_shape[1], img_shape[2])
AA.shape

(3686400, 4)
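The reshape turns the image into a table with one row per pixel and one column per channel, which is the layout K-means expects. A tiny example, using a made-up 2 x 3 RGBA array rather than the real image, shows that the flattening is reversible and how row indices map back to pixel positions:

```python
import numpy as np

# Hypothetical 2x3 image with 4 channels and a distinct value in every entry.
tiny = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
h, w, c = tiny.shape

pixels = tiny.reshape(h * w, c)      # one row per pixel
restored = pixels.reshape(h, w, c)   # inverse of the flattening

# Row i of `pixels` is the pixel at (i // w, i % w) in the image.
same_pixel = np.array_equal(pixels[4], tiny[4 // w, 4 % w])
```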

Let’s apply K-means with K=64

km2 = KMeans(64)
km2.fit(AA)

KMeans(n_clusters=64)

B = km2.cluster_centers_[km2.labels_].reshape(img_shape[0], img_shape[1], img_shape[2])
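The line above relies on NumPy integer-array indexing: indexing the centroid table with the label array looks up one centroid color per pixel. A minimal sketch with a made-up two-color palette (the values of `centers` and `labels` are assumptions for illustration):

```python
import numpy as np

centers = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.5, 0.25]])     # hypothetical palette, K=2
labels = np.array([0, 1, 1, 0, 1, 0])      # cluster index of each pixel

# centers[labels] has one row per pixel; reshape restores the image layout.
quantized = centers[labels].reshape(2, 3, 3)
n_colors = len(np.unique(quantized.reshape(-1, 3), axis=0))
```

Every pixel in `quantized` is exactly one of the K palette colors, which is why the compressed image contains at most K distinct colors.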

Let’s plot the outcome

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(13,9))
ax1.imshow(img)
ax1.set_title('Original')
ax2.imshow(B*255)
ax2.set_title('Compressed, with 64 colors')

for ax in fig.axes:
    ax.axis('off')

Image "YouTube Video Watchers": original vs compressed, with 64 colors.

Let’s load another image

img = plt.imread('bird_small.png')

and repeat the same sequence as above.

The outcome is

Image "Parrot": original vs compressed, with 64 colors.
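Since the same steps are repeated for each image, they can be wrapped in a small helper. The sketch below is one possible way to do that; the function name `quantize_colors`, the fixed `random_state`, and the synthetic test image are assumptions, and the helper skips the division by 255 because it expects float pixel values already in [0, 1].

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(img, k, seed=0):
    """Replace every pixel by the centroid of its K-means cluster (hypothetical helper)."""
    h, w, c = img.shape
    pixels = img.reshape(h * w, c)                      # one row per pixel
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    return km.cluster_centers_[km.labels_].reshape(h, w, c)

# Synthetic 8x8 RGB image built from four colors, reduced to two.
rng = np.random.default_rng(0)
palette = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
demo_img = palette[rng.integers(0, 4, size=(8, 8))]
demo_out = quantize_colors(demo_img, k=2)
```

With a real file, the call would look like `quantize_colors(plt.imread('bird_small.png'), k=64)`.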

Summary

  • We have looked at image compression using the K-means clustering algorithm, which is an unsupervised ML algorithm.
  • In this case study, image compression was performed with K=64 colors.
  • For the YouTube image, the compression ratio is 5,712/131 = 43.60.
  • For the Parrot image, the compression ratio is 33/4 = 8.25.
  • In these tests, the K-means algorithm works well: it compresses 2D images without changing their resolution and with little visible loss of quality.
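The ratios quoted above are taken as reported. As a separate back-of-the-envelope estimate, one can compare raw pixel storage with palette-indexed storage, where each pixel stores only the index of one of the K colors. The sketch assumes 8 bits per channel and ignores any further compression a file format applies, so it is not expected to reproduce the figures above; the function name `indexed_size_bytes` is made up for this example.

```python
import math

def indexed_size_bytes(height, width, channels, k, bits_per_channel=8):
    """Raw size vs. palette-indexed size in bytes, ignoring entropy coding."""
    n_pixels = height * width
    raw_bits = n_pixels * channels * bits_per_channel
    index_bits = n_pixels * math.ceil(math.log2(k))   # one palette index per pixel
    palette_bits = k * channels * bits_per_channel    # the K centroid colors
    return raw_bits / 8, (index_bits + palette_bits) / 8

# Dimensions of the first image in this post: 1440 x 2560 with 4 channels.
raw, indexed = indexed_size_bytes(1440, 2560, 4, k=64)
ratio = raw / indexed
```

For these dimensions and K=64, each pixel needs 6 bits instead of 32, so the estimate comes out at roughly 5.3.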

Explore More

ML/AI Breast Cancer Diagnosis with 98% Confidence

K-means clustering algorithm (unsupervised learning) for image compression

Image Compression with K-means Clustering

