Comparing 4 Python Libraries for Interactive COVID-19 Data Science Visualization

Featured Photo by Polina Zimmerman on Pexels.

Data Visualization (DV) is the first step towards gaining insight into a large data set in every data science project. The DV tools available in Python are an effective and efficient way of finding trends, outliers, and hidden patterns in data.

Following the recent DV study and related pilots, our objective is to use charts derived from a COVID-19 dataset to gain insights that help contain the coronavirus. The data file (covid_19_clean_complete.csv) contains the cumulative counts of confirmed, death, and recovered COVID-19 cases for different countries, starting from 22 January 2020.

Today we compare the following 4 open-source DV libraries: pyPlot, Plotly, Bokeh, and Vega-Altair.

  • pyPlot (matplotlib.pyplot) is a collection of functions that make matplotlib work like MATLAB. Matplotlib itself is a comprehensive library for creating static, animated, and interactive visualizations in Python.
  • The Plotly library makes interactive, publication-quality graphs such as line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.
  • Bokeh is a Python library for creating interactive visualizations for modern web browsers. It helps you build beautiful graphics, ranging from simple plots to complex dashboards with streaming datasets. With Bokeh, you can create JavaScript-powered visualizations without writing any JavaScript yourself.
  • Vega-Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and its source is available on GitHub. With Vega-Altair, you can spend more time understanding your data and its meaning. Altair's API is simple, friendly and consistent, and it is built on top of the powerful Vega-Lite visualization grammar. This elegant simplicity produces beautiful and effective visualizations with a minimal amount of code.

Key Questions:

  • What does the global spread of the virus look like?
  • How intense has the spread of the virus been in each country?
  • Have national lockdowns and self-isolation in different countries actually had an impact on COVID-19 transmission?

Libraries

Let's set the working directory (replace YOURPATH with your own path):

import os
os.chdir('YOURPATH')  # set working directory
os.getcwd()

Let’s install datapane

!pip install datapane

and import the other libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from datetime import datetime
from IPython.display import Image
import datapane as dp

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
warnings.filterwarnings("ignore")
sns.set(style="darkgrid", palette="pastel", color_codes=True)
sns.set_context("paper")

Altair imports

import altair as alt
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

Dataset

Let's read the input data:

df = pd.read_csv('covid_19_clean_complete.csv')
clist = pd.read_csv('all_countries.csv')

df.head()

COVID-19 input data table

df.shape

(49068, 11)

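The output below already shows the renamed columns, the parsed date column and the derived active column, so it was captured after the preprocessing steps that follow.

df.info()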

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49068 entries, 0 to 49067
Data columns (total 11 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   state       14664 non-null  object        
 1   country     49068 non-null  object        
 2   lat         49068 non-null  float64       
 3   long        49068 non-null  float64       
 4   date        49068 non-null  datetime64[ns]
 5   confirmed   49068 non-null  int64         
 6   deaths      49068 non-null  int64         
 7   recovered   49068 non-null  int64         
 8   Active      49068 non-null  int64         
 9   WHO Region  49068 non-null  object        
 10  active      49068 non-null  int64         
dtypes: datetime64[ns](1), float64(2), int64(5), object(3)
memory usage: 4.1+ MB

clist.head()

The full country list

clist.shape

(192, 2)

clist.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 192 entries, 0 to 191
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   0            192 non-null    int64 
 1   Afghanistan  192 non-null    object
dtypes: int64(1), object(1)
memory usage: 3.1+ KB
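The clist.info() output suggests that all_countries.csv has no header row. Pandas promoted its first record (0, Afghanistan) to column names, so that country is missing from the data. The minimal sketch below uses a tiny in-memory stand-in for the file; its rows and the column names id and country are hypothetical. It shows how header=None together with names= keeps the first record as data.

```python
import io

import pandas as pd

# Hypothetical stand-in for all_countries.csv: no header row, index + name
raw = io.StringIO("0,Afghanistan\n1,Albania\n2,Algeria\n")

# Default behaviour: the first record is consumed as the header
bad = pd.read_csv(raw)
print(list(bad.columns), bad.shape)

# header=None keeps every record as data; names= supplies column labels
raw.seek(0)
clist = pd.read_csv(raw, header=None, names=["id", "country"])
print(clist.shape, clist.loc[0, "country"])
```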

Let's rename the columns for the sake of convenience:

df.rename(columns={'Date': 'date',
                   'Province/State': 'state',
                   'Country/Region': 'country',
                   'Lat': 'lat',
                   'Long': 'long',
                   'Confirmed': 'confirmed',
                   'Deaths': 'deaths',
                   'Recovered': 'recovered'},
          inplace=True)

and calculate the number of active cases as the difference

Active cases = confirmed - deaths - recovered

df['active'] = df['confirmed'] - df['deaths'] - df['recovered']
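As a quick check of the formula, the minimal sketch below uses a two-row toy frame with made-up counts. It computes active the same way and compares it with an Active column. The df.info() output shows that the real dataset also carries an Active column of its own.

```python
import pandas as pd

# Toy rows with made-up counts, using the renamed column names
toy = pd.DataFrame({
    "confirmed": [100, 250],
    "deaths": [5, 10],
    "recovered": [60, 200],
    "Active": [35, 40],  # stands in for the column shipped with the CSV
})

# Active cases = confirmed - deaths - recovered
toy["active"] = toy["confirmed"] - toy["deaths"] - toy["recovered"]
print(toy["active"].tolist())                  # [35, 40]
print((toy["active"] == toy["Active"]).all())  # True
```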

We also convert df['date'] to datetime with pd.to_datetime:

df['date'] = pd.to_datetime(df['date'])

and define the unique country list

country_list = list(df['country'].unique())
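The state column is non-null in 14,664 of the 49,068 rows, so some countries are reported per province or state. For those countries, a mask such as df['country'] == 'X' can return several rows per date, and drawing them as one line can produce a jagged curve. The minimal sketch below assumes national totals are what we want and sums the provinces first. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows: country A reported per province, country B as a whole
df_toy = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B"],
    "state": ["North", "South", "North", "South", None, None],
    "date": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-23",
                            "2020-01-23", "2020-01-22", "2020-01-23"]),
    "confirmed": [1, 2, 3, 4, 10, 12],
})

# One row per (country, date): sum the provincial counts
national = (df_toy.groupby(["country", "date"])["confirmed"]
                  .sum()
                  .reset_index())
print(national)
```

The same groupby applied to df would give one series per country, which the Plotly, Bokeh and Altair examples could then plot without mixing provinces.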

PyPlot

Let's plot the confirmed column for Germany:

germany = df[df['country'] == 'Germany']
plt.plot(germany['date'], germany['confirmed'])
fig1 = plt.gcf()  # keep the Figure; plt.plot itself returns a list of Line2D objects
plt.show()

Germany confirmed COVID-19 cases - initial inspection using plt.plot

This plot is suitable only for the initial inspection of our dataset.
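For a static figure that is a little easier to read, the minimal sketch below labels the axes and lets matplotlib rotate the date ticks. It uses made-up synthetic data. The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic cumulative series standing in for one country's 'confirmed' column
dates = pd.date_range("2020-01-22", periods=10, freq="D")
confirmed = [0, 1, 1, 4, 4, 5, 8, 10, 12, 12]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(dates, confirmed, color="tab:blue")
ax.set_xlabel("date")
ax.set_ylabel("confirmed")
ax.set_title("Confirmed cases (synthetic data)")
fig.autofmt_xdate()  # rotate and right-align the date tick labels
```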

Plotly

Let's use the Plotly library to plot confirmed COVID-19 cases for all countries, with a drop-down menu for choosing the country.

Plotly imports

import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
from plotly.subplots import make_subplots
pio.templates.default = "seaborn"
fig3 = go.Figure()

country_list = list(df['country'].unique())

# Add one trace per country; only the first country is visible initially
for i, country in enumerate(country_list):
    mask = df['country'] == country
    fig3.add_trace(
        go.Scatter(
            x=df['date'][mask],
            y=df['confirmed'][mask],
            name=country,
            visible=(i == 0)
        )
    )

buttons = []
for i, country in enumerate(country_list):
    # Visibility list: True only for the trace of the selected country
    args = [False] * len(country_list)
    args[i] = True

    # Create a button object for the country we are on
    button = dict(label=country,
                  method="update",
                  args=[{"visible": args}])

    # Add the button to our list of buttons
    buttons.append(button)

fig3.update_layout(updatemenus=[dict(active=0,
                                     type="dropdown",
                                     buttons=buttons,
                                     x=0,
                                     y=1.1,
                                     xanchor='left',
                                     yanchor='bottom')])

fig3.update_layout(autosize=False,
                   width=1000,
                   height=500)

Plotly plot of confirmed COVID-19 cases for all countries
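Each drop-down button sends Plotly an "update" call with a visible list that has one entry per trace, True only for the selected country. The library-free sketch below shows that construction. The helper name make_buttons and the three country names are hypothetical.

```python
def make_buttons(countries):
    """Build Plotly-style drop-down buttons: one per country, each showing
    only the trace at the same position in the figure."""
    buttons = []
    for i, country in enumerate(countries):
        visible = [False] * len(countries)  # hide every trace...
        visible[i] = True                   # ...except this country's
        buttons.append(dict(label=country,
                            method="update",
                            args=[{"visible": visible}]))
    return buttons


buttons = make_buttons(["Afghanistan", "Germany", "Italy"])
print(buttons[1]["args"][0]["visible"])  # [False, True, False]
```

Because the masks are positional, the traces must be added to the figure in the same order as the list used to build the buttons.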

Let’s add the time slider to the above plot

fig3 = go.Figure()

country_list = list(df['country'].unique())

# Add one trace per country; only the first country is visible initially
for i, country in enumerate(country_list):
    mask = df['country'] == country
    fig3.add_trace(
        go.Scatter(
            x=df['date'][mask],
            y=df['confirmed'][mask],
            name=country,
            visible=(i == 0)
        )
    )

# Start from an empty list so buttons from the previous figure are not duplicated
buttons = []
for i, country in enumerate(country_list):
    args = [False] * len(country_list)
    args[i] = True
    buttons.append(dict(label=country,
                        method="update",
                        args=[{"visible": args}]))

fig3.update_layout(updatemenus=[dict(active=0,
                                     type="dropdown",
                                     buttons=buttons,
                                     x=0,
                                     y=1.1,
                                     xanchor='left',
                                     yanchor='bottom')])

# Show the range slider under the x-axis and set the figure size in one call
fig3.update_layout(xaxis_rangeslider_visible=True,
                   autosize=False,
                   width=1000,
                   height=500)

Plotly plot of confirmed COVID-19 cases for all countries, with the time slider

Bokeh

Let's set up Bokeh to render inside the notebook.

Bokeh imports:

from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource, CustomJS, DateRangeSlider, Select
from bokeh.plotting import figure
from bokeh.layouts import column
output_notebook()

BokehJS 2.4.3 successfully loaded.

Let's plot confirmed COVID-19 cases, starting with Germany and letting a drop-down menu switch the country:

# Full table for all countries, and the subset currently on display (Germany)
cols1 = df.loc[:, ['country', 'date', 'confirmed']]
cols2 = cols1[cols1['country'] == 'Germany']

Overall = ColumnDataSource(data=cols1)
Curr = ColumnDataSource(data=cols2)

# The plot and the menu are linked by this JavaScript callback
callback = CustomJS(args=dict(source=Overall, sc=Curr), code="""
    const f = cb_obj.value;
    const date = [];
    const confirmed = [];
    for (let i = 0; i < source.get_length(); i++) {
        if (source.data['country'][i] == f) {
            date.push(source.data['date'][i]);
            confirmed.push(source.data['confirmed'][i]);
        }
    }
    // Assigning a new data dict makes the plot redraw
    sc.data = {date: date, confirmed: confirmed};
""")

menu = Select(options=country_list, value='Germany', title='Country')  # drop-down menu
bokeh_p = figure(x_axis_label='date', y_axis_label='confirmed',
                 y_axis_type="linear", x_axis_type="datetime")  # figure object
bokeh_p.line(x='date', y='confirmed', color='green', source=Curr)  # line glyph
menu.js_on_change('value', callback)  # run the callback when the selection changes
layout = column(menu, bokeh_p)  # creating the layout
show(layout)

Bokeh plot of confirmed COVID-19 cases in Germany
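The CustomJS callback runs in the browser, so its logic is hard to check from Python. The minimal sketch below mirrors the same loop in Python. It scans the full column set, keeps the date and confirmed values whose country matches the selection, and returns a fresh dictionary in the column layout a ColumnDataSource uses. The helper filter_country and the toy column dictionary are hypothetical.

```python
def filter_country(data, country):
    """Python mirror of the CustomJS loop: keep rows whose country matches."""
    date, confirmed = [], []
    for i in range(len(data["country"])):  # indices 0 .. n-1
        if data["country"][i] == country:
            date.append(data["date"][i])
            confirmed.append(data["confirmed"][i])
    return {"date": date, "confirmed": confirmed}


# Toy columns with made-up counts
overall = {
    "country": ["Afghanistan", "Germany", "Afghanistan", "Germany"],
    "date": ["2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23"],
    "confirmed": [1, 2, 3, 4],
}
print(filter_country(overall, "Germany"))
# {'date': ['2020-01-22', '2020-01-23'], 'confirmed': [2, 4]}
```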

Let’s add the time slider to the above plot

cols1 = df.loc[:, ['country', 'date', 'confirmed']]
cols2 = cols1[cols1['country'] == 'Germany']

Overall = ColumnDataSource(data=cols1)
Curr = ColumnDataSource(data=cols2)

# The plot and the menu are linked by this JavaScript callback
callback = CustomJS(args=dict(source=Overall, sc=Curr), code="""
    const f = cb_obj.value;
    const date = [];
    const confirmed = [];
    for (let i = 0; i < source.get_length(); i++) {
        if (source.data['country'][i] == f) {
            date.push(source.data['date'][i]);
            confirmed.push(source.data['confirmed'][i]);
        }
    }
    sc.data = {date: date, confirmed: confirmed};
""")

menu = Select(options=country_list, value='Germany', title='Country')  # drop-down menu
bokeh_p = figure(x_axis_label='date', y_axis_label='confirmed',
                 y_axis_type="linear", x_axis_type="datetime")  # figure object
bokeh_p.line(x='date', y='confirmed', color='green', source=Curr)  # line glyph
menu.js_on_change('value', callback)  # run the callback when the selection changes

# Date range slider spanning the whole dataset
date_range_slider = DateRangeSlider(value=(df['date'].min(), df['date'].max()),
                                    start=df['date'].min(), end=df['date'].max())

# Link the slider's (start, end) value to the plot's visible x-range
date_range_slider.js_link("value", bokeh_p.x_range, "start", attr_selector=0)
date_range_slider.js_link("value", bokeh_p.x_range, "end", attr_selector=1)

layout = column(menu, date_range_slider, bokeh_p)
show(layout)  # displaying the layout

Bokeh plot of confirmed COVID-19 cases in Germany, with the time slider

Vega-Altair

Let's plot confirmed COVID-19 cases, with a drop-down menu for choosing the country (initially Germany):

input_dropdown = alt.binding_select(options=country_list)
selection = alt.selection_single(fields=['country'], bind=input_dropdown,
                                 name='Country', init={'country': 'Germany'})

alt_plot = alt.Chart(df).mark_line().encode(
    x='date:T',
    y='confirmed:Q',
    tooltip='confirmed:Q'
).add_selection(
    selection
).transform_filter(
    selection
)

alt_plot

Altair plot of confirmed COVID-19 cases in Germany, with the drop-down country list

Let's add a time slider to the above plot. The filter expression reads the slider's current value as slider.date[0].

input_dropdown = alt.binding_select(options=country_list)
selection = alt.selection_single(fields=['country'], bind=input_dropdown,
                                 name='Country', init={'country': 'Germany'})

def timestamp(t):
    # Convert a date to milliseconds since the epoch, as Vega-Lite expects
    return pd.to_datetime(t).timestamp() * 1000

slider = alt.binding_range(
    step=30 * 24 * 60 * 60 * 1000,  # 30 days in milliseconds
    min=timestamp(df['date'].min()),
    max=timestamp(df['date'].max()))

select_date = alt.selection_single(
    fields=['date'],
    bind=slider,
    init={'date': timestamp(df['date'].min())},
    name='slider')

alt_plot = alt.Chart(df).mark_line().encode(
    x='date:T',
    y='confirmed:Q',
    tooltip='confirmed:Q'
).add_selection(
    selection
).transform_filter(
    selection
).add_selection(
    select_date
).transform_filter(
    # keep only rows in the same year and month as the slider value
    "(year(datum.date) == year(slider.date[0])) && "
    "(month(datum.date) == month(slider.date[0]))"
)

alt_plot

Altair plot of confirmed COVID-19 cases in Germany, with the drop-down country list and time slider
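JavaScript, and hence Vega-Lite, represents timestamps as milliseconds since the Unix epoch. That is why the timestamp helper multiplies POSIX seconds by 1000, and why the slider step is 30 × 24 × 60 × 60 × 1000 = 2,592,000,000 ms. The minimal sketch below shows that arithmetic and adds a pandas mirror of the year/month filter expression. The toy dates and the helper month_filter are hypothetical. The mirror treats times as UTC, whereas Vega's year() and month() use the browser's local time, so results can differ near month boundaries.

```python
import pandas as pd


def timestamp(t):
    """POSIX time in milliseconds (naive timestamps are treated as UTC)."""
    return pd.to_datetime(t).timestamp() * 1000


STEP_MS = 30 * 24 * 60 * 60 * 1000  # 30 days in milliseconds


def month_filter(frame, slider_ms):
    """Keep rows in the same calendar year and month as the slider value
    (pandas mirror of the Vega filter expression, in UTC)."""
    picked = pd.to_datetime(slider_ms, unit="ms")
    same_month = ((frame["date"].dt.year == picked.year)
                  & (frame["date"].dt.month == picked.month))
    return frame[same_month]


toy = pd.DataFrame({"date": pd.to_datetime(
    ["2020-01-22", "2020-01-31", "2020-02-01", "2020-03-15"])})
print(STEP_MS)                  # 2592000000
print(timestamp("2020-01-22"))  # 1579651200000.0
print(len(month_filter(toy, timestamp("2020-01-22"))))  # 2
```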

Summary

  • All 4 libraries can be used to create a plot of COVID-19 cases vs. time (while displaying the date correctly)
  • plt.plot is very useful for a quick initial QC of our dataset prior to EDA.
  • We can create a drop-down menu to choose a country, with or without a time slider, and zoom and pan on the plot during EDA.
  • Bokeh and Vega-Altair have typically been our "go to" libraries for creating interactive COVID-19 plots in Python.
  • There are other libraries and nice plotting options to explore in the future.
