A Comparison of Automated EDA Tools in Python: Pandas-Profiling vs SweetViz

  • Exploratory Data Analysis (EDA) is a crucial step in any data science project. 
  • Its primary goals include discovering patterns, identifying anomalies, and finding relationships between variables.
  • EDA can be broken down into different types of descriptive statistical analyses, such as univariate, bivariate, and multivariate data analytics.  
  • Automation can be used to conduct EDA, allowing for faster and more efficient analysis of data.
  • Data scientists can use automated EDA tools to speed up the time-consuming data preparation phase, reduce ETL errors, and gain a more complete, end-to-end understanding of the input data.
  • This post compares two popular Python libraries for automating EDA: Pandas-Profiling and SweetViz. The goal is to see how each covers the main parts of the EDA procedure, including data cleaning, data visualization, and statistical analysis. Specifically, we will compare the types and quality of visualizations each tool supports.
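To make the univariate, bivariate, and multivariate distinction concrete, here is a minimal sketch in plain pandas. The DataFrame and its column names are hypothetical toy data, not part of the dataset used later in this post; automated tools produce this kind of output (and much more) without the manual calls.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the three levels of analysis.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 58, 69, 80, 92],
    "age": [23, 35, 31, 44, 29],
})

# Univariate: summary statistics of a single column.
height_summary = df["height_cm"].describe()

# Bivariate: Pearson correlation between two columns.
height_weight_corr = df["height_cm"].corr(df["weight_kg"])

# Multivariate: pairwise correlation matrix across all numeric columns.
corr_matrix = df.corr()

print(height_summary)
print(round(height_weight_corr, 3))
print(corr_matrix.round(2))
```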

Table of Contents

  1. Pandas-Profiling
  2. SweetViz
  3. Other Libraries
  4. Summary
  5. Explore More
  6. References

Pandas-Profiling

Let’s set the working directory

import os
os.chdir('YOURPATH')  # placeholder: replace with your own project directory
os.getcwd()

Import the key libraries and load the input data

import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv("https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv")
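One thing worth checking after loading a CSV like this is whether the column names came through cleanly. If the header row puts a space after each comma (an assumption about this file that you should verify yourself), the quoted names may be read with stray spaces and quote characters. The sketch below uses a small in-memory excerpt shaped like such a file, with illustrative rows, to show how the `skipinitialspace` option of `pd.read_csv` handles that layout.

```python
import io
import pandas as pd

# Illustrative excerpt shaped like a CSV with "comma + space" separators.
# Assumption: the real airtravel.csv uses a similar layout; rows are examples only.
sample = io.StringIO(
    '"Month", "1958", "1959", "1960"\n'
    '"JAN",  340,  360,  417\n'
    '"FEB",  318,  342,  391\n'
)

# skipinitialspace=True drops the blanks after each delimiter, so the quoted
# headers are parsed as clean column names.
df_sample = pd.read_csv(sample, skipinitialspace=True)

print(df_sample.columns.tolist())
print(df_sample.dtypes)
```

If the printed column names of your own `df` already look clean, this option is not needed.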

Let’s generate the HTML report

profile = ProfileReport(df, title="Pandas Profiling Report")
profile.to_file("report.html")
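Note that newer releases of this library are published under the name ydata-profiling, with the import `from ydata_profiling import ProfileReport`. The following is a small sketch of one way to support either name; the helper `resolve_profile_report` is a hypothetical function written for this post, not part of either package.

```python
import importlib


def resolve_profile_report():
    """Return the ProfileReport class from whichever package is installed, else None."""
    # ydata_profiling is the newer package name; pandas_profiling is the legacy one.
    for module_name in ("ydata_profiling", "pandas_profiling"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this name is not installed, try the next one
        return getattr(module, "ProfileReport", None)
    return None


ProfileReport = resolve_profile_report()
print("ProfileReport available:", ProfileReport is not None)
```

If the helper returns a class, it can be used exactly as in the snippet above; if it returns None, neither package is installed in the current environment.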

Let’s look at the report

The report includes the following sections (screenshots omitted): Overview, Interactions, Correlations (heatmap and table), and Missing values count.
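To get a feel for what the overview and missing-value parts of such a report summarize, here is a rough hand-rolled equivalent in plain pandas. It is only a sketch on a hypothetical toy DataFrame, and it does not reproduce how the library computes or names its statistics.

```python
import pandas as pd

# Hypothetical frame with one missing value and one duplicated row.
df = pd.DataFrame({
    "month": ["JAN", "FEB", "MAR", "MAR"],
    "passengers": [340.0, None, 362.0, 362.0],
})

# Dataset-level figures similar in spirit to a report overview.
overview = {
    "n_variables": df.shape[1],
    "n_observations": df.shape[0],
    "missing_cells": int(df.isna().sum().sum()),
    "duplicate_rows": int(df.duplicated().sum()),
}

# Per-column count of missing values.
missing_per_column = df.isna().sum()

print(overview)
print(missing_per_column)
```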

SweetViz

Let’s import the key libraries

import sweetviz as sv
import pandas as pd

and proceed with the same dataset as above, viz.

df = pd.read_csv("https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv")

Let’s create and display an analysis report for our data

report = sv.analyze(df)

Done! Use ‘show’ commands to display/save

report.show_html()

Report SWEETVIZ_REPORT.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.

Let’s look at the report

SweetViz report

Other Libraries

Other automation libraries that can assist in EDA are as follows (cf. References):

Dataprep, D-tale, Pandas GUI, Dabl, Bamboolib, AutoViz, Dora, Visidata, Scattertext, QuickDA, ExploriPy, Rath, and Lux.

Summary

  • Automated EDA libraries can perform data analysis, wrangling, editing, cleaning, visualization, and ETL transformations in a few lines of Python code.
  • Pandas-Profiling creates an interactive HTML report that displays various summary statistics and visualizations of a given Pandas DataFrame.
  • SweetViz also creates an HTML report with visualizations that provide insights into the data, including distributions of features, missing values, and correlations between features (Associations).
  • The visualizations can be customized and fine-tuned as needed to best suit your EDA needs.
  • These tools are especially powerful for marketers, salespeople and BI analysts looking to perform data analysis and present their data to stakeholders. 

Explore More

References

