A Sweet Way to Exploratory Data Analysis: Sweetviz





Another day, another beautiful library for Exploratory Data Analysis (EDA).

Having covered some great EDA libraries such as Lux, D-Tale and pandas-profiling, we are back with another great one, Sweetviz, which you can use in your data science projects.



Introduction


Sweetviz is an open-source Python library that is still under active development. It already offers some great features, which is why we chose to present it to you.


Its sole purpose is to visualise and analyse data quickly. Its best feature is the option to compare two datasets, e.g. we can analyse the training and test data side by side.


That's not all; it's just the start. Let's dive deeper and see what more it has to offer.


Installation


Installing the library is exceptionally simple and can be done quickly and easily, just like us and with us.


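Code:

## conda installation (Sweetviz is distributed through the conda-forge channel)

conda install -c conda-forge sweetviz


## pip installation

pip install sweetviz


## Jupyter Notebook installation (run inside a notebook cell)

!pip install sweetviz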



According to the official docs, some of us might face issues while installing the library. Don't worry: the Sweetviz team has described the solutions well there.


It takes a few minutes to install. Let's grab a coffee in the meantime.


Installing Sweetviz


Once the library is installed, we might be asked to restart the kernel (in the case of Jupyter Notebook) for the changes to take effect.


Great, we have installed the Sweetviz library and are good to go with some examples that will help us understand it better.


Getting Started


*Please note: we use the Titanic dataset as our first dataset for analysis.


Importing the data


Importing library & data
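
The import step can be sketched roughly as follows. This is a minimal sketch, not the exact code from the original post: the file name `train.csv` is an assumption (a local copy of the Titanic training data), and `load_titanic` is a hypothetical helper of ours, not part of Sweetviz.

```python
import pandas as pd


def load_titanic(source="train.csv"):
    """Read the Titanic data into a DataFrame.

    `source` is an assumed local path (or any file-like object);
    point it at wherever your copy of the dataset lives.
    """
    return pd.read_csv(source)


# Typical use in a notebook (requires `pip install sweetviz`):
#   import sweetviz as sv
#   df = load_titanic()
#   df.head()
```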

Since Sweetviz can analyse a single dataset and can also compare two, we will look at both in separate subsections, starting with a single dataset.


  • Single Dataset


Single Dataset Report
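
A single-dataset report might be produced along these lines. Treat it as a sketch under assumptions: `single_report` is a hypothetical wrapper of ours, the output file name is arbitrary, and passing "Survived" as the target is our choice for the Titanic data. The Sweetviz calls used are `sv.analyze` and `show_html`.

```python
def single_report(df, target=None, out_file="SWEETVIZ_REPORT.html"):
    """Analyse one DataFrame with Sweetviz and open the HTML report."""
    if target is not None and target not in df.columns:
        raise KeyError(f"target column {target!r} not found in the DataFrame")

    # Imported here so the check above runs even before Sweetviz is installed.
    import sweetviz as sv

    report = sv.analyze(df, target_feat=target)  # build the report
    report.show_html(out_file)                   # save it and open a browser tab
    return report


# Example, assuming `df` holds the Titanic data:
#   report = single_report(df, target="Survived")
```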


This will open the report in a new tab, as shown below:


Sweetviz report for a single dataset



Wait, we will explain the graph in the next section. Till then, hold your nerve while we look at other ways to analyse the data.


  • Comparing two datasets


Comparing Dataset Reports
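
A comparison of two datasets could look like the sketch below. The random split, the helper names `split_train_test` and `compare_report`, and the labels "Train" and "Test" are all assumptions made for illustration; if you already have separate train and test files, load them and skip the split. The Sweetviz call used is `sv.compare`.

```python
def split_train_test(df, test_fraction=0.2, seed=0):
    """Randomly split a DataFrame into train and test parts."""
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(test.index)  # assumes the index labels are unique
    return train, test


def compare_report(train, test, out_file="SWEETVIZ_COMPARE.html"):
    """Compare two DataFrames with Sweetviz and open the HTML report."""
    import sweetviz as sv  # requires `pip install sweetviz`

    # Each [DataFrame, name] pair sets the label shown in the report.
    report = sv.compare([train, "Train"], [test, "Test"])
    report.show_html(out_file)
    return report


# Example, assuming `df` holds the Titanic data:
#   train, test = split_train_test(df)
#   report = compare_report(train, test)
```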


This will open the report in a new tab, as shown below:


Sweetviz report for comparing two datasets



Another great feature Sweetviz offers is comparing two subsets of the same dataset.


  • Comparing two subsets of data


Comparing subsets of Dataset Reports
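
Comparing two subsets of one dataset can be sketched with `sv.compare_intra`, which takes the DataFrame, a boolean condition and a pair of names. Splitting on the "Sex" column is our assumption for the Titanic data, and `subset_mask` and `subset_report` are hypothetical helper names.

```python
def subset_mask(df, column="Sex", value="male"):
    """Boolean Series: True for rows that belong to the first subset."""
    return df[column] == value


def subset_report(df, out_file="SWEETVIZ_SUBSETS.html"):
    """Compare the rows matching the mask against the remaining rows."""
    import sweetviz as sv  # requires `pip install sweetviz`

    # The first name labels rows where the mask is True, the second the rest.
    report = sv.compare_intra(df, subset_mask(df), ["Male", "Female"])
    report.show_html(out_file)
    return report


# Example, assuming `df` holds the Titanic data:
#   report = subset_report(df)
```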


Report Overview


Let's have a look at the report generated. 


But before that, note that Sweetviz offers two ways to view reports.


  1. In a new tab: We have already seen this way, where the report opens in a new browser tab. For this, we use the report.show_html() command.
  2. In-line: If we are working in a Jupyter Notebook and want to view the report inside the notebook itself, we use the report.show_notebook() command.


In-line visualization
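
The two viewing options can be wrapped in one small helper. This is only a sketch: `show_report` and its `inline` flag are hypothetical names of ours, while `show_html` and `show_notebook` are the report methods mentioned above.

```python
def show_report(report, inline=False, out_file="SWEETVIZ_REPORT.html"):
    """Display a Sweetviz report in a new tab or inside the notebook."""
    if inline:
        report.show_notebook()      # embed in the current Jupyter Notebook
    else:
        report.show_html(out_file)  # write the file and open a browser tab


# Example, assuming `report` came from sv.analyze(...) or sv.compare(...):
#   show_report(report, inline=True)
```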



Sweetviz presents the analysis column-wise, i.e. for each column it reports statistics such as quantiles, mean, median, variance, skewness and kurtosis. This makes it easy to analyse column-specific data. In addition, on selecting a column, it presents more detail, such as the correlation of that column with the others and the columns that most influence its value.



Column-wise Data Analysis

Extra Analysis Data


As shown in the above snapshot, for the "Survived" column the library shows which features impact it, which features are impacted by it, and its correlation with other variables.


Similarly, while comparing two datasets, it analyses both datasets together. That makes it easier to see the distribution of variables in both datasets (particularly when we split the data into train and test sets).


Column-wise data comparison of two datasets


Summary


We covered the basics of a new library for EDA: how to install it and how to use it. Compared with other Python libraries for EDA, Sweetviz currently ranks lower, but since it's still in the development phase, we are eager to see how it progresses from here.

Till then, what are you waiting for? Go ahead, download it, start playing with it, and share your views, issues and suggestions.

That's all from here. Until next time, this is the Quick DataScience Team, providing a quick and easy guide to another data science topic.
