Another day, another beautiful library for Exploratory Data Analysis (EDA).
Having covered some great EDA libraries such as Lux, D-Tale, and pandas-profiling, we are back with another great one, Sweetviz, which you can use in your data science projects.
Introduction
It is an open-source Python library that is still in the development phase. It already has some great features to offer, which is why we chose to bring it to you.
Its sole purpose is to visualise and analyse data quickly. Its best feature is the option to compare two datasets, e.g. we can analyse the training and test data side by side.
That's not all; it's just the start. Let's dive deeper and see what more it has to offer.
Installation
Installing the library is exceptionally simple and can be done quickly and easily.
Code:-
## conda installation (the package is published on the conda-forge channel)
conda install -c conda-forge sweetviz
## pip installation
pip install sweetviz
## Jupyter Notebook installation (run inside a notebook cell)
%pip install sweetviz
As per the official docs, a few of us might face issues while installing the library. Don't worry, the Sweetviz team has described the solutions well in those docs.
It takes a few minutes to install. Let's grab a coffee in the meantime.
Figure: Installing Sweetviz
Once the library is installed, we might be asked to restart the kernel (in the case of Jupyter Notebook) for the changes to take effect.
Great, we have installed the Sweetviz library and are good to go with some examples that will help us understand it better.
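Before moving on, a quick way to confirm the install is to ask Python for the package version. This is a minimal sketch using only the standard library; the helper name `installed_version` is ours and not part of Sweetviz.

```python
from importlib import metadata


def installed_version(package="sweetviz"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if Sweetviz is installed, otherwise None.
print(installed_version())
```

If this prints None right after installing in a notebook, restarting the kernel is usually the first thing to try.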
Getting Started
*Please note: we use the Titanic dataset as our first dataset for analysis.
Importing the data
Figure: Importing library & data
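For readers who want to type along, here is a minimal sketch of the data-loading step. It assumes the Titanic data is saved locally as a CSV; the file name `titanic.csv` and the helper `load_titanic` are hypothetical placeholders, so adjust them to your setup.

```python
import pandas as pd


def load_titanic(source="titanic.csv"):
    """Load the Titanic data from a CSV path or file-like object.

    The default file name is a hypothetical placeholder.
    """
    return pd.read_csv(source)
```

Calling `df = load_titanic()` followed by `df.head()` gives a quick look at the data. The library itself is commonly imported as `import sweetviz as sv`.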
Since Sweetviz can analyse a single dataset as well as compare two, we will look at both in separate subsections, starting with a single dataset.
- Single Dataset
Figure: Single Dataset Report
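A minimal sketch of the single-dataset report, assuming `df` is a pandas DataFrame such as the Titanic data. The wrapper `single_dataset_report`, its early check on the target column, and the output file name are our own additions; `sv.analyze` and `show_html` are the Sweetviz calls doing the work.

```python
def single_dataset_report(df, target=None, out_path="titanic_report.html"):
    """Analyse one DataFrame with Sweetviz and write the HTML report."""
    # Fail early with a readable message if the target column is misspelled.
    if target is not None and target not in df.columns:
        raise ValueError(f"target column {target!r} not found in the data")

    import sweetviz as sv  # imported here so the check above works without Sweetviz

    report = sv.analyze(df, target_feat=target)
    report.show_html(out_path)  # writes the file and opens it in the browser
    return report
```

For the Titanic data, a call might look like `single_dataset_report(df, target="Survived")`.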
This will open the report in a new tab, as shown below:
Figure: Sweetviz report for a single dataset
Wait, we will explain the report in the next section. Till then, hold your nerves while we look at other ways to analyse the data.
- Comparing two datasets
Figure: Comparing Dataset Reports
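A minimal sketch of the comparison, assuming we start from one DataFrame and split it ourselves. The helpers `split_train_test` and `compare_report`, the 80/20 split, and the labels "Train" and "Test" are illustrative choices of ours; `sv.compare` is the Sweetviz call.

```python
def split_train_test(df, train_frac=0.8, seed=42):
    """Randomly split a DataFrame into train and test parts (index must be unique)."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train, test


def compare_report(train, test, target=None, out_path="compare_report.html"):
    """Build one Sweetviz report that shows both datasets side by side."""
    import sweetviz as sv  # imported lazily; only needed when a report is built

    # Each dataset is passed as [DataFrame, label]; the labels appear in the report.
    report = sv.compare([train, "Train"], [test, "Test"], target_feat=target)
    report.show_html(out_path)
    return report
```

If you already have separate train and test files, skip the split and pass the two DataFrames straight to `compare_report`.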
This will open the report in a new tab, as shown below:
Figure: Sweetviz report for comparing two datasets
Another great feature Sweetviz offers is comparing two subsets of the same dataset.
- Comparing two subsets of data
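Sweetviz provides `compare_intra` for this. Below is a minimal sketch that splits the Titanic data by the "Sex" column; the helper names, the choice of column, and the labels are our own assumptions for illustration.

```python
def subset_condition(df, column="Sex", value="male"):
    """Boolean Series that is True for rows belonging to the first subset."""
    return df[column] == value


def compare_subsets(df, column="Sex", value="male", names=("Male", "Female"),
                    target=None, out_path="intra_report.html"):
    """Compare rows matching the condition against all remaining rows."""
    import sweetviz as sv  # imported lazily; only needed when a report is built

    condition = subset_condition(df, column, value)
    # The first name labels rows where the condition is True, the second the rest.
    report = sv.compare_intra(df, condition, list(names), target_feat=target)
    report.show_html(out_path)
    return report
```

Any boolean condition works here, for example `df["Age"] < 18` to compare children with adults.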
Report Overview
Let's have a look at the report generated.
But before that, note that Sweetviz offers two ways to view reports.
- In a new tab: We have already seen this way, where the report opens in a new tab. For this, we use the report.show_html() command.
- In-line: If we are working in a Jupyter Notebook and want to view the report in the same tab, we use the report.show_notebook() command.
Figure: In-line visualization
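The two viewing options can be wrapped in one small helper. This is only a sketch: `show_report` and its `inline` flag are hypothetical names of ours, while `show_html` and `show_notebook` are the report methods mentioned above.

```python
def show_report(report, inline=False, html_path="SWEETVIZ_REPORT.html"):
    """Display a Sweetviz report either in the notebook or in a new browser tab."""
    if inline:
        report.show_notebook()  # renders inside the current Jupyter Notebook cell
        return "notebook"
    report.show_html(html_path)  # saves the HTML file and opens it in a new tab
    return "html"
```

In a notebook we would call `show_report(report, inline=True)`; from a plain script the default is the better fit.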
Sweetviz presents the analysis column-wise: for each column it reports statistics such as quantiles, mean, median, variance, skewness, and kurtosis, which makes it easy to study one column at a time. Selecting a column reveals more detail, such as its correlation with the other columns and the columns that affect its value the most.
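To get a feel for these numbers, the same kind of statistics can be computed by hand with pandas. This is a rough sketch for comparison, not Sweetviz's own code; the helper `column_summary` and the sample values are made up for illustration.

```python
import pandas as pd


def column_summary(series):
    """Summarise one numeric column with plain pandas."""
    quartiles = series.quantile([0.25, 0.5, 0.75])
    return {
        "q1": quartiles.loc[0.25],
        "median": quartiles.loc[0.5],
        "q3": quartiles.loc[0.75],
        "mean": series.mean(),
        "variance": series.var(),   # sample variance (ddof=1)
        "skewness": series.skew(),
        "kurtosis": series.kurt(),  # excess kurtosis (0 for a normal distribution)
    }


ages = pd.Series([1, 2, 3, 4, 5], name="Age")  # made-up values for illustration
print(column_summary(ages))
```

The values Sweetviz shows may differ slightly if it uses different estimator conventions, so treat this as a sanity check rather than an exact reproduction.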
Figure: Column-wise Data Analysis
Figure: Extra Analysis Data
As shown in the above snapshot, for the "Survived" column the library shows which features influence it, which features it influences, and its correlation with the other variables.
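As a rough cross-check of the numerical part, we can compute the correlation of every numeric column with the target using pandas. This sketch covers only Pearson correlation between numeric columns; Sweetviz itself also handles categorical columns, which this does not attempt. The helper name `correlation_with` is ours.

```python
import pandas as pd


def correlation_with(df, target):
    """Pearson correlation of every other numeric column with `target`, highest first."""
    numeric = df.select_dtypes("number")  # non-numeric columns are ignored
    return numeric.corr()[target].drop(target).sort_values(ascending=False)


# Tiny made-up sample, only to show the shape of the output.
sample = pd.DataFrame({
    "Survived": [0, 1, 0, 1],
    "Fare": [1.0, 2.0, 1.0, 2.0],
    "Age": [2.0, 1.0, 2.0, 1.0],
    "Sex": ["m", "f", "m", "f"],
})
print(correlation_with(sample, "Survived"))
```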
Similarly, when comparing two datasets, it analyses both together. That makes it easier to see the distribution of each variable in both datasets (particularly when we split the data into train and test sets).