
Posts

Showing posts with the label analysis

Data Visualization — IPL Data Set (Part 2)

Welcome to the 3rd post in the Data Visualization series, covering one of the most loved and followed topics in India: the IPL (Indian Premier League), Part 2, 2008–2020. In Part 1 we did an analysis based on the teams; here we will analyze all the other fields and try to cover some interesting and unique analyses. Overview of the Data Set: description of the columns of IPL Dataset 1 and IPL Dataset 2. Let's begin by checking the data in these columns (IPL Data Set 1 overview, IPL Data Set 2 overview). Let's start with some visualization and find the top 10 players by the number of MoM (Man of the Match) awards won. MoM Awards: the above graph shows the top 10 IPL players with the most Man of the Match awards… and guess what… it's none other than our Mr. 360 (ABD) with 23 awards, followed by The Universe Boss (Gayle 333) with 22 awards, the Hitman of India (Rohit) with 18, and Warner and Captain Cool (MSD) with 17 each. Just a random thought of checkin
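The top-10 ranking described above comes down to counting how often each name appears in the Man of the Match column. Here is a minimal sketch using only the standard library and a handful of made-up rows. The `player_of_match` field name, the helper `top_award_winners` and the placeholder players are assumptions for illustration, not the actual dataset or the post's own code.

```python
from collections import Counter

# Hypothetical match records; the real dataset would have one row per match.
matches = [
    {"id": 1, "player_of_match": "Player A"},
    {"id": 2, "player_of_match": "Player B"},
    {"id": 3, "player_of_match": "Player A"},
    {"id": 4, "player_of_match": "Player C"},
    {"id": 5, "player_of_match": "Player A"},
    {"id": 6, "player_of_match": "Player B"},
]


def top_award_winners(rows, n=10, field="player_of_match"):
    """Return the n most frequent names in `field` as (name, count) pairs."""
    # Skip rows where the award is missing (e.g. abandoned matches).
    counts = Counter(row[field] for row in rows if row.get(field))
    return counts.most_common(n)


top_players = top_award_winners(matches, n=2)
for name, awards in top_players:
    print(f"{name}: {awards}")
```

The resulting (name, count) pairs are what a bar chart of awards per player would be drawn from.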

Data Visualization — IPL Data Set (Part 1)

Welcome to the 2nd post in the Data Visualization series, covering one of the most loved and followed topics in India: the IPL (Indian Premier League), Part 1. In this post, we will focus on analyses based on the teams. Overview of the Data Set: description of the columns of the IPL Dataset. Let's begin by checking the data in these columns (IPL Data Set 1 overview, IPL Data Set 2 overview). Moving on to the most interesting part: visualizing the dataset and its relations. Let's begin with a look at the total wins by each team since 2008... Team vs. No. of Match Wins: the above bar chart shows the top 5 teams with the most match wins across all seasons. Surprisingly, RCB is among the top 5 yet still hasn't won an IPL season. Now let's have a look at these wins. Team vs. Wins based on runs/wickets: the above bar chart is a detailed version of the previous graph, showing the top 5 teams with the most wins, split into wins achieved batting first and batting second. Blue bars represent the wins
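The split described above relies on a cricket convention: a win "by runs" means the winner batted first, and a win "by wickets" means the winner batted second. Here is a rough sketch of that tally using only the standard library. The `(winner, result)` row layout, the team names and the helper functions are hypothetical, chosen for illustration rather than taken from the dataset.

```python
from collections import defaultdict

# Hypothetical rows: (winner, result), where result is "runs" or "wickets".
results = [
    ("Team X", "runs"),
    ("Team X", "wickets"),
    ("Team Y", "runs"),
    ("Team X", "runs"),
    ("Team Y", "wickets"),
    ("Team Z", "wickets"),
    ("Team Z", "tie"),  # ignored: neither a runs win nor a wickets win
]

RESULT_TO_KEY = {"runs": "batting_first", "wickets": "batting_second"}


def split_wins(rows):
    """Count each team's wins batting first (by runs) and second (by wickets)."""
    tally = defaultdict(lambda: {"batting_first": 0, "batting_second": 0})
    for winner, result in rows:
        key = RESULT_TO_KEY.get(result)
        if key is None:
            continue  # skip ties, no-results and anything unexpected
        tally[winner][key] += 1
    return dict(tally)


def top_teams(tally, n=5):
    """Rank teams by total wins, highest first."""
    return sorted(tally, key=lambda team: sum(tally[team].values()), reverse=True)[:n]


wins = split_wins(results)
ranking = top_teams(wins, n=2)
for team in ranking:
    print(team, wins[team])
```

Each team's two counts correspond to the two bar colours in a grouped bar chart.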

The Explorer of Data Sets -- Dora

Exploring the dataset is both fun and tedious, but it is an inevitable step in the Machine Learning journey. The challenge is always the correctness, completeness and timeliness of the analysis. To overcome these issues, a lot of libraries are available, each with its own advantages and disadvantages. We have already discussed a few of them (Pandas Profiling, dtale, autoviz, lux, sweetviz) in previous articles. Today, we would like to present a new library for Exploratory Data Analysis: Dora. Calling it only an EDA library would not do it justice, as it not only helps explore the dataset but also helps adjust the data for modelling.

Automatic Visualization with AutoViz

We have discussed Exploratory Data Analysis (EDA) and have also seen a few powerful libraries that we can use extensively for it. EDA is a key step in Machine Learning, as it provides the starting point for our Machine Learning task. But there are a lot of issues with traditional data analysis techniques, and many new libraries are coming up to address them. One such library is AutoViz, which provides quick and easy visualization along with some insights about the data.

Pandas Profiling -- A Unique way to Data Analysis

Pandas Profiling is an open-source Python library. It eases the process of initial data analysis by providing a quick and easy tool for analyzing our data. It is also considered a major EDA library, creating visuals, graphs and data profiling reports within seconds, in just a line of code. It saves a lot of time that is usually lost in visualizing and understanding the data. It extends the pandas DataFrame to create a report for quick and easy data analysis.

D-Tale -- One Stop Solution for EDA

D-Tale is a recently launched (Feb 2020) tool for Exploratory Data Analysis. It is built with Flask (back-end) and React (front-end), providing a powerful tool for analysing and visualizing data. D-Tale is a graphical user interface that is not only quick and easy to understand but also great fun to use. It comes packed with features that reduce the manual work of data engineers and scientists in analysing and understanding the data, and it removes the need to look for multiple different libraries for EDA. Let's have a look at some features which make it so amazing: 1. Seamless Integration -- D-Tale provides seamless integration with multiple Python/IPython notebooks and terminals, so we can use it with almost any IDE of our choice. 2. Friendly UI -- The graphical user interface provided by D-Tale is simple and easy to understand, so anybody can get comfortable with it and start working right away. 3. Support of multiple Py

EDA ---- Exploratory Data Analysis

EDA (Exploratory Data Analysis) is the technique of defining, analyzing and investigating a dataset. It is used by most data scientists, engineers and everyone who works with or wants to analyze data. That includes most of us, since at any point in time we are dealing with data and unknowingly doing an initial analysis, which in technical terms is referred to as "Exploratory Data Analysis". Here is a formal definition of EDA: In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. Still confused about how everyone is using this process? Let me explain it with a simple example... Suppose you and your group plan for lunch in a restaurant... as soon as we hear "lunch" and "restaurant", our mind starts creating a list of all the known places, next as someon
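The formal definition above talks about summarizing a data set's main characteristics. As a small illustration of what that can mean in practice, here is a sketch that computes a few summary statistics with Python's standard `statistics` module. The `scores` values and the `summarize` helper are made up for this example.

```python
import statistics

# Hypothetical numeric column, e.g. runs scored per innings.
scores = [12, 45, 33, 45, 78, 5, 60]


def summarize(values):
    """Return a few basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A first look like this (range, centre, most common value) is usually followed by plots of the same columns.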

One Click Data Visualization

What is Data Visualization? Data Visualization, as the name suggests, is creating clear, attractive and informative visuals from our data, which help us get more insights from it. It also helps anyone who sees our analysis or report to read it better. Creating a good visualization helps us understand the data better and helps in our machine learning journey. The data visualization process uses various graphs, graphics and plots to explain the data and draw insights. Data visualization is important for simplifying complex data by making it more accessible, understandable and usable for its end users. If you want to know more about data visualization, you can read it here.