
Posts

Showing posts with the label Python

Familiarizing with NLTK: Basics of Statistics and Loops for Text Analysis

  Hope you are following along with us, and by now you have become familiar with Jupyter Notebook, the basics of NLTK, and Python functions. Now we will explore some more complex topics in NLTK: starting with statistics, we will go through loops and then learn how to apply conditions to texts. From the previous article, we know how to find the size of a text and how to access tokens by index. But how do we find out which tokens are highly repeated in the text, or whether there are specific tokens the author has repeated many times to emphasize a particular topic? This is very simple and can be done easily using a frequency distribution. A frequency distribution can be understood simply as a count of how often each distinct token is repeated in a text. E.g., in this paragraph, the tokens 'frequency', 'distribution', and 'can' are present twice, while 'understood', 'counting', 'repetition', 'using', etc. have just one occurrence…
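
To make the idea concrete, here is a minimal sketch using NLTK's FreqDist class; the sample text is a made-up illustration, not taken from the article:

    from nltk import FreqDist

    # Made-up sample text, just to illustrate the idea
    text = "frequency distribution can be understood as counting how often tokens repeat in a frequency distribution"
    tokens = text.lower().split()

    fdist = FreqDist(tokens)      # counts the occurrences of each token
    print(fdist.most_common(5))   # the five most repeated tokens
    print(fdist['frequency'])     # count for one specific token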

Hands-On NLP with NLTK: A Practical Guide from Setup to Practice

  Hope you are following us and have installed Python and Anaconda on your systems; if not, please refer here and install them before proceeding further. If you have some system restrictions, then you can log in to Google Colab for free and start working there. It is very similar to Jupyter notebooks, which we will be using throughout our training. Note: you can download all the notebooks used in this example here. Installations. The first step is to install the NLTK library and the NLTK data. 1. Install NLTK using the pip command: pip install nltk. Since it is already installed on my system, it shows "requirement already satisfied". Instead of using Jupyter Notebook, we can also create a virtual env on our system and follow these steps in a conda/Python prompt. 2. Download the NLTK data: nltk.download(). This will open a new window, the NLTK Downloader, as shown. It basically contains all the data and other packages for NLTK, so we will…
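
For reference, the same two steps can also be run non-interactively from a notebook cell; this is a minimal sketch, and the 'punkt' package is just one example of the data the Downloader offers:

    import nltk

    # nltk.download()        # opens the interactive NLTK Downloader window described above
    nltk.download('punkt')   # or fetch an individual package, e.g. the 'punkt' tokenizer models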

Navigating the NLP Landscape: A Comprehensive Guide to Top Python Libraries

Welcome back to Part 2 of our Natural Language Processing series. As we told you in the beginning, these sessions are going to be a mix of theory and practice, so the first thing we need to do is set up our machines for NLP and learn about the various libraries that Python has to offer for NLP. If you are new to NLP, then head over to Part 1, Introduction to NLP - Getting Started, and learn about the basics of Natural Language Processing, key terminologies, and why we need NLP. Prerequisites: 1. Python 3.7 and above; 2. Anaconda or Jupyter Notebook. Libraries for NLP: Python, being an open-source programming language, offers a wide range of libraries that can be used for Natural Language Processing (NLP). Here is the list of libraries present in Python for NLP. 1. Natural Language Toolkit (NLTK): the most common library in Python for NLP is NLTK (Natural Language Toolkit), as it supports a wide range of languages. Not only this, being open source it is freely available to s…
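
A quick, minimal way to verify these prerequisites from a notebook cell (the versions printed will of course depend on your environment):

    import sys
    print(sys.version_info)   # expect (3, 7, ...) or later

    import nltk               # raises ImportError if NLTK is not installed yet
    print(nltk.__version__)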

Variable Encoding

Introduction  Computers are one of the best creations of human beings. They are so powerful and useful that what was once a luxury item has now become so common that it can be seen everywhere: in watches, cars, spaceships, etc. They have become so common that imagining life without them is like going back to the 'Stone Age'…  These computerised systems might be great, but they have one serious issue: they work only on numerical data, more specifically binary data, i.e. 1s and 0s. But the data we see around us can be numerical, alphabetical, categorical, visual, audible, and more.  Now, coming to the point: whether it is Machine Learning, Data Science, Deep Learning, or Artificial Intelligence, all of these work on data, i.e. they use data to deliver results. But as we know, datasets are, or can be, a mixture of numerical, alphabetical, and categorical data (let's ignore audio and visual data for now). Dealing with numerical data is not an issue with computers…
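
As a taste of what is coming, here is a minimal sketch of two common ways to turn categorical data into numbers using pandas; the 'colour' column is a hypothetical example:

    import pandas as pd

    df = pd.DataFrame({'colour': ['red', 'green', 'blue', 'green']})   # hypothetical data

    # One-hot encoding: one binary column per category
    one_hot = pd.get_dummies(df['colour'], prefix='colour')

    # Label encoding: map each category to an integer code
    df['colour_code'] = df['colour'].astype('category').cat.codes

    print(one_hot)
    print(df)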

ExploriPy -- Newer Ways to Do Exploratory Data Analysis

Introduction  ExploriPy is yet another Python library used for Exploratory Data Analysis. This library caught our attention because it is quick and easy to implement, and its basics are simple to grasp. Moreover, the visuals provided by this library are self-explanatory and graspable by any new user.  The most interesting part, which we can't resist mentioning, is the easy grouping of the variables into different sections. This makes it more straightforward to understand and analyze our data. The four major sections presented are: Null Values; Categorical vs Target; Continuous vs Target; Continuous vs Continuous.
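
A minimal sketch of how a report is generated, following the usage pattern shown in ExploriPy's documentation; the dataset path, column names, and target are hypothetical, so check the library's README for the exact signature:

    import pandas as pd
    from ExploriPy import EDA

    df = pd.read_csv('data.csv')        # hypothetical dataset
    categorical = ['Gender', 'City']    # hypothetical categorical columns

    eda = EDA(df, categorical, title='EDA Report')
    eda.TargetAnalysis('Churn')         # hypothetical target column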

The Explorer of Data Sets -- Dora

Exploring a dataset is both fun and tedious, but it is an inevitable step in the Machine Learning journey. The challenge always lies in the correctness, completeness, and timeliness of the analysis.  To overcome these issues, a lot of libraries exist, each with its advantages and disadvantages. We have already discussed a few of them (Pandas profiling, dtale, autoviz, lux, sweetviz) in previous articles. Today, we would like to present a new library for Exploratory Data Analysis --- Dora.  Calling it only an EDA library would not do it justice, as it not only helps explore the dataset but also helps adjust the data for modelling purposes.
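
A rough sketch of a typical Dora session, based on the usage shown in its README; the target column and file path are hypothetical, so treat the exact method names as assumptions to verify:

    from Dora import Dora

    dora = Dora()
    dora.configure(output='y', data='data.csv')   # hypothetical target column and path

    dora.impute_missing_values()                  # fill in missing values
    dora.scale_input_values()                     # scale features for modelling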

Automatic Visualization with AutoViz

We have discussed Exploratory Data Analysis, known as EDA, and have also seen a few powerful libraries that we can use extensively for EDA. EDA is a key step in Machine Learning, as it provides the starting point for our Machine Learning task. But there are a lot of issues with traditional data analysis techniques, and many new libraries are coming to market to rectify them. One such API is AutoViz, which provides quick and easy visualization along with some insights about the data.
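
To give a flavour of the "quick and easy" claim, here is a minimal sketch of invoking AutoViz; the file name and target column are placeholders:

    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    # One call visualizes the whole dataset; 'data.csv' and 'target' are placeholders
    report = AV.AutoViz('data.csv', depVar='target')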

A Sweet Way to Exploratory Data Analysis --- Sweetviz

Another day, another beautiful library for Exploratory Data Analysis (EDA). Having studied some great EDA libraries like Lux, D-tale, and pandas profiling, we are back with another great API, 'SWEETVIZ', which you can use for your Data Science project. Introduction  It is an open-source Python library and is still in the development phase. It already has some great features to offer, which makes it our choice to bring it to you. Its sole purpose is to visualise and analyse data quickly. The best feature of this API is that it provides an option to compare two datasets, i.e. we can compare and analyse the test vs training data together. And that's not all; it's just the start. Let's dive deeper and see what more it has to offer us.
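
Here is a minimal sketch of both modes, the single-dataset report and the train-vs-test comparison highlighted above; the CSV paths are placeholders:

    import pandas as pd
    import sweetviz as sv

    train = pd.read_csv('train.csv')   # placeholder paths
    test = pd.read_csv('test.csv')

    report = sv.analyze(train)                               # single-dataset report
    report.show_html('sweetviz_report.html')

    compare = sv.compare([train, 'Train'], [test, 'Test'])   # compare two datasets
    compare.show_html('sweetviz_compare.html')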

EDA Techniques

We had a look at the basics of EDA in our previous article, EDA - Exploratory Data Analysis. So now let's move ahead and look at how we can automate the process and the various APIs used for it. We will be focusing on the 7 major libraries that can be used for the purpose. These are our personal favourites, and we prefer to use them most of the time.  We will look into each library and cover the install, load, and analyse parts for each separately: D-tale, Pandas-Profiling, Lux, Sweetviz, Autoviz, ExploriPy, and Dora.

D-Tale -- One Stop Solution for EDA

D-Tale is a recently launched (Feb 2020) tool for Exploratory Data Analysis. It is built with Flask (for the back end) and React (for the front end), providing a powerful analysing and visualizing tool.  D-Tale is a Graphical User Interface platform that is not only quick and easy to understand but also great fun to use. It comes packed with so many features that it reduces the manual work of Data Engineers/Scientists in analysing and understanding the data, and removes the load of looking for the multiple different libraries used in EDA.  Let's have a look at some features which make it so amazing: 1. Seamless Integration -- D-tale provides seamless integration with multiple python/ipython notebooks and terminals, so we can use it with almost any IDE of our choice. 2. Friendly UI -- The Graphical User Interface provided by D-tale is quite simple and easy to understand, such that anybody can quickly get comfortable with it and start working right away. 3. Support of multiple Python…
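
Getting started is as quick as the description suggests; a minimal sketch, with the CSV path as a placeholder:

    import pandas as pd
    import dtale

    df = pd.read_csv('data.csv')   # placeholder path

    d = dtale.show(df)             # launch the D-Tale UI for this DataFrame
    d.open_browser()               # open it in the default browser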

EDA ---- Exploratory Data Analysis

EDA  EDA - Exploratory Data Analysis is the technique of defining, analyzing, and investigating a dataset. This technique is used by most data scientists and engineers, and by everyone who works with or wants to analyze data. That said, it includes the majority of us, as at any point in time we are dealing with data and unknowingly perform an initial analysis of it, which in technical terms is referred to as "Exploratory Data Analysis". Here is a formal definition of EDA:  In statistics, exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods.  Still confused about how every one of us uses this process? Let me explain it with a simple example… Suppose you and your group plan lunch at a restaurant… as soon as we hear "lunch" and "restaurant", our mind starts creating a list of all the known places; next, as someone…

One Click Data Visualization

What is Data Visualization?  Data Visualization, as the name suggests, is creating nice, beautiful, and informative visuals from our data, which help us get more insights from it. It helps both us and anyone who reads our analysis or report to understand it better. Creating a good visualization helps us understand the data better and aids our machine learning journey.  The data visualization process uses various graphs, graphics, and plots to explain the data and extract insights. DV is important for simplifying complex data, making it more accessible, understandable, and usable to its end users. If you want to know more about data visualization, you can Read IT Here.

Anaconda -- How to install in 5 steps on Windows

  Image taken from Google Images  An easy-to-follow guide for installing Anaconda on Windows 10. 1. Prerequisites  Hardware requirements: RAM - min. 8 GB (if you have an SSD in your system, then 4 GB of RAM would also work); CPU - min. quad-core, with at least 1.80 GHz.  Operating system: Windows 8 or later.  System architecture: Windows 64-bit x86 or 32-bit x86.  Space: minimum 5 GB of disk space to download and install Anaconda.  We need to download Anaconda from HERE. On opening the link, we are greeted by a great web page. Now click on "Get Started" to continue…  The next step is to click on "Download Installer" to proceed…  Select the correct version based on your system's architecture; I will be using the 64-bit installer (477 MB). Your download should now begin; it will take some time…  Let's catch up in the 2nd section (Unzip and Install).
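
Once the installer has finished (covered in the next section), a quick sanity check from any Python prompt or notebook confirms that Anaconda's interpreter is the one running; a minimal sketch, and the exact strings will vary by installation:

    import sys

    print(sys.version)      # the build string usually mentions Anaconda
    print(sys.executable)   # the path should point inside your Anaconda installation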