
Posts

Showing posts with the label Getting Started with NLP

Familiarizing with NLTK: Basics of Statistics and Loops for Text Analysis

  We hope you are following along with us and by now have become familiar with Jupyter Notebook, the basics of NLTK, and Python functions. Now we will explore some more complex topics in NLTK: starting with statistics, we will go through loops and then learn how to apply conditions to texts. From the previous article, we know how to find the size of a text and how to access tokens by index. But how do we find out which tokens are repeated most often in a text, or whether there are specific tokens the author has repeated many times to emphasize a particular topic? This is very simple and can be done easily using a frequency distribution. A frequency distribution can be understood simply as counting the repetition of different tokens in a text. E.g., in this paragraph the tokens 'frequency', 'distribution', and 'can' appear twice, while 'understood', 'counting', 'repetition', 'using', etc. have just one occurrence …
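As a minimal sketch of the idea (assuming NLTK is installed and the 'punkt' tokenizer data has already been fetched with nltk.download(); the sample sentence is purely illustrative):

    import nltk
    from nltk import FreqDist

    # tokenize a small sample text (assumes the 'punkt' tokenizer data is available)
    text = "counting tokens, counting repetitions, counting everything in the text"
    tokens = nltk.word_tokenize(text)

    # FreqDist counts how many times each token appears
    fdist = FreqDist(tokens)
    print(fdist.most_common(3))   # the most repeated tokens with their counts
    print(fdist['counting'])      # occurrences of one specific token

Here most_common() surfaces the heavily repeated tokens, which is exactly the "what did the author emphasize?" question asked above.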

Hands-On NLP with NLTK: A Practical Guide from Setup to Practice

  We hope you are following along and have installed Python and Anaconda on your systems; if not, please refer here and install them before proceeding further. If you have system restrictions, you can log in to Google Colab for free and start working there; it is very similar to Jupyter Notebook, which we will be using throughout our training. Note: you can download all the notebooks used in this example here.

Installations

The first step is to install the NLTK library and the NLTK data.

1. Install NLTK using the pip command:

       pip install nltk

Since it is already installed on my system, it shows "Requirement already satisfied". Instead of using Jupyter Notebook, we can also create a virtual environment on our system and follow these steps in a conda/Python prompt.

2. Download the NLTK data:

       nltk.download()

This will open a new NLTK Downloader window as shown. It contains all the data and other packages for NLTK, so we will …
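A rough sketch of those two steps in one place (step 1 runs in a terminal or a notebook cell prefixed with "!"; the 'popular' collection name is just one common choice in the downloader, not the only option):

    # step 1 (shell): install the library
    #     pip install nltk

    # step 2 (Python / notebook cell): fetch the NLTK data
    import nltk
    nltk.download()              # opens the interactive NLTK Downloader window
    # nltk.download('popular')   # non-interactive alternative: grabs a common data collection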

Introduction to NLP - Getting Started

Background

In today's world, with all sorts of Artificial Intelligence coming in, a major element required to train machines is data. A major portion of the data today is generated from social media and sources such as virtual assistants, blogs, news, videos, audio, images, and all sorts of papers (research papers, white papers), and it is mostly unstructured. As per current figures, there are around 8.5 billion Google searches per day and approximately 2 trillion global searches per year; similarly, Bing handles around 27 billion web searches per month, or 37.5 million web searches per hour. But if we go by industry estimates, less than 25% of the data today is available in structured or tabular form. So now the question arises: when the data is available in human languages, how can we use it to train machines and get accurate AI systems? The answer, for textual data at least, is "Natural Language Processing".

What is Natural Language Processing? …