Posts

Showing posts with the label statistical analysis

Navigating the NLP Landscape: A Comprehensive Guide to Top Python Libraries

Welcome back to Part 2 of our Natural Language Processing series. As we said at the beginning, these sessions mix theory and practice, so the first thing we need to do is set up our machines for NLP and learn about the libraries Python offers for it. If you are new to NLP, start with Part 1, Introduction to NLP - Getting Started, which covers the basics of Natural Language Processing, key terminology, and why we need NLP.

Prerequisites

1. Python 3.7 or above
2. Anaconda or Jupyter Notebook

Libraries for NLP

Python is an open-source programming language that offers a wide range of libraries for Natural Language Processing (NLP). Here is a list of Python libraries for NLP.

1. Natural Language Toolkit (NLTK): The most common library in Python for NLP is NLTK (Natural Language Toolkit), as it supports a wide range of languages. Not only this, being an ...
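The prerequisites listed above can be checked with a short script before installing anything. This is only a minimal sketch: the helper names `python_is_supported` and `is_installed` are made up for illustration and do not come from the series, and it assumes the NLTK package is imported under the name `nltk`.

```python
import sys
from importlib import util

# Minimum Python version listed in the prerequisites.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    # Assumption: the NLTK package is importable as "nltk".
    print("NLTK installed:", is_installed("nltk"))
```

If the second check prints `False`, the library still needs to be installed in the active environment, for example through Anaconda or pip.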

Missing Data -- Understanding The Concepts

Introduction

Machine Learning is a big, fascinating term that attracts a lot of people, and knowing what we can achieve with it takes our sci-fi imagination to another level. It is without doubt a great field, with applications ranging from automated reply systems to house-cleaning robots, and from recommending a movie or a product to helping detect disease. Many of the things we see today have already started using ML to improve themselves. Though building a model is fairly easy, the most challenging task is preprocessing the data and filtering out the data that is actually useful. So here I am going to address one of the biggest and most common issues we face at the start of the journey of building a good ML model: missing data. Missing data can cause many issues and lead to wrong predictions, which makes it look as though our model has failed and forces us to start over. If I have to explain in simple terms, data is like ...