
Posts

Defining, Analyzing, and Implementing Imputation Techniques

What is Imputation?

Imputation is a technique for replacing missing data with a substitute value so that most of the data and information in the dataset is retained. These techniques are used because removing the affected data from the dataset every time is not feasible: it can reduce the size of the dataset to a large extent, which not only raises concerns about biasing the dataset but can also lead to incorrect analysis.

Fig 1: Imputation

Not sure what missing data is, how it occurs, and what its types are? Have a look HERE to know more about it.

Let's understand the concept of imputation from the above figure (Fig 1). In the image, I have tried to represent the missing data in the left table (marked in red). By using imputation techniques, we have filled in the missing data in the right table (marked in yellow) without reducing the actual size of the dataset. Notice that we have also increased the number of columns, which is possible in imputation (adding a "Missing" category).
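To make the idea concrete, here is a minimal sketch in plain Python, not the method from the post itself. It assumes a tiny hypothetical table with a categorical column `color` and a numeric column `price`. The helper names, the `"Missing"` label, the choice of the mean as the substitute value, and the `_was_missing` flag column are all assumptions made for illustration.

```python
from statistics import mean


def impute_missing_category(rows, column, label="Missing"):
    """Replace None in a categorical column with an explicit label."""
    return [
        {**row, column: label if row[column] is None else row[column]}
        for row in rows
    ]


def impute_mean_with_flag(rows, column):
    """Fill None in a numeric column with the mean of the observed values.

    Also adds a boolean flag column recording which values were filled in,
    so the table gains a column but loses no rows.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = mean(observed)
    flag = column + "_was_missing"
    return [
        {
            **row,
            column: fill_value if row[column] is None else row[column],
            flag: row[column] is None,
        }
        for row in rows
    ]


# Hypothetical data: the second row has both values missing.
rows = [
    {"color": "red", "price": 10.0},
    {"color": None, "price": None},
    {"color": "blue", "price": 30.0},
]

filled = impute_mean_with_flag(impute_missing_category(rows, "color"), "price")
for row in filled:
    print(row)
```

The number of rows stays the same after imputation, while the flag column is one possible way the column count can grow. In practice a data-frame library would usually handle this, but the logic is the same.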

Missing Data -- Understanding The Concepts

Introduction

Machine Learning is a big, fascinating term that attracts a lot of people, and knowing what we can achieve through it makes our sci-fi imagination jump to another level. No doubt it is a great field: its applications range from automated reply systems to house-cleaning robots, and from recommending a movie or a product to helping detect disease. Many of the things we see today have already started using ML to better themselves. Though building a model is quite easy, the most challenging task is preprocessing the data and filtering out the data of use. So, here I am going to address one of the biggest and most common issues that we face at the start of the journey of making a good ML model, which is The Missing Data. Missing data can cause many issues and can lead to wrong predictions from our model, which makes it look as if our model has failed and we have to start over again. If I have to explain in simple terms, data is like Fuel of our Mo

Spark — How to install in 5 Steps in Windows 10

An easy-to-follow guide for installing Spark on Windows 10.

1. Prerequisites

Hardware Requirement
* RAM: Min. 8GB; if you have an SSD in your system, then 4GB RAM would also work.
* CPU: Min. quad-core, with at least 1.80GHz

JRE 1.8: Offline installer for JRE

Java Development Kit: 1.8

A software for unzipping, like 7Zip or WinRAR
* I will be using 64-bit Windows for the process; please check and download the version supported by your system (x86 or x64) for all the software.

Hadoop
* I am using Hadoop-2.9.2; you can also use any other STABLE version of Hadoop.
* If you don't have Hadoop, you can refer to installing it from Hadoop: How to install in 5 Steps in Windows 10.

MySQL Query Browser

Download Spark Zip
* I am using Spark 3.1.1; you can also use any other STABLE version of Spark.
* The latest release of Spark is 3.1.2 (shown in the image below), released in June '21.

Fig 1: Download Spark-3.1.2
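Before starting, part of the prerequisite list can be sanity-checked with a small script. This is only a rough sketch using the Python standard library, not part of the original guide. The function name `check_prerequisites` is hypothetical, `os.cpu_count()` reports logical CPUs (which may differ from physical cores), and the script only checks that a `java` executable is on the PATH, not that its version is 1.8.

```python
import os
import shutil


def check_prerequisites(min_cpus=4, tools=("java",)):
    """Report whether the CPU count and required command-line tools look adequate."""
    # os.cpu_count() may return None when the count cannot be determined.
    cpu_count = os.cpu_count() or 0
    report = {"cpu_ok": cpu_count >= min_cpus}
    for tool in tools:
        # shutil.which returns the executable's path, or None if it is not on PATH.
        report[tool + "_on_path"] = shutil.which(tool) is not None
    return report


print(check_prerequisites())
```

A `False` entry in the printed report points to the item in the list above that needs attention before continuing.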

SQOOP — How to install in 5 Steps in Windows 10

An easy-to-follow guide for installing SQOOP on Windows 10.

1. Prerequisites

Hardware Requirement
* RAM: Min. 8GB; if you have an SSD in your system, then 4GB RAM would also work.
* CPU: Min. quad-core, with at least 1.80GHz

JRE 1.8: Offline installer for JRE

Java Development Kit: 1.8

A software for unzipping, like 7Zip or WinRAR
* I will be using 64-bit Windows for the process; please check and download the version supported by your system (x86 or x64) for all the software.

Hadoop
* I am using Hadoop-2.9.2; you can also use any other STABLE version of Hadoop.
* If you don't have Hadoop, you can refer to installing it from Hadoop: How to install in 5 Steps in Windows 10.

MySQL Query Browser

Download SQOOP zip
* I am using SQOOP-1.4.7; you can also use any other STABLE version of SQOOP.

Fig 1: Download Sqoop 1.4.7

Hive — How to install in 5 Steps in Windows 10

An easy-to-follow guide for installing Hive on Windows 10.

1. Prerequisites

Hardware Requirement
* RAM: Min. 8GB; if you have an SSD in your system, then 4GB RAM would also work.
* CPU: Min. quad-core, with at least 1.80GHz

JRE 1.8: Offline installer for JRE

Java Development Kit: 1.8

A software for unzipping, like 7Zip or WinRAR
* I will be using 64-bit Windows for the process; please check and download the version supported by your system (x86 or x64) for all the software.

Hadoop
* I am using Hadoop-2.9.2; you can also use any other STABLE version of Hadoop.
* If you don't have Hadoop, you can refer to installing it from Hadoop: How to install in 5 Steps in Windows 10.

MySQL Query Browser

Download Hive zip
* I am using Hive-3.1.2; you can also use any other STABLE version of Hive.

Fig 1: Download Hive-3.1.2

PIG: How to install in 5 Steps in Windows 10

An easy-to-follow guide for installing PIG on Windows 10.

1. Prerequisites

Hardware Requirement
* RAM: Min. 8GB; if you have an SSD in your system, then 4GB RAM would also work.
* CPU: Min. quad-core, with at least 1.80GHz

JRE 1.8: Offline installer for JRE

Java Development Kit: 1.8

A software for unzipping, like 7Zip or WinRAR
* I will be using 64-bit Windows for the process; please check and download the version supported by your system (x86 or x64) for all the software.

Hadoop
* I am using Hadoop-2.9.2; you can also use any other STABLE version of Hadoop.
* If you don't have Hadoop, you can refer to installing it from Hadoop: How to install in 5 Steps in Windows 10.

MySQL Query Browser

Download PIG zip
* I am using PIG-0.17.0; you can also use any other STABLE version of Apache Pig.

Fig 1: Download PIG-0.17.0

Hadoop : How to install in 5 Steps in Windows 10

1. Prerequisites

Hardware Requirement
* RAM: Min. 8GB; if you have an SSD in your system, then 4GB RAM would also work.
* CPU: Min. quad-core, with at least 1.80GHz

JRE 1.8: Offline installer for JRE

Java Development Kit: 1.8

A software for unzipping, like 7Zip or WinRAR
* I will be using 64-bit Windows for the process; please check and download the version supported by your system (x86 or x64) for all the software.

Download Hadoop zip
* I am using Hadoop-2.9.2; you can use any other STABLE version of Hadoop.

Once we have downloaded all the above software, we can proceed with the next steps in installing Hadoop.

2. Unzip and Install Hadoop

After downloading Hadoop, we need to unzip the Hadoop-2.9.2.tar.gz file. Once extracted, we get a new file, Hadoop-2.9.2.tar. Now we need to extract this tar file once again. After that we can organize our Hadoop installation: we can create a folder and move the final extracted files into it. For e.g.: Please note while creating folders
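As an alternative to extracting the archive twice with an unzipping tool, the two extraction steps described above can be done in one go with Python's standard `tarfile` module. This is a minimal sketch, not the procedure from the post. The function name `extract_tar_gz` and the destination folder `hadoop-install` are hypothetical, and the archive is assumed to sit in the current directory.

```python
import tarfile
from pathlib import Path


def extract_tar_gz(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the top-level entry names."""
    dest = Path(dest_dir)
    # Create the installation folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" decompresses the gzip layer and reads the tar in a single pass.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        archive.extractall(dest)
    return sorted(entry.name for entry in dest.iterdir())


if __name__ == "__main__":
    archive = Path("Hadoop-2.9.2.tar.gz")  # assumed download location
    if archive.exists():
        print(extract_tar_gz(archive, "hadoop-install"))  # hypothetical folder name
    else:
        print(f"{archive} not found; download it first")
```

Only extract archives downloaded from a source you trust, since `extractall` writes whatever paths the archive contains.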