
Showing posts with the label target encoding

Rare Label Encoding

  Introduction

So far we have seen many techniques for encoding categorical variables, each with its own capabilities and performance. But before diving into another new technique, let me put up a question first.

Ques.: Suppose we have around 50 different values for a variable, a few having a very high frequency of representation and some with very little representation. Which technique would you use for encoding here, and why?

Please share your answer below in the comment section. Even if you don't know the correct answer, please give it a try; by engaging yourself you will definitely learn more. DO NOT MOVE AHEAD UNTIL YOU HAVE THOUGHT OF/COMMENTED AN ANSWER.

So now, continuing with our topic: Rare Label Encoding is a technique used to group values together and assign them a common "Rare" label when they have very little representation compared to the other values. Let's look at an example to understand it better. Suppose we have a dataset of 100...
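To make the idea concrete, here is a minimal sketch of rare label encoding in Python with pandas. The frequency threshold, the "Rare" label text, and the toy data are all illustrative assumptions, not fixed parts of the technique.

```python
import pandas as pd

def rare_label_encode(series: pd.Series,
                      threshold: float = 0.05,
                      rare_label: str = "Rare") -> pd.Series:
    """Group categories whose relative frequency falls below `threshold`
    under one shared label (illustrative sketch)."""
    # Relative frequency of each category
    freq = series.value_counts(normalize=True)
    # Categories common enough to keep as-is
    frequent = freq[freq >= threshold].index
    # Keep frequent categories; replace everything else with the rare label
    return series.where(series.isin(frequent), rare_label)

# Toy example: 'C' and 'D' each make up only 10% of the values
brands = pd.Series(["A"] * 5 + ["B"] * 3 + ["C", "D"])
print(rare_label_encode(brands, threshold=0.2))
# 'A' and 'B' survive; 'C' and 'D' are grouped under "Rare"
```

With a 20% threshold, 'A' (50%) and 'B' (30%) are kept while 'C' and 'D' (10% each) are collapsed into the single "Rare" label, which is exactly the grouping the paragraph above describes.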

Mean Encoding or Target Encoding

  Introduction

A technique that is used anywhere and everywhere is the 'Mean'. The first thing that comes to a Data Scientist's mind on seeing huge data is "calculate the Mean". So why not use the same idea here as well and encode our categorical variables using the Mean?

This technique of encoding a categorical variable with the mean is known as "Mean Encoding" or "Target Encoding". It is called Target Encoding because the mean for each value of the variable is calculated from the target values. Let's take an example to understand it better...

Suppose we have a variable of car brands and a target variable containing the mileage of the cars. If the mean mileage of Tata cars is 50, then Tata is encoded with 0.5; if the mean mileage of Honda cars is 30, then Honda is encoded with 0.3.

[Image: Dummy Mean Encoding]
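As a rough sketch of the idea in Python with pandas: the column names and toy data below are made up, and the division by 100 (so that a mean mileage of 50 becomes 0.5, as in the example above) is an assumption for illustration only; plain target encoding stops at the per-category mean.

```python
import pandas as pd

# Toy data: car brand (the categorical variable) and mileage (the target)
df = pd.DataFrame({
    "brand":   ["Tata", "Tata", "Honda", "Honda", "Tata"],
    "mileage": [55, 45, 28, 32, 50],
})

# Mean of the target per category -- the heart of mean/target encoding
brand_means = df.groupby("brand")["mileage"].mean()

# Replace each brand with its mean target value
df["brand_encoded"] = df["brand"].map(brand_means)

# Optional 0-1 scaling to match the post's example (an assumption, not
# part of the technique): mean mileage 50 -> 0.5, 30 -> 0.3
df["brand_encoded_scaled"] = df["brand_encoded"] / 100
print(df)
```

Here Tata's mean mileage works out to 50 (encoded 0.5) and Honda's to 30 (encoded 0.3), matching the numbers in the paragraph above.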

Variable Encoding

Introduction

Computers are one of the best creations of human beings. They are so powerful and useful that what was once a luxury item has now become commonplace; they can be seen everywhere, in watches, cars, spaceships and more. They have become so common that imagining a life without them is like going back to the 'Stone Age'...

These computerised systems might be great, but they have one serious issue: they work only on numerical data, more specifically binary data, i.e. 1s and 0s. But the data we see around us can be numerical, alphabetical, categorical, visual, audible and more.

Now, coming to the point: whether it is Machine Learning, Data Science, Deep Learning or Artificial Intelligence, all of these work on data, i.e. they use data to deliver results. But as we know, datasets are, or can be, a mixture of numerical, alphabetical and categorical data (let's ignore audio and visual data for now). Dealing with numerical data is not an issue for comp...

Encoding

  Welcome to another series of Quick Reads... This series of Quick Reads focuses on another major step in the process of Data Preprocessing: Variable Encoding.

We will be studying every detail, from what Variable Encoding is to the techniques we use, with their strengths and shortcomings, together with a practical demo. All this in our series of Quick Reads. Trust us, when we say Quick Reads, we truly mean teaching and explaining some heavy concepts in Data Science in the same time it takes us to cook our 'Maggi'.

INDEX
1. What is Variable Encoding?
2. Techniques used for Variable Encoding
    2.1 Count/Frequency Encoding
    2.2 Mean/Target Encoding
    2.3 One Hot Encoding
    2.4 Ordinal Encoding
    2.5 Rare Label Encoding
    2.6 Decision Tree Encoding