

Ordinal Encoding

Introduction

When we talk about encoding, one question that usually comes to mind is: why can't we simply write down all the values of a variable in a list and number them 1, 2, 3, 4, and so on, just like we did in our childhood while playing? The answer is YES, we can! In fact, that is exactly what we are going to do here.

Ordinal Encoding replaces the categories of a variable with ordinal numbers such as 1, 2, 3, 4, etc. The numbers can either be assigned arbitrarily or be based on some value, such as the mean of the target.

Arbitrary Ordinal Encoding: Here the ordinal numbers are assigned to the categories in an arbitrary (random) order.

Mean Ordinal Encoding: Here the ordinal numbers are assigned to the categories in order of their target mean value (just like we did in Mean/Target Encoding).
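Both flavours can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's own demo: the function names and the city/purchase data are hypothetical, "arbitrary" is approximated by order of first appearance (so the result is reproducible; a shuffled order would be just as valid), and for the mean version we assume the category with the lowest target mean gets 1.

```python
from collections import defaultdict


def arbitrary_ordinal_encode(values):
    """Arbitrary ordinal encoding: number categories 1, 2, 3, ... by first appearance.

    The order carries no information about the target; any order would do.
    """
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping) + 1
    return [mapping[v] for v in values], mapping


def mean_ordinal_encode(values, target):
    """Mean ordinal encoding: rank categories by their mean target (lowest mean -> 1)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, y in zip(values, target):
        sums[v] += y
        counts[v] += 1
    means = {v: sums[v] / counts[v] for v in sums}
    # Sort by mean target; ties are broken by category name so the result is deterministic.
    ordered = sorted(means, key=lambda v: (means[v], v))
    mapping = {v: rank for rank, v in enumerate(ordered, start=1)}
    return [mapping[v] for v in values], mapping


# Hypothetical data: a city column and a binary target (1 = bought, 0 = did not buy).
cities = ["Delhi", "Mumbai", "Pune", "Delhi", "Pune", "Mumbai"]
bought = [1, 0, 1, 1, 0, 0]

print(arbitrary_ordinal_encode(cities)[1])  # {'Delhi': 1, 'Mumbai': 2, 'Pune': 3}
print(mean_ordinal_encode(cities, bought)[1])  # {'Mumbai': 1, 'Pune': 2, 'Delhi': 3}
```

Here Mumbai has a target mean of 0.0, Pune 0.5 and Delhi 1.0, so the mean-based numbers follow that order, while the arbitrary numbers simply follow the order in which the cities first appear.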

Count Frequency Encoding

Introduction

The first method in our list of Categorical Variable Encoding techniques, and a widely used one, is "Count Frequency Encoding". This method replaces each category of a variable with either its count (the number of times it occurs in the dataset) or its percentage share of all observations.

Let's look at an example to understand it better.

[Figure: Dummy Count Frequency Encoding]

Here we have created dummy data of 6 car companies and the colour of each company's best-selling car, shown on the left-hand side. On the right-hand side is the list of the same companies, but with the categorical variable, i.e. colour, encoded by the Count Frequency Encoder using both count and percentage.

Two companies, Tata and Jaguar, have Grey as their best-selling colour. Therefore, when encoding by count, both get the value 2, denoting that Grey appears twice in the dataset.

Encoding

Welcome to another series of Quick Reads! This series focuses on another major step in Data Preprocessing, i.e. Variable Encoding.

We will study every detail, from what Variable Encoding is to which techniques we use, along with their strengths, shortcomings and a practical demo, all in our series of Quick Reads. Trust us: when we say Quick Reads, we truly mean explaining some heavy Data Science concepts in the time it takes to cook our 'Maggi'.

INDEX
1. What is Variable Encoding?
2. Techniques used for Variable Encoding
    2.1 Count Frequency Encoding
    2.2 Mean/Target Encoding
    2.3 One Hot Encoding
    2.4 Ordinal Encoding
    2.5 Rare Label Encoding
    2.6 Decision Tree Encoding