Introduction
When we talk encoding, one thing that usually comes to our mind is why can't we simply write down all the values from a variable in a list and assign them values 1,2,3,4..... and so on. Just like we did in our childhood while playing..!!!
The answer is YES..!!! we can do it.. in fact, we will do it... or rather we are going to do it here...
Ordinal Encoding is encoding the categorical variables with ordinal numbers like 1,2,3,4...etc. This way of encoding can be either done by assigning 'Arbitrary' values to the variables or can be based on some value like Mean, or target data.
Arbitrary Ordinal Encoding:- Here the ordinal numbers are allotted randomly to the variables for the encoding.
Mean Ordinal Encoding:- Here the ordinal numbers are allotted based on the Target Mean value(Just like we did in Mean/Target Encoding) to the variables for the encoding.
Some Important Points
While we go ahead and perform Categorical Variable Encoding using Ordinal Encoding, there are a few points that one should keep in mind:-
Before using this technique, we need to divide the dataset into train and test sets.
- Train this technique only over the train set.
- Using this trained model, encode the values from both train and test sets.
- In case, if some values are missing in the train set at the time of training the model and encountered in the test set, it will give an error for such values.
Advantages
- This technique is quite simple to implement.
- It does not expand the feature space.
- Creates a monotonic relation between variable and target.
- Due to monotonic relation best suitable for Linear Models.
Disadvantages
- Prone to cause over-fitting
- Difficult to cross-validate with current libraries.
Practical
We will be using the feature-engine library of python for demo purposes.
1. Importing the Libraries
2. Viewing the Data
Here, while fitting the data(training the data) we need to specify the target variable also.
4. Transforming using Ordinal Encoder
Ordinal Encoder transform |
Here Male is encoded by value '0' whereas Female is encoded by '1'.
5. Verifying Data
Resources
Please comment below to get the complete dataset and libraries.
Learn to install Anaconda Here.
Learn about the Feature-Engine library Here.
Summary
In this Quick Reads, we studied a technique, Ordinal Encoding that is commonly used for Categorical Variable Encoding. We had a quick overview of the technique, saw the positives and negatives of this technique and some quick Points to Remember.
We also performed a practical demo of this technique using a famous python library "feature-engine".
Last but not least... "Practice Makes One Perfect". So what are you waiting for practice this technique and comment below your views, doubts or anything? We are here to help you.
Comments
Post a Comment