Introduction
In the past few articles, we looked at outliers: what they are, how they are introduced, and a few techniques for handling them in a dataset.
Another widely used technique for handling outliers is capping the data. Capping means defining limits for a field.
Capping is similar in spirit to trimming, with one difference. When trimming, we used the IQR or z-score to identify outliers and then removed them from the dataset. With capping, instead of removing the values, we replace each outlier with the nearest limit, bringing it back into the range of our data.
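The contrast between trimming and capping can be sketched with plain pandas. The `age` values and the limits 10 and 60 below are made up for illustration and do not come from the article's dataset.

```python
import pandas as pd

# Hypothetical toy data: ten ages with one extreme value at each end.
df = pd.DataFrame({"age": [2, 18, 22, 25, 29, 31, 35, 40, 47, 95]})

lower, upper = 10, 60  # illustrative limits (assumption)

# Trimming: rows outside the limits are dropped, so the dataset shrinks.
trimmed = df[df["age"].between(lower, upper)]

# Capping: every row is kept; out-of-range values move to the nearest limit.
capped = df.assign(age=df["age"].clip(lower=lower, upper=upper))

print(len(df), len(trimmed), len(capped))
print(capped["age"].tolist())
```

Trimming loses the two extreme rows, while capping keeps all ten rows and only changes the two extreme values.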
Why Capping?
Capping is also sometimes referred to as censoring. That is because, when we use capping in data preprocessing, we do not remove values; instead, any value beyond a capping limit is replaced with that limit.
Sounds confusing? It's simple: instead of trimming or removing the values beyond a limit, we set them to the limit itself.
Another advantage of capping is that it can be applied to the upper limit, the lower limit, or both.
We can perform capping/censoring in the following ways:
- Arbitrary Value
- Quantiles
- Gaussian Approximation
- IQR
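As a rough sketch, here is how the lower and upper limits could be derived under each of the four approaches listed above. The toy `age` values, the 5th/95th percentiles, the factor of 3 standard deviations, and the 1.5 × IQR multiplier are common conventions assumed here, not values taken from the article.

```python
import pandas as pd

# Hypothetical toy data (assumption, not the article's dataset).
age = pd.Series([2, 18, 22, 25, 29, 31, 35, 40, 47, 95], name="age")

# Arbitrary value: limits chosen by hand from domain knowledge.
arbitrary = (10, 60)

# Quantiles: cap at chosen percentiles, here the 5th and 95th.
quantile = (age.quantile(0.05), age.quantile(0.95))

# Gaussian approximation: mean +/- 3 standard deviations.
mean, std = age.mean(), age.std()
gaussian = (mean - 3 * std, mean + 3 * std)

# IQR proximity rule: 1.5 * IQR beyond the first and third quartiles.
q1, q3 = age.quantile(0.25), age.quantile(0.75)
iqr = q3 - q1
iqr_limits = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

limits = {
    "arbitrary": arbitrary,
    "quantiles": quantile,
    "gaussian": gaussian,
    "iqr": iqr_limits,
}
for name, (low, high) in limits.items():
    print(f"{name:>10}: lower={low:.2f}, upper={high:.2f}")
```

Whichever rule produces the limits, the capping step itself is the same: values below the lower limit are raised to it and values above the upper limit are lowered to it.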
So, without wasting any more time, let's dive into a practical approach and get our hands dirty.
Arbitrary Value Capping
1. Importing the Libraries & Data
Figure: Importing Libraries & Data
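A minimal sketch of this step, assuming pandas is the only library needed. The small `age` DataFrame is a hypothetical stand-in for the article's dataset, which is not specified here.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: a single "age" column
# with one implausibly low and one implausibly high value.
df = pd.DataFrame({"age": [2, 18, 22, 25, 29, 31, 35, 40, 47, 95]})

print(df.head())
```

In practice the DataFrame would usually be loaded from a file, for example with `pd.read_csv`.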
2. Verifying the Data
Figure: Original Data
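One way to inspect the data before capping is to look at its shape and the summary statistics of the column of interest. The toy `age` values below are assumed for illustration.

```python
import pandas as pd

# Hypothetical toy data (assumption, not the article's dataset).
df = pd.DataFrame({"age": [2, 18, 22, 25, 29, 31, 35, 40, 47, 95]})

# Shape tells us how many rows we start with.
print(df.shape)

# describe() shows count, mean, std, min, quartiles and max in one call.
print(df["age"].describe())

# The extremes are what capping will change.
original_min, original_max = df["age"].min(), df["age"].max()
print(original_min, original_max)
```

Recording the original minimum and maximum makes it easy to compare against the capped data later.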
3. Using Arbitrary Capping
Figure: Initializing Arbitrary capper
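The idea behind an arbitrary capper can be sketched with plain pandas `clip`, used here in place of the capper object the text refers to. The helper name `cap_arbitrary`, the toy data, and the limits 10 and 60 are all hypothetical.

```python
import pandas as pd

def cap_arbitrary(data, min_caps, max_caps):
    """Return a copy with each listed column clipped to hand-picked limits.

    min_caps / max_caps map column names to lower / upper limits.
    A column missing from one dict is left unbounded on that side.
    """
    out = data.copy()
    for col in set(min_caps) | set(max_caps):
        out[col] = out[col].clip(lower=min_caps.get(col), upper=max_caps.get(col))
    return out

# Hypothetical toy data (assumption, not the article's dataset).
df = pd.DataFrame({"age": [2, 18, 22, 25, 29, 31, 35, 40, 47, 95]})

# Illustrative limits chosen by hand.
capped = cap_arbitrary(df, min_caps={"age": 10}, max_caps={"age": 60})
print(capped["age"].tolist())
```

Keeping the limits in dictionaries keyed by column name lets the same helper cap several columns with different limits in one call.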
4. Verifying the capped data
Once we have capped the values using the Arbitrary capper, we need to check the values.
Figure: Final Data
Notice that the shape of our data has not changed from what we started with. The difference is that the maximum and minimum age are now capped at the values we specified while initializing the arbitrary capper.
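This check can be sketched as follows, again assuming toy `age` values and illustrative limits of 10 and 60, with pandas `clip` standing in for the capper.

```python
import pandas as pd

# Hypothetical toy data and limits (assumptions).
df = pd.DataFrame({"age": [2, 18, 22, 25, 29, 31, 35, 40, 47, 95]})
capped = df.assign(age=df["age"].clip(lower=10, upper=60))

# The shape is unchanged because no rows were removed.
print(df.shape, capped.shape)

# The extremes now equal the limits we chose.
print(capped["age"].agg(["min", "max"]))

# Count how many values were actually modified by capping.
changed = (df["age"] != capped["age"]).sum()
print(changed)
```

Only the two values that fell outside the limits were modified; every other row is identical to the original.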
Winsorizing
1. Importing the Libraries & Data
Figure: Importing Libraries & Data
2. Verifying the Limits
Figure: Original Data
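A sketch of how the Winsorization limits could be checked before applying them. It assumes quantile-based limits at the 5th and 95th percentiles and the same hypothetical `age` values; the article's own limits may differ.

```python
import pandas as pd

# Hypothetical toy data (assumption, not the article's dataset).
df = pd.DataFrame({"age": [2, 18, 22, 25, 29, 31, 35, 40, 47, 95]})

# Percentile-based limits (5% in each tail is an assumed choice).
lower_limit = df["age"].quantile(0.05)
upper_limit = df["age"].quantile(0.95)
print(lower_limit, upper_limit)

# How many values fall outside each limit and would be capped?
below = (df["age"] < lower_limit).sum()
above = (df["age"] > upper_limit).sum()
print(below, above)
```

Counting the values beyond each limit gives a quick sense of how much of the data Winsorization will alter.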
3. Performing Winsorization
Figure: Performing Winsorization
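Winsorization can be sketched as clipping a column to its own percentiles. The helper `winsorize` below is a hypothetical function written with pandas, not the transformer used in the article, and the 5%/95% cut-offs and toy data are assumptions.

```python
import pandas as pd

def winsorize(series, lower_q=0.05, upper_q=0.95):
    """Clip a Series to its own lower_q and upper_q quantiles."""
    return series.clip(
        lower=series.quantile(lower_q),
        upper=series.quantile(upper_q),
    )

# Hypothetical toy data (assumption, not the article's dataset).
df = pd.DataFrame({"age": [2, 18, 22, 25, 29, 31, 35, 40, 47, 95]})

# Replace the column with its Winsorized version; the original df is untouched.
df_wins = df.assign(age=winsorize(df["age"]))
print(df_wins["age"].tolist())
```

Unlike arbitrary capping, the limits here are computed from the data itself, so the same call adapts to any numeric column.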
4. Verifying the Transformation
Figure: Final Data
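One way to verify the transformation is to place the summary statistics before and after Winsorization side by side. The toy data and the 5%/95% quantile limits are assumptions carried over for illustration.

```python
import pandas as pd

# Hypothetical toy data (assumption, not the article's dataset).
df = pd.DataFrame({"age": [2, 18, 22, 25, 29, 31, 35, 40, 47, 95]})

# Quantile-based limits and the Winsorized copy.
low, high = df["age"].quantile([0.05, 0.95])
df_wins = df.assign(age=df["age"].clip(lower=low, upper=high))

# Side-by-side summary: count stays the same, min and max move inward.
summary = pd.DataFrame({
    "before": df["age"].describe(),
    "after": df_wins["age"].describe(),
})
print(summary.loc[["count", "min", "max"]])
```

The row count is identical in both columns, while the minimum rises and the maximum falls to the computed limits.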