Another technique in feature scaling is Robust Scaling, also known as Scaling to quantiles and median.
Robust scaling uses the median and the interquartile range (IQR) to scale the values of our dataset.
Quantiles can be defined as the cut points dividing the range of a probability distribution into continuous intervals with equal probabilities. The quantiles used here are the quartiles, i.e. the 25th, 50th and 75th percentiles.
The interquartile range can be defined as the difference between the upper quartile (75th percentile) and the lower quartile (25th percentile).
The median is the middle value in a series when arranged in ascending or descending order. It is the same as the 50th percentile.
The logic used here is to subtract the median from each value, which shifts the overall median to 0, and then divide the result by the IQR, i.e. the difference between the 75th percentile and the 25th percentile.
Formula Used:-
X_scaled = (X - median(X)) / (Q3(X) - Q1(X)), where Q1 and Q3 are the 25th and 75th percentiles of the variable X.
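To make the formula concrete, here is a minimal sketch that applies it by hand using only the Python standard library. The function name robust_scale and the sample values are made up for illustration, and the quartiles are computed with linear interpolation between data points, which is an assumption about the quantile method.

```python
import statistics


def robust_scale(values):
    """Scale each value as (x - median) / (Q3 - Q1)."""
    # method="inclusive" interpolates linearly between the data points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(x - median) / iqr for x in values]


sample = [1, 2, 3, 4, 100]  # hypothetical values; 100 is an outlier
scaled = robust_scale(sample)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

Here the median is 3 and the IQR is 4 - 2 = 2, so the middle value maps to 0, while the outlier stays far away from the rest of the data.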
Features of Robust Scaling:-
1. Median is centred at 0:-
2. Variance varies across the variables:-
3. Shape of the original distribution is preserved for each variable, because the transformation is linear. Only the location and the spread change.
[Figure: Data distribution comparison before and after scaling]
4. Minimum and Maximum values vary across the variables:-
[Figure: Description of dataset post scaling]
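5. Robust to outliers: the median and the IQR are barely affected by extreme values, so outliers do not distort the scaling of the rest of the data. The outliers themselves are not removed.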
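The last point can be illustrated with a small sketch. It compares the mean and standard deviation with the median and IQR on two made-up lists that differ only in one extreme value. The helper name summarize and the numbers are hypothetical.

```python
import statistics


def summarize(values):
    """Return outlier-sensitive (mean, stdev) and robust (median, IQR) statistics."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": median,
        "iqr": q3 - q1,
    }


clean = [10, 12, 14, 16, 18]           # hypothetical values
with_outlier = [10, 12, 14, 16, 1000]  # same data, last value replaced by an outlier

clean_stats = summarize(clean)
outlier_stats = summarize(with_outlier)
print("clean:       ", clean_stats)
print("with outlier:", outlier_stats)
```

In this example the median and the IQR are identical for both lists, while the mean and the standard deviation change drastically once the outlier is present.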
Practical:-
Let's implement it ourselves to understand it better.
1. Importing the necessities:-
[Figure: Importing libraries for Robust Scaling]
2. Getting Data Insights
To get any meaningful insight from the data, we first need to be familiar with it, i.e. the number of rows and columns, the type of data, what each variable represents, their magnitudes, etc.
To get a rough idea of our data we use the .head() method.
[Figure: Boston Data Overview]
To know in detail about the dataset, i.e. what each variable represents, we can print its DESCR attribute. Note that DESCR is an attribute, not a method.
[Figure: Description of Boston House Data]
To further get the mathematical details from the data, we can use the .describe() method.
[Figure: Mathematical Description of Boston House Data]
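The import and data-insight steps might look like the sketch below. One loud assumption: load_boston was removed from scikit-learn in version 1.2, so this sketch uses the bundled wine dataset (load_wine) as a stand-in for the Boston data. The variable names dataset and data are also illustrative.

```python
import pandas as pd
from sklearn.datasets import load_wine  # stand-in for the Boston data (assumption)

# Load the dataset and wrap the feature matrix in a DataFrame
dataset = load_wine()
data = pd.DataFrame(dataset.data, columns=dataset.feature_names)

# Number of rows and columns
print(data.shape)

# First five rows, for a rough idea of the data
print(data.head())

# Text description of the dataset; DESCR is an attribute, not a method
print(dataset.DESCR[:500])

# Count, mean, std, min, quartiles and max for each variable
print(data.describe())
```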
3. Scaling the Data
We will be using the RobustScaler class from scikit-learn for our data.
[Figure: Implementing Robust Scaling]
That's it... Simple, isn't it?
Now we can check the median & IQR values for each variable.
[Figure: Median & IQR value]
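A minimal sketch of the scaling step follows, again assuming the wine dataset as a stand-in for the Boston data and using illustrative variable names. After fitting, RobustScaler stores the per-variable median in center_ and the per-variable IQR in scale_.

```python
import pandas as pd
from sklearn.datasets import load_wine  # stand-in for the Boston data (assumption)
from sklearn.preprocessing import RobustScaler

dataset = load_wine()
data = pd.DataFrame(dataset.data, columns=dataset.feature_names)

# Default settings centre on the median and scale by the 25th-75th percentile range
scaler = RobustScaler()
scaled_data = scaler.fit_transform(data)  # returns a NumPy array

# Median and IQR learned for each variable
print(pd.Series(scaler.center_, index=data.columns, name="median"))
print(pd.Series(scaler.scale_, index=data.columns, name="IQR"))
```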
4. Verifying the Scaling
To verify the end result, we first need to convert scaled_data to a pandas DataFrame.
[Figure: Converting scaled data to dataframe]
Since Robust scaling is based on the median, let's find out the median value for each variable before and after scaling.
Before Scaling:-
[Figure: Median before Scaling Data]
After Scaling:-
[Figure: Median after Scaling Data]
We can clearly notice here that the median of each variable in the scaled data is now 0.
Also, let's check other values impacted by the scaling.
[Figure: Describing scaled data]
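The verification could be sketched as follows. As before, the wine dataset stands in for the Boston data (an assumption), and scaled_df is an illustrative name for the converted DataFrame.

```python
import pandas as pd
from sklearn.datasets import load_wine  # stand-in for the Boston data (assumption)
from sklearn.preprocessing import RobustScaler

dataset = load_wine()
data = pd.DataFrame(dataset.data, columns=dataset.feature_names)
scaled_data = RobustScaler().fit_transform(data)

# fit_transform returns a NumPy array, so restore the column names
scaled_df = pd.DataFrame(scaled_data, columns=data.columns)

# Median of each variable before and after scaling
print(data.median())
print(scaled_df.median())

# Other statistics after scaling (mean, std, min, max, quartiles)
print(scaled_df.describe())
```

After scaling, the medians should be 0 up to floating-point error and the IQR of each variable should be 1, while the mean, minimum and maximum still differ from variable to variable.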
Summary
We have studied Robust Scaling, a technique most commonly used for Feature Scaling. Here is a Quick Note on the technique:-
1. Median is centred at 0.
2. Variance varies across the variables.
3. Shape of the original distribution is preserved for each variable, since the transformation is linear.
4. Minimum and Maximum values vary across the variables.
5. Robust to outliers.
6. Mean is not centred at 0.
Happy Learning... !!