
Feature Scaling -- Robust Scaling

 



Another technique in feature scaling is Robust Scaling, also known as Scaling to quantiles and median.

Robust scaling uses the median and the interquartile range (IQR) to scale the values of our dataset.

Quantiles are the cut points that divide the range of a probability distribution into contiguous intervals with equal probabilities, e.g. the 25th, 50th and 75th percentiles (also called the first quartile, the median and the third quartile).

The interquartile range (IQR) is the difference between the upper quartile (75th percentile) and the lower quartile (25th percentile).

The median is the middle value of a series when it is arranged in ascending or descending order (for an even number of values, it is the average of the two middle values).
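To make these definitions concrete, here is a small NumPy sketch that computes the quartiles, the IQR and the median of a toy sample. The values are made up for illustration, and np.percentile is used with its default linear interpolation, so other quantile conventions can give slightly different quartiles.

```python
import numpy as np

# Toy sample (hypothetical values); 100 is a deliberate outlier
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, q3 = np.percentile(x, [25, 75])  # lower and upper quartiles (linear interpolation)
iqr = q3 - q1                        # interquartile range
median = np.median(x)                # 50th percentile

print(q1, median, q3, iqr)           # 3.25 5.5 7.75 4.5
```

Note that the outlier 100 pulls the mean up to 14.5 but leaves the quartiles and the median inside the bulk of the data.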

The logic is to subtract the median from each value, which moves the median to 0, and then divide the result by the IQR, i.e. the difference between the 75th and 25th percentiles.

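Formula Used:- 

$$x_{\text{scaled}} = \frac{x - \operatorname{median}(x)}{Q_3(x) - Q_1(x)}$$

where $Q_1(x)$ and $Q_3(x)$ are the 25th and 75th percentiles of the variable $x$, so $Q_3(x) - Q_1(x)$ is its IQR.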
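A minimal NumPy sketch of the formula, applied column by column. The helper name robust_scale and the toy matrix are hypothetical. Mapping a zero IQR to 1 (so a constant column is only centred, not divided by zero) is a design choice here, similar to how scikit-learn's RobustScaler handles it.

```python
import numpy as np

def robust_scale(X):
    """Hypothetical helper: scale each column of X as (x - median) / IQR."""
    X = np.asarray(X, dtype=float)
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    # Avoid division by zero for constant columns: leave them unscaled
    iqr = np.where(iqr == 0, 1.0, iqr)
    return (X - median) / iqr

# Toy data (hypothetical): column 0 contains an outlier (100)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0],
              [100.0, 50.0]])

print(robust_scale(X))
```

For column 0 the median is 3 and the IQR is 4 - 2 = 2, so the outlier 100 becomes (100 - 3) / 2 = 48.5, while the other values land between -1 and 0.5.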


Features of Robust Scaling:- 

1. Median is centred at 0:- 

Since the median of each variable is subtracted from every one of its values, the median of every scaled variable becomes 0.
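A quick check on a synthetic, right-skewed sample shows the median landing at 0 while the mean does not. The exponential sample drawn with NumPy's random generator is an assumption for illustration, not the post's data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000)  # synthetic right-skewed sample

q1, q3 = np.percentile(x, [25, 75])
x_scaled = (x - np.median(x)) / (q3 - q1)

print(np.median(x_scaled))  # 0 (up to floating-point error)
print(x_scaled.mean())      # positive for a right-skewed sample, so not centred at 0
```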

2. Variance varies across the variables:- 

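At no point does the scaling use the variance: each variable is divided by its IQR, not by its standard deviation. As a result, the variances of the scaled variables are not set to 1 and differ from variable to variable, depending on how each variable's values are spread relative to its IQR.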
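A small sketch of this effect with two synthetic features (hypothetical data): both are scaled with the same median/IQR formula, yet their resulting variances are very different because the second feature has a few large values that the IQR ignores.

```python
import numpy as np

# Two synthetic features (hypothetical data); the second has a few large values
x1 = np.linspace(0.0, 1.0, 101)
x2 = np.concatenate([np.linspace(0.0, 1.0, 96), [10.0, 20.0, 30.0, 40.0, 50.0]])
X = np.column_stack([x1, x2])

median = np.median(X, axis=0)
q1, q3 = np.percentile(X, [25, 75], axis=0)
X_scaled = (X - median) / (q3 - q1)

# The variance is never used in the scaling, so it is not equalised across variables
variances = X_scaled.var(axis=0)
print(variances)
```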

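3. Shape of the Original Distribution is preserved.

Data Distribution comparison before and after scaling

Robust scaling subtracts a constant (the median) from every value of a variable and divides by a positive constant (the IQR). This is a linear (affine) transformation, so each variable's distribution is only shifted and stretched: its shape, including its skewness and the position of its outliers relative to the bulk of the data, stays the same. Any visual difference between the before and after plots comes from the new location and scale on the x-axis, not from a change in the shape of each variable's distribution.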

4. Minimum and Maximum values vary across the variables:- 

Description of dataset post scaling

Here we can see that the Minimum and Maximum values are different for different variables
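The same behaviour can be reproduced with scikit-learn's RobustScaler on a tiny hand-made matrix (hypothetical values). Unlike min-max scaling, nothing forces the minimum or maximum of each column to a fixed value.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical data: column 0 has a large value, column 1 a very small one
X = np.array([[1.0, -50.0],
              [2.0, 2.0],
              [3.0, 3.0],
              [4.0, 4.0],
              [50.0, 5.0]])

X_scaled = RobustScaler().fit_transform(X)

print(X_scaled.min(axis=0))  # column minima: -1.0 and -26.5
print(X_scaled.max(axis=0))  # column maxima: 23.5 and 1.0
```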

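5. Robust to Outliers:- 

The median and the IQR are barely affected by extreme values, so the scaling parameters are robust to outliers. The outliers themselves are not removed, though: an outlier present in the dataset is still an outlier after Robust scaling, only expressed in IQR units.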
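A short sketch of both points on toy data (hypothetical values): adding one extreme value barely moves the median and IQR, while the mean and standard deviation jump, and the outlier is still far from the rest of the data after scaling.

```python
import numpy as np

clean = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0])
with_outlier = np.append(clean, 1000.0)  # one extreme (hypothetical) value

for name, x in [("clean", clean), ("with outlier", with_outlier)]:
    q1, q3 = np.percentile(x, [25, 75])
    print(f"{name}: median={np.median(x):.2f} IQR={q3 - q1:.2f} "
          f"mean={x.mean():.2f} std={x.std():.2f}")

# The outlier is not removed: after scaling it is still far from the bulk
q1, q3 = np.percentile(with_outlier, [25, 75])
scaled = (with_outlier - np.median(with_outlier)) / (q3 - q1)
print(scaled[-1])  # 219.0
```

Here the median moves from 14 to 14.5 and the IQR from 4 to 4.5, whereas the mean moves from 14 to 112.6.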


Practical:- 

Let's implement it by ourselves to understand it much better.

1. Importing the necessities:-

Importing Libraries for Robust Scaling
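The imports were shown as a screenshot. A plausible set, assuming the post relies on NumPy, pandas and scikit-learn, is sketched below. If the Boston data was loaded with scikit-learn's load_boston, note that this loader was removed in scikit-learn 1.2, so this sketch swaps in the bundled wine dataset (load_wine) as a stand-in that needs no download.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine          # stand-in for the Boston data (assumption)
from sklearn.preprocessing import RobustScaler  # the scaler used in this post
```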

2. Getting Data Insights

To get any meaningful insights from the data, we first need to be familiar with it: the number of rows and columns, the data types, what each variable represents, the magnitude of its values, and so on.

To get a rough idea of our data, we use the .head() method.
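A sketch of this first look, using scikit-learn's bundled wine dataset as a stand-in for the Boston data (an assumption, since load_boston is no longer available in current scikit-learn). With as_frame=True the features come back as a pandas DataFrame.

```python
from sklearn.datasets import load_wine

# Stand-in dataset (assumption): ships with scikit-learn, no download needed
wine = load_wine(as_frame=True)
df = wine.data  # features as a pandas DataFrame

print(df.shape)   # (rows, columns)
print(df.head())  # first five rows
```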

Boston Data Overview

To learn about the dataset in detail, i.e. what each variable represents, we can read its .DESCR attribute (a text description, so it is accessed without parentheses).
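On the wine stand-in (an assumption in place of the Boston data), the description is read the same way: DESCR is a string attribute of the returned dataset object.

```python
from sklearn.datasets import load_wine

wine = load_wine(as_frame=True)

# DESCR is a plain string attribute, so it is read without parentheses
print(wine.DESCR[:600])
```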

Description of Boston House Data

To get summary statistics of the data (count, mean, standard deviation, minimum, quartiles and maximum), we can use the .describe() method.
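The output of .describe() already contains what Robust scaling needs: the "50%" row is each column's median and the gap between the "75%" and "25%" rows is its IQR. A sketch on the wine stand-in (an assumption in place of the Boston data):

```python
from sklearn.datasets import load_wine

df = load_wine(as_frame=True).data
summary = df.describe()

# The "50%" row is the median; "75%" minus "25%" is the IQR of each column
median = summary.loc["50%"]
iqr = summary.loc["75%"] - summary.loc["25%"]

print(summary)
print(iqr)
```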

Mathematical Description of Boston House Data

3. Scaling the Data

We will use the RobustScaler class from scikit-learn (sklearn.preprocessing) for our data. 

Implementing Robust Scaling
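A minimal sketch of this step on the wine stand-in (an assumption in place of the Boston data). The scaler is used with its default settings, which centre on the median and scale by the 25th to 75th percentile range.

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import RobustScaler

df = load_wine(as_frame=True).data

# Defaults: with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0)
scaler = RobustScaler()
scaled_data = scaler.fit_transform(df)  # returns a NumPy array by default

print(scaled_data[:5])
```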

That's it... Simple, isn't it?!

Now we can check the median & IQR values for each variable. 

Median & IQR value
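After fitting, RobustScaler stores the per-column median in center_ and the per-column IQR in scale_. A sketch that lays them side by side, again on the wine stand-in (an assumption in place of the Boston data):

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import RobustScaler

df = load_wine(as_frame=True).data
scaler = RobustScaler().fit(df)

# center_ holds each column's median, scale_ each column's IQR
medians = pd.Series(scaler.center_, index=df.columns)
iqrs = pd.Series(scaler.scale_, index=df.columns)

print(pd.DataFrame({"median": medians, "IQR": iqrs}))
```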

4. Verifying the Scaling

To verify the result, we first need to convert scaled_data, which fit_transform returns as a NumPy array, to a pandas DataFrame.

Converting scaled data to dataframe
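A sketch of the conversion on the wine stand-in (an assumption in place of the Boston data): the original column names and index are re-attached so the scaled values can be compared column by column.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import RobustScaler

df = load_wine(as_frame=True).data
scaled_data = RobustScaler().fit_transform(df)

# Re-attach the original column names and index to the NumPy array
scaled_df = pd.DataFrame(scaled_data, columns=df.columns, index=df.index)
print(scaled_df.head())
```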

Since Robust scaling is based on the median, let's find out the median value for each variable before and after scaling.

Before Scaling:- 

Median before Scaling Data


After Scaling:- 

Median after Scaling Data

We can clearly notice here that the median for the scaled data has been reduced to 0. 
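The before/after median check can be sketched as follows on the wine stand-in (an assumption in place of the Boston data). Because each column is shifted by exactly its own median, the medians after scaling are 0 up to floating-point rounding.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import RobustScaler

df = load_wine(as_frame=True).data
scaled_df = pd.DataFrame(RobustScaler().fit_transform(df), columns=df.columns)

print(df.median())         # before scaling: each column on its own scale
print(scaled_df.median())  # after scaling: 0 for every column (up to rounding)
```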

Let's also check the other statistics affected by the scaling. 

Describing scaled data
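A sketch of this check on the wine stand-in (an assumption in place of the Boston data). After robust scaling, every column's "50%" row is 0 and its "75%" and "25%" rows are exactly 1 apart, while the mean, standard deviation, minimum and maximum still differ from column to column.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import RobustScaler

df = load_wine(as_frame=True).data
scaled_df = pd.DataFrame(RobustScaler().fit_transform(df), columns=df.columns)

stats = scaled_df.describe()
print(stats)

# Median 0 and IQR 1 for every column; mean, std, min and max still vary
print(stats.loc["75%"] - stats.loc["25%"])
```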

Summary

We have studied Robust Scaling, a feature-scaling technique based on the median and the IQR, which makes it well suited to data that contain outliers. Here is a quick note on the technique:- 

1. Median is centred at 0.

2. Variance varies across the variables. 

3. Shape of the Original Distribution is preserved (only its location and spread change).

4. Minimum and Maximum values vary across the variables.

5. Robust to Outliers (the outliers themselves remain after scaling).

6. Mean is not centred at 0.


Happy Learning... !!
