In previous articles, we read about Feature Scaling and two of the most important techniques used for feature scaling, i.e. Standardization & MinMaxScaling.
Here we will see another feature scaling technique that can be used to scale the variables and is somewhat similar to the MinMaxScaling technique. This technique is popularly known as MaxAbsScaling or Maximum Absolute Scaling.
What is MaxAbsScaling?
Formula Used:-
[Image: MaxAbsScaling Formula]
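Since the formula appears only as an image, here is the standard definition of maximum absolute scaling written out. The symbol names are chosen here for illustration: x is one value of a variable, and the maximum is taken over all observed values of that same variable.

```latex
x_{\text{scaled}} = \frac{x}{\max\left(\lvert x \rvert\right)}
```

Each variable is divided by its own largest absolute value, so nothing is subtracted and a value of 0 stays at 0.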
Features of MaxAbsScaling:-
1. Minimum and Maximum values are scaled between [-1,1]:-
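To see where the [-1,1] range comes from, here is a minimal sketch in plain Python. The helper max_abs_scale and the sample values are made up for illustration and are not part of any library.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the list."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)  # all-zero variable: nothing to scale
    return [v / max_abs for v in values]

mixed = max_abs_scale([-4.0, 2.0, 8.0])
positive = max_abs_scale([1.0, 5.0, 10.0])
negative = max_abs_scale([-20.0, -5.0, -1.0])

print(mixed)     # [-0.5, 0.25, 1.0]
print(positive)  # [0.1, 0.5, 1.0]
print(negative)  # [-1.0, -0.25, -0.05]
```

A variable with only positive values ends up in [0,1], one with only negative values ends up in [-1,0], and a variable with both signs lands somewhere inside [-1,1], touching -1 or 1 at the value with the largest magnitude.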
2. Mean is not centred at 0:-
This method does not use the mean for scaling, so the mean is not centred at any particular value (e.g. 0). It can still end up at or near 0 for some variables, depending on their distribution.
3. Variance varies across the variables:-
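Both this point and the previous one can be checked numerically. The sketch below uses three made-up variables (the names and values are purely illustrative) and prints the mean and variance of each after scaling.

```python
from statistics import mean, pvariance

def max_abs_scale(values):
    """Divide every value by the largest absolute value in the list."""
    max_abs = max(abs(v) for v in values)
    return [v / max_abs for v in values]

# Hypothetical variables with very different magnitudes and spreads
features = {
    "age":    [18.0, 25.0, 40.0, 60.0, 80.0],
    "income": [20_000.0, 22_000.0, 25_000.0, 30_000.0, 200_000.0],
    "change": [-3.0, -1.0, 0.0, 1.0, 3.0],
}

stats = {}
for name, values in features.items():
    scaled = max_abs_scale(values)
    stats[name] = (mean(scaled), pvariance(scaled))
    print(f"{name}: mean={stats[name][0]:.3f}, variance={stats[name][1]:.3f}")
```

Only the symmetric "change" variable ends up with a mean of 0, and each variable keeps a different variance, unlike Standardization, which forces every variable to unit variance.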
4. Sensitive to Outliers:-
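Because the divisor is the single largest absolute value, one extreme observation changes the scaled value of every other observation. A minimal sketch with made-up numbers:

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the list."""
    max_abs = max(abs(v) for v in values)
    return [v / max_abs for v in values]

clean = [10.0, 20.0, 30.0, 40.0, 50.0]
with_outlier = clean + [1000.0]  # one extreme value added

scaled_clean = max_abs_scale(clean)
scaled_outlier = max_abs_scale(with_outlier)

print(scaled_clean)    # [0.2, 0.4, 0.6, 0.8, 1.0]
print(scaled_outlier)  # [0.01, 0.02, 0.03, 0.04, 0.05, 1.0]
```

Without the outlier the values spread evenly over the available range. With it, the five original values are squeezed into the interval from 0.01 to 0.05.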
5. May not preserve the Original shape of Distribution.
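As we have already discussed, the scaling depends entirely on the maximum absolute value. Dividing by a constant is a linear rescaling, so in a strict sense the shape itself is not altered. However, if an extreme outlier ends up being used as that maximum, the bulk of the data is squeezed into a narrow band near 0, and the distribution can look very different on the fixed [-1,1] range. We can also see the same in the graph presented in the Practical section.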
Practical:-
Let's implement it ourselves to understand it better.
1. Importing the necessities:-
[Image: Importing libraries for MaxAbs Scaling]
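The original import code appears only as an image, so the following is a sketch of a plausible setup rather than the exact code used. One assumption to flag: the load_boston loader was removed from scikit-learn in version 1.2, so this sketch uses the bundled load_wine dataset as a stand-in. Any numeric DataFrame would work the same way.

```python
import pandas as pd
from sklearn.datasets import load_wine  # stand-in dataset, not the Boston data
from sklearn.preprocessing import MaxAbsScaler

# as_frame=True returns the features as a pandas DataFrame
wine = load_wine(as_frame=True)
data = wine.data

print(type(data))
print(data.shape)  # (178, 13)
```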
2. Getting Data Insights
To get any meaningful insights from the data, we first need to be familiar with it: the number of rows and columns, the data types, what each variable represents, the magnitude of the values, and so on.
To get a rough idea of our data we use the .head() method.
[Image: Boston Data Overview]
To learn what each variable represents, we can print the dataset's .DESCR attribute. Note that it is a string attribute, not a method, so it is accessed without parentheses.
[Image: Description of Boston House Data]
To get summary statistics for each variable (count, mean, standard deviation, minimum, quartiles and maximum), we can use the .describe() method.
[Image: Mathematical Description of Boston House Data]
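The three inspection steps can be sketched as follows. This again assumes the bundled load_wine dataset as a stand-in for the Boston data, so the printed values will differ from the original screenshots.

```python
from sklearn.datasets import load_wine

wine = load_wine(as_frame=True)  # stand-in dataset
data = wine.data

# First five rows, for a rough idea of the columns and magnitudes
print(data.head())

# DESCR is a plain string attribute; print only the beginning here
print(wine.DESCR[:500])

# Summary statistics per column
summary = data.describe()
print(summary.loc[["min", "max"]])
```

Comparing the min and max rows across columns shows how different the magnitudes are before scaling.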
3. Scaling the Data
We will use the MaxAbsScaler class from scikit-learn (sklearn.preprocessing) to scale our data.
[Image: Implementing MaxAbs Scaling]
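A minimal sketch of the scaling step, assuming the bundled load_wine dataset as a stand-in for the Boston data:

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import MaxAbsScaler

data = load_wine(as_frame=True).data  # stand-in dataset

scaler = MaxAbsScaler()
# fit() learns the maximum absolute value of each column,
# transform() divides each column by it; fit_transform() does both
scaled = scaler.fit_transform(data)

print(type(scaled))   # a NumPy array by default, not a DataFrame
print(scaled.shape)
```

In a modelling workflow, the usual practice is to call fit() on the training set only and then transform() on both the training and test sets, so the test data does not influence the scaling.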
That's it... Simple isn't it...!!!
Now we can check the absolute maximum values for each variable.
[Image: Maximum Absolute Value for each variable]
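After fitting, the scaler stores these values in its max_abs_ attribute. The sketch below (again on the stand-in load_wine dataset) prints them and cross-checks them against a direct pandas computation.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import MaxAbsScaler

data = load_wine(as_frame=True).data  # stand-in dataset
scaler = MaxAbsScaler().fit(data)

# max_abs_ holds one value per column: the largest absolute value seen in fit()
for column, value in zip(data.columns, scaler.max_abs_):
    print(f"{column}: {value}")

# The same numbers computed directly with pandas, for comparison
expected = data.abs().max().to_numpy()
print(np.allclose(scaler.max_abs_, expected))
```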
4. Verifying the Scaling
[Image: Converting scaled data to dataframe]
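The scaler returns a NumPy array, which has no column names. A minimal sketch of wrapping it back into a DataFrame, using the stand-in load_wine dataset:

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import MaxAbsScaler

data = load_wine(as_frame=True).data  # stand-in dataset
scaled = MaxAbsScaler().fit_transform(data)

# Re-attach the column names and index that the NumPy array does not carry
scaled_df = pd.DataFrame(scaled, columns=data.columns, index=data.index)

print(scaled_df.describe().loc[["min", "max"]])
```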
Next, we need to verify that the data has actually been scaled. For this, we use the .describe() method again.
Scaled Data:-
[Image: Describing scaled data]
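We can see that the maximum value of each variable is now exactly 1.0. The minimum is not fixed in the same way: it is the original minimum divided by the maximum absolute value, so for a variable with no negative values it falls somewhere between 0 and 1.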
Original Data:-
[Image: Describing Original data]
Let's have a look at the distribution graph before and after scaling.
[Image: MaxAbs Data Distribution]
Summary
We have studied Maximum Absolute Scaling, a Feature Scaling technique closely related to MinMaxScaling. Here is a quick note on the technique:-
1. Minimum and Maximum values are scaled between [-1,1]
2. Mean is not centred at 0.
3. Variance varies across the variables.
4. Sensitive to Outliers.
5. May not preserve the Original shape of Distribution.
6. Can be used with other techniques to centre the mean at 0.
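As an illustration of the last point, one possible combination (an assumption made here, not something shown earlier in the article) is to subtract the mean first and then apply MaxAbsScaler. In scikit-learn, StandardScaler(with_std=False) performs only the mean subtraction. The input values below are made up.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 400.0], [6.0, 900.0]])  # made-up values

# with_std=False makes StandardScaler subtract the mean only
centre_then_scale = make_pipeline(StandardScaler(with_std=False), MaxAbsScaler())
X_scaled = centre_then_scale.fit_transform(X)

print(X_scaled.mean(axis=0))         # approximately 0 for every column
print(np.abs(X_scaled).max(axis=0))  # 1.0 for every column
```

Centring first means the later division by a constant keeps the mean at 0 while the values still stay within [-1,1].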
Happy Learning... !!