Posts

Showing posts with the label scaling to vector unit length

Feature Scaling -- Scaling to Unit Length

Let's look at a more technical feature scaling method that we can use on our dataset. It is popularly known as "Scaling to Unit Length", because all the features of an observation are scaled down by a common value. The methods we have studied so far scale each feature using a value specific to that variable; here, all the variables together are used to scale the features. The scaling is done row-wise so that each complete feature vector has a length of 1, i.e. the procedure normalises each observation's feature vector (a row), not each variable (a column).

Note:- Scikit-learn recommends this scaling procedure for text classification or clustering.

Formula Used:- Scaling to Unit Length can be done in 2 different ways:-

1. Using L1 Norm:- The L1 norm, popularly known as the Manhattan Distance, can be used to scale the dataset: x_scaled = x / l1(x), where l1(x) = |x_1| + |x_2| + ... + |x_n|.

2. Using L2 Norm:- L2
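The row-wise scaling described above can be sketched in plain Python so that the arithmetic stays visible. This is a minimal illustration, not the original post's code: the function name `scale_to_unit_length`, the sample data, and the choice to leave all-zero rows unchanged are assumptions made for this example. In practice, scikit-learn's `Normalizer` (with `norm="l1"` or `norm="l2"`) performs the same kind of row-wise scaling.

```python
import math


def scale_to_unit_length(rows, norm="l2"):
    """Scale each row (observation) so that its L1 or L2 norm equals 1."""
    scaled = []
    for row in rows:
        if norm == "l1":
            # Manhattan length: sum of absolute values
            length = sum(abs(v) for v in row)
        elif norm == "l2":
            # Euclidean length: square root of the sum of squares
            length = math.sqrt(sum(v * v for v in row))
        else:
            raise ValueError("norm must be 'l1' or 'l2'")
        # Assumption: an all-zero row cannot be scaled, so it is left as-is.
        scaled.append([v / length for v in row] if length else list(row))
    return scaled


# Hypothetical data: two observations with two features each
data = [[3.0, 4.0], [1.0, -1.0]]
l1_scaled = scale_to_unit_length(data, norm="l1")
l2_scaled = scale_to_unit_length(data, norm="l2")
```

Note that every value in a row is divided by the same number, which is why the direction of each feature vector is preserved while its length becomes 1.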

Feature Scaling -- Maximum Absolute Scaling

In previous articles, we read about Feature Scaling and two of the most important techniques used for it, i.e. Standardization & MinMaxScaling. Here we will see another feature scaling technique, somewhat similar to MinMaxScaling, popularly known as MaxAbsScaling or Maximum Absolute Scaling.

What is MaxAbsScaling?

Maximum Absolute Scaling is the technique of scaling the data by its maximum absolute value. The logic is to divide each value by the absolute maximum value of its variable/column. Doing so scales all the values into the range -1 to 1. It can be implemented easily in a few lines of code, as shown below in the practical section.

Note:- Scikit-learn recommends using this transformer on data that is centred at zero or on sparse data.

Formula Used:- x_scaled = x / max(|x|)

Features of MaxAbsScaling:-

1. Minimum and Maximum values are scaled between [-1,1]:- Sin
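As a rough sketch of the "divide each value by the absolute maximum of its column" logic, the snippet below works column by column in plain Python. The function name `max_abs_scale`, the sample data, and the handling of an all-zero column are assumptions made for illustration; scikit-learn provides this transformation as `MaxAbsScaler`.

```python
def max_abs_scale(rows):
    """Divide every value by the maximum absolute value of its column."""
    columns = list(zip(*rows))
    max_abs = [max(abs(v) for v in col) for col in columns]
    # Assumption: a column containing only zeros stays at 0.0.
    return [
        [v / m if m else 0.0 for v, m in zip(row, max_abs)]
        for row in rows
    ]


# Hypothetical data: three observations, two variables
data = [[-4.0, 10.0], [2.0, 5.0], [1.0, -20.0]]
max_abs_scaled = max_abs_scale(data)
# Column maxima of |x| are 4 and 20, so every scaled value lies in [-1, 1].
```

Because the values are only divided and never shifted, zeros stay zeros, which is consistent with the note about sparse data.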

Feature Scaling -- Min Max Scaling

In our previous article, we read about Feature Scaling and the most common technique used to perform it, i.e. Standardization. Another important and commonly used technique is "Min-Max Scaling" or "Normalization". As the name suggests, Min-Max Scaling is the technique where the variables are scaled based on their minimum and maximum values.

Formula Used:- x_scaled = (x - min(x)) / (max(x) - min(x))

Unlike Standardization, the mean is not used here. Instead, the minimum and maximum values of each variable are used to find the new scaled value. The logic is to subtract the variable's minimum from each value and divide the result by the difference between the maximum and minimum values.

Features of Min-Max Scaling

1. Mean is not centred at 0:- Since Min-Max scaling uses the minimum and maximum values to scale each variable separately, the mean may or may not end up centred at 0. We can see this in the below example, where the mean for all variables is greater than 0
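The subtract-then-divide logic can be sketched per column as follows. This is an illustrative plain-Python version, not the example from the original post: the function name `min_max_scale`, the sample data, and the treatment of a constant column are assumptions. scikit-learn offers the same transformation as `MinMaxScaler`.

```python
def min_max_scale(rows):
    """Scale each column with (x - min) / (max - min)."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    ranges = [max(col) - min(col) for col in columns]
    # Assumption: a constant column (range 0) is mapped to 0.0.
    return [
        [(v - lo) / r if r else 0.0 for v, lo, r in zip(row, mins, ranges)]
        for row in rows
    ]


# Hypothetical data: three observations, two variables on different scales
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 800.0]]
min_max_scaled = min_max_scale(data)

# Mean of each scaled column, to check whether it is centred at 0
column_means = [sum(col) / len(col) for col in zip(*min_max_scaled)]
```

With this data the smallest value of each column maps to 0 and the largest to 1, and both column means come out above 0, which matches the point that Min-Max scaling does not centre the mean.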

Feature Scaling

Let's begin with a famous saying... "Five Fingers are Never Equal". Yes, we have heard it a lot, and it's true in every case, even in Data Science and Machine Learning...

The very first step in the journey of Data Science is Data Collection, and this is where we knowingly or unknowingly collect data that differ in size, units etc., which makes the data varied and inconsistent. If we collect vehicle data, we might have the top speed in MPH, the distance covered in KM, the dimensions of the vehicle in CM/Inch, the Model No. with no unit, etc.

(Figure: Sample data for Feature Scaling)

Thus, when we take this type of raw data and pass it directly to our Machine Learning algorithms, it will give inconsistent results, as the machine understands only the numbers and not the units. So, it might give more weightage to the length of the car (1300mm) than to the mileage of the car (30kmpl).

What is Feature Scaling?

We have seen some quick info about the problem statement, now l