Posts

Showing posts from April, 2022

Can Machines be Racists...???

  Judging others based on their skin colour, hair, region, mother tongue, and every little action has long been considered the sole department of humans, until a machine created by humans also started behaving like them, judging others on these very attributes. Yes, you heard it right: a machine designed to give ethical advice gave inappropriate responses. What is it? You might have the same question in your mind: what is it, or what are we talking about? So let's try to answer this question first and then proceed further. Many times we feel alone and hope for some good advice from our friends, seniors, parents, etc., but this is not always possible; finding a piece of good advice from someone you can trust is hard. So what better replacement could there be than a machine that can listen to our problems and answer ethically, just like someone we trust? Therefore …

Feature Scaling -- Scaling to Unit Length

  Let's look at a more technical feature scaling method that we can use for scaling our dataset. It is popularly known as "Scaling to Unit Length", as all the features of an observation are scaled down using a common value. Unlike the previous methods we have studied so far, which scaled each feature using a value specific to that variable, here all the variables of an observation are used together: the scaling is done row-wise so that the complete feature vector has a length of 1, i.e. the normalisation procedure normalises the feature vector, not each variable separately.  Note:- Scikit-learn recommends this scaling procedure for text classification or clustering. Formula Used:- Scaling to Unit Length can be done in 2 different ways:- 1. Using the L1 Norm:- the L1 norm, popularly known as the Manhattan distance, can be used to scale the dataset: x' = x / l1(x), where l1(x) is the sum of the absolute values of the features in the vector (the Manhattan distance formula). 2. Using the L2 Norm:- L2 …
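The row-wise normalisation described above can be sketched with scikit-learn's `Normalizer`, which supports both the L1 and L2 norms. The small array below is a made-up example, not data from the post:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Toy dataset assumed for illustration: 2 observations (rows), 3 features.
X = np.array([[1.0, 2.0, 2.0],
              [4.0, 0.0, 3.0]])

# L2: each row is divided by its Euclidean length, so rows end up with norm 1.
X_l2 = Normalizer(norm="l2").fit_transform(X)

# L1: each row is divided by the sum of absolute values (Manhattan distance).
X_l1 = Normalizer(norm="l1").fit_transform(X)

print(np.linalg.norm(X_l2, axis=1))   # every row now has Euclidean length 1
print(np.abs(X_l1).sum(axis=1))       # every row now has absolute sum 1
```

Note that, unlike the column-wise scalers in the other posts, `Normalizer` is stateless: it learns nothing from `fit` and simply rescales each row independently.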

Bot With A Life

  The Logic Behind:- The concept of designing the bot came from a very simple and common thing, i.e. paper making. Confusing? Let's explain it and make it a bit simpler. 'A book is made of wood. But it is not a tree. The dead cells have been repurposed to serve another need.' Here, living cells are adapted for a different purpose: the cells from a "living" tree are changed and adapted to form a page, which is then used to form a book. A similar logic was used by scientists from the University of Vermont, who used cells from one living organism to create a robot that is controllable and alive at the same time.

Feature Scaling -- Robust Scaling

  Another technique in feature scaling is Robust Scaling, also known as scaling to quantiles and the median. Robust scaling uses the median and the inter-quantile range for scaling the values of our dataset. Quantiles are the cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, e.g. the 25th, 50th, and 75th quantiles. The inter-quantile range is the difference between the upper and lower quantiles, and the median is the middle value of a series arranged in ascending or descending order. The logic used here is to subtract the median from each value, which centres the overall median at 0, and then divide the difference by the difference between the 75th and 25th quantiles. Formula Used:- x' = (x - median(x)) / (Q75(x) - Q25(x)). Features of Robust Scaling:- 1. The median is centred at 0:- since the median is subtracted from each value individually to scale the dataset, the median is reduced to and centred at 0 for each …
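A minimal sketch of this formula using scikit-learn's `RobustScaler`, on a small made-up column that includes an outlier (the usual motivation for preferring the median and quantiles over the mean):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy column assumed for illustration; 100.0 is an outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Default quantile_range=(25.0, 75.0): subtract the median,
# divide by (75th quantile - 25th quantile).
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)

# Median = 3, Q25 = 2, Q75 = 4, so each value becomes (x - 3) / 2.
print(X_scaled.ravel())
print(np.median(X_scaled))  # the scaled median is centred at 0
```

Because only the quantiles enter the formula, the outlier stretches its own scaled value but does not distort the scaling of the typical values, which is the point of this technique.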

Feature Scaling -- Maximum Absolute Scaling

  In previous articles, we read about Feature Scaling and two of the most important techniques used for feature scaling, i.e. Standardization & MinMaxScaling. Here we will see another feature scaling technique that can be used to scale the variables and is somewhat similar to the MinMaxScaling technique. It is popularly known as MaxAbsScaling or Maximum Absolute Scaling. What is MaxAbsScaling? Maximum Absolute Scaling is the technique of scaling the data by its absolute maximum value. The logic used here is to divide each value by the absolute maximum value of its variable/column; doing so scales all the values into the range -1 to 1. It can be implemented easily in a few lines of code, as shown below in the practical section. Note:- Scikit-learn recommends using this transformer on data that is centred at zero or on sparse data. Formula Used:- x' = x / max(|x|). Features of MaxAbsScaling:- 1. Minimum and maximum values are scaled to [-1, 1]:- since …
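The division by each column's absolute maximum can be sketched with scikit-learn's `MaxAbsScaler`; the data below is invented for illustration, with one column containing negative values to show the [-1, 1] range:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Toy dataset assumed for illustration: 2 columns with
# absolute maxima of 4 and 20 respectively.
X = np.array([[1.0, -10.0],
              [2.0,   5.0],
              [4.0,  20.0]])

# Each column is divided by its own maximum absolute value.
scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.max_abs_)  # the per-column absolute maxima learned by fit
print(X_scaled)         # every value now lies in [-1, 1]
```

Because the transform is a pure division (no subtraction), zeros stay zeros, which is why this scaler is the recommended choice for sparse data.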

Feature Scaling -- Min Max Scaling

  In our previous article, we read about Feature Scaling and the most common technique used to perform feature scaling, i.e. Standardization. Another important and commonly used technique is "Min-Max Scaling" or "Normalization". As the name suggests, Min-Max Scaling is the technique where the variables are scaled based on their minimum and maximum values. Formula Used:- x' = (x - min(x)) / (max(x) - min(x)). Unlike Standardization, the mean is not used here; rather, the minimum and maximum values of each variable are used to find the new scaled value. The logic used here is to subtract the minimum value from each value and divide the result by the difference between the maximum and minimum values. Features of Min-Max Scaling:- 1. The mean is not centred at 0:- since Min-Max scaling uses the minimum and maximum values to scale each variable separately, the mean may or may not be centred at 0. We can see this in the example below, where the mean for all variables is greater than 0 …
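The formula above can be sketched with scikit-learn's `MinMaxScaler` on a small made-up column, which also shows the point about the mean not being centred at 0:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column assumed for illustration: min = 1, max = 5.
X = np.array([[1.0], [2.0], [3.0], [5.0]])

# Each value becomes (x - min) / (max - min), mapping the column onto [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.ravel())    # minimum maps to 0, maximum maps to 1
print(X_scaled.mean())     # the mean is not centred at 0
```

With the default `feature_range=(0, 1)` every scaled value is non-negative, so (unlike Standardization) the mean of a scaled variable is always greater than 0 unless the column is constant at its minimum.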