Rare Label Encoding

Introduction

Till now we have seen many techniques for encoding categorical variables, each with its own capabilities and strengths. But before diving into another new technique, let me put a question to you.

Question: Suppose a variable has around 50 different values, a few of which occur very frequently while others occur only rarely. Which encoding technique would you use here, and why?

Please share your answers in the comment section below. Even if you don't know the correct answer, give it a try; by engaging with the problem you will definitely learn more. DO NOT MOVE AHEAD UNTIL YOU HAVE THOUGHT ABOUT OR COMMENTED ON AN ANSWER.

Now, back to our topic. Rare Label Encoding is a technique that takes the values of a variable which have very little representation compared with the other values, groups them together, and assigns them a single common label, "Rare". Let's take an example to understand it better. Suppose we have a dataset of 100