
Count Frequency Encoding


Introduction

Count Frequency Encoding is one of the most commonly used methods for categorical variable encoding. It replaces each category of a variable with either the number of times that category appears in the dataset (its count) or the share of observations that the category accounts for (its frequency).

Let's look at an example to understand it better.

Dummy Count Frequency Encoding


Here we have created dummy data for 6 car companies along with the colour of their best-selling car, shown on the left-hand side. On the right-hand side is the same list of cars, but the categorical variable, i.e. colour, has been encoded using the Count Frequency Encoder, by both count and percentage.

Two companies, Tata and Jaguar, have Grey as their most sold colour. Therefore, when encoding by count, both get the value 2, denoting that Grey appears twice in the dataset.
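The same idea can be sketched in a few lines of plain Python. This is only an illustration: the Tata/Jaguar/Grey pairing comes from the example above, while the remaining company names and colours are placeholders made up to fill the table.

```python
from collections import Counter

# Only Tata and Jaguar (both Grey) come from the example above;
# the other four rows are hypothetical placeholders.
cars = [
    ("Tata", "Grey"),
    ("Jaguar", "Grey"),
    ("Company C", "White"),
    ("Company D", "White"),
    ("Company E", "White"),
    ("Company F", "Red"),
]

colours = [colour for _, colour in cars]
counts = Counter(colours)   # how many times each colour appears
total = len(colours)        # number of rows in the dataset

# Count encoding: replace each colour with its number of occurrences.
count_encoded = [counts[colour] for colour in colours]

# Frequency encoding: replace each colour with its share of all rows.
freq_encoded = [counts[colour] / total for colour in colours]

for (company, colour), cnt, frq in zip(cars, count_encoded, freq_encoded):
    print(f"{company:<10} {colour:<6} count={cnt} share={frq:.1%}")
```

Both Grey rows receive the count 2, as described above, and the frequency version is simply the count divided by the number of rows.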

Some Important Points

When performing categorical variable encoding with Count Frequency Encoding, there are a few points to keep in mind:

  • Before using this technique, divide the dataset into train and test sets.
  • Fit the encoder on the train set only.
  • Use the fitted encoder to encode the values in both the train and test sets.
  • This technique can be used for both numerical and categorical fields.
  • If a value is missing from the train set when the encoder is fitted but is encountered in the test set, the encoder will give an error for such values.
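The points above can be sketched as a minimal fit/encode pair in plain Python. The function names `fit_count_encoder` and `encode_counts` and the colour values are hypothetical, chosen for illustration; this is not how any particular library implements the technique.

```python
from collections import Counter

def fit_count_encoder(train_values):
    """Learn the category -> count mapping from the train set only."""
    return dict(Counter(train_values))

def encode_counts(values, mapping):
    """Replace each value with its learned count; unseen categories raise."""
    unseen = sorted(set(values) - set(mapping))
    if unseen:
        raise ValueError(f"Categories not seen during fit: {unseen}")
    return [mapping[value] for value in values]

# Hypothetical split: "Blue" appears only in the test set.
train = ["Grey", "White", "Grey", "Red", "White", "Grey"]
test = ["White", "Grey", "Blue"]

mapping = fit_count_encoder(train)             # fitted on train only
train_encoded = encode_counts(train, mapping)  # train encoded with train counts

try:
    test_encoded = encode_counts(test, mapping)  # test reuses the train counts
except ValueError as exc:
    print(exc)
```

Note that the test set is encoded with the counts learned from the train set, never with counts computed on the test set itself.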

Advantages

  • This technique is quite simple to implement.
  • It does not expand the feature space.
  • Can be used for both Numerical and Categorical fields.

Disadvantages

  • If two different categories appear the same number of times, both are replaced by the same count.
  • Replacing different categories with the same count makes them indistinguishable after encoding, which may diminish the importance of the variable.
Disadvantage of Count Frequency Encoding
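A small sketch of this collision, using made-up colour values: once two categories share a count, the encoded column no longer tells them apart.

```python
from collections import Counter

# Hypothetical data in which Grey and White both appear twice.
colours = ["Grey", "Grey", "White", "White", "Red"]
counts = Counter(colours)

encoded = [counts[colour] for colour in colours]

# Group the categories by the code they were given.
categories_by_code = {}
for category, count in counts.items():
    categories_by_code.setdefault(count, []).append(category)

# Keep only the codes that are shared by more than one category.
collisions = {code: cats for code, cats in categories_by_code.items() if len(cats) > 1}

print(encoded)
print(collisions)
```

Here Grey and White are both encoded as 2, so a model sees them as the same value.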


Practical


We will be using the feature-engine library for Python for demo purposes.

1. Importing the Libraries


Importing Count Frequency Encoder & Data


2. Viewing the Data


Dataset preview

3. Initializing the Count Frequency Encoder


Initializing the Count Frequency Encoder

Here we have set "encoding_method" to "count". To replace the categories with their frequency instead, use 'frequency' in place of 'count'.

4. Transforming using Count Frequency Encoder


Count Encoder transform

We can see here that there were a total of 577 males and 314 females. These values will now be used for encoding.

5. Verifying Data


Verifying Data


Resources


Please comment below to get the complete dataset and libraries. 

Learn to install Anaconda Here.


Summary


In this Quick Read, we studied Count Frequency Encoding, a technique commonly used for categorical variable encoding. We had a quick overview of the technique, looked at its advantages and disadvantages, and covered some quick points to remember.

We also performed a practical demo of this technique using a well-known Python library, "feature-engine".

Last but not least... "Practice makes a man perfect". So what are you waiting for? Practice this technique and comment below with your views, doubts or anything else. We are here to help you.


 

