QuickDataScience | Quick & Easy Data Science

Showing posts with the label missing

Missing Indicator Imputation

Welcome back, friends! We are back with another imputation technique, one that is a bit different from the techniques we have studied so far and that fills an important role we have, knowingly or unknowingly, been skipping. We have studied many techniques: Mean/Median, Arbitrary Value, CCA, Missing Category, End of Tail, and Random Samples. All of them are good enough to impute the missing values, but most of them do not mark or flag the observations whose values were missing and were imputed. That is why we bring in the Missing Indicator technique, which was designed with the sole purpose of marking the observations that have a missing value. It is mostly used together with one of the imputation techniques above. In simple terms, we use another column/variable to maintain a flag (binary value 0/1, t...
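The flag column described above can be sketched with pandas. This is only a minimal illustration, not the post's own code: the `age` column, the `_missing` suffix, and the helper name `add_missing_indicator` are assumptions made for the example, and median imputation is just one of the techniques the flag can be paired with.

```python
import numpy as np
import pandas as pd


def add_missing_indicator(df, column):
    """Return a copy of df with a 0/1 flag column marking missing values.

    The "<column>_missing" naming is a hypothetical convention for this sketch.
    """
    out = df.copy()
    out[f"{column}_missing"] = out[column].isna().astype(int)
    return out


# Toy data (made up for illustration): two ages are missing.
df = pd.DataFrame({"age": [22.0, np.nan, 35.0, np.nan, 41.0]})

# Step 1: record where the values were missing, before imputing.
flagged = add_missing_indicator(df, "age")

# Step 2: impute with any other technique, here the median of the observed values.
flagged["age"] = flagged["age"].fillna(flagged["age"].median())

print(flagged)
```

The flag has to be created before the imputation step; once the gaps are filled, there is no way to tell which rows were originally missing.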

Missing Category Imputation

Until now, we have seen imputation techniques that can only be used for numerical variables, and we have said nothing about categorical variables/columns. So now we are going to discuss a technique that is mostly used for imputing categorical variables. Missing Category Imputation is the technique in which we add an additional category, "Missing", for the missing values in the variable/column. In simple terms, we do not take on the load of predicting or calculating the value (like we did for Mean/Median or End of Tail Imputation); we simply put "Missing" as the value. Now, you may wonder: if we are only replacing the value with "Missing", why is it said that this method can be used for categorical variables only? Here is the answer: we can use it for numerical variables too, but since we can't introduce a categorical value in a numerical variable/column, we will be required to introduce some numerical value that is unique for the va...
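For the categorical case, the idea can be sketched in a couple of lines of pandas. The `colour` data and the helper name `impute_missing_category` are made up for this example; only the "Missing" label comes from the description above.

```python
import numpy as np
import pandas as pd


def impute_missing_category(series, label="Missing"):
    """Replace missing entries of a categorical series with an extra category."""
    return series.fillna(label)


# Toy categorical column (hypothetical data) with two missing entries.
colour = pd.Series(["red", np.nan, "blue", None, "red"], name="colour")

imputed = impute_missing_category(colour)
print(imputed.value_counts())
```

Nothing is estimated here: the missing entries simply become one more category that a model can treat like any other.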

End of Tail Imputation

End of Tail Imputation is another important imputation technique. It was developed as an enhancement of the Arbitrary Value Imputation technique, to overcome its biggest problem: selecting the arbitrary value to impute for a variable. Picking that value by hand is hard for the user, and on a large dataset it becomes even more difficult to do every time and for every variable/column. To overcome this, End of Tail Imputation was introduced, where the value is selected automatically from the tail of the variable's distribution. Now the question comes: how do we select the value, and how do we impute the end value? There is a simple rule to select the value, given below. In the case of a normal distribution of the variable, we can use the mean plus/minus 3 times the standard deviation. In the case variab...
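The normal-distribution rule above (mean plus/minus 3 times the standard deviation) can be sketched as follows. This covers only that one case; the `income` data, the function names, and the `tail` parameter are assumptions for the example, and the standard deviation is pandas' default sample standard deviation.

```python
import numpy as np
import pandas as pd


def end_of_tail_value(series, n_std=3.0, tail="right"):
    """Value at the end of the tail: mean +/- n_std * standard deviation.

    Computed from the observed (non-missing) values only, since pandas
    skips NaN in mean() and std() by default.
    """
    mean, std = series.mean(), series.std()
    if tail == "right":
        return mean + n_std * std
    return mean - n_std * std


def impute_end_of_tail(series, n_std=3.0, tail="right"):
    """Fill missing values with the end-of-tail value of the same series."""
    return series.fillna(end_of_tail_value(series, n_std, tail))


# Toy numerical column (hypothetical data): observed mean is 50, std is 10.
income = pd.Series([40.0, 50.0, np.nan, 60.0, np.nan])

imputed = impute_end_of_tail(income)
print(imputed.tolist())
```

Because the value is derived from the data itself, the same function can be applied to every numerical column without choosing a number by hand each time.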