Imputation seems like a simple term: "replacing missing data". We have also learned quite a few techniques that perform such imputation in a few lines of code. So, let me ask you a question now.
Do you think that in practical scenarios involving very sensitive information, such as medical data, imputing missing values with some random data would suffice? Would it affect the end analysis?
Before reading ahead, think about the question above and try to answer it for yourself.
So, coming to the answer: there is a high probability that we will bias the dataset if we impute a static value. Imputation is never a simple job. It takes a lot of time and expertise to impute the correct values, and even then you can't be sure how your end model will perform or whether you have imputed the correct values. Thus, there was a need for a technique that could generate several plausible values and impute with the best one.
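To make the bias concrete, here is a minimal sketch of what static imputation does to a column. The data is synthetic and the "blood pressure" framing is only an assumption for illustration. It fills every gap with the observed mean and compares the spread before and after.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example data: 1,000 synthetic "blood pressure" readings.
true_values = rng.normal(loc=120.0, scale=15.0, size=1000)

# Knock out roughly 30% of the readings at random to simulate missing data.
missing_mask = rng.random(true_values.size) < 0.3
observed = true_values.copy()
observed[missing_mask] = np.nan

# Static imputation: every gap gets the same value, the observed mean.
fill_value = np.nanmean(observed)
imputed = np.where(np.isnan(observed), fill_value, observed)

std_before = np.nanstd(observed)  # spread of the values we actually observed
std_after = np.std(imputed)       # spread after filling the gaps

print(f"std of observed values:    {std_before:.2f}")
print(f"std after mean imputation: {std_after:.2f}")
```

The mean of the column stays the same, but the standard deviation shrinks, because a block of identical values has been stacked at the center of the distribution. Any analysis that depends on the variability of the data would then see a column that looks more certain than it really is.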
So far, all the imputation techniques we have seen were similar in one way: each of them used a single value for imputation. Now, we will have a look at the techniques used for multivariate imputation.
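As a rough preview of the difference, the sketch below compares a single-value fill with a fill that uses a second, related column. Everything in it is an assumption for illustration: the data is synthetic, the columns `x` and `y` are hypothetical, and plain linear regression is just one simple way of using other variables, not necessarily the technique covered next.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: two correlated columns, e.g. age (x) and a lab value (y).
n = 500
x = rng.normal(loc=50.0, scale=10.0, size=n)
y_true = 2.0 * x + rng.normal(loc=0.0, scale=3.0, size=n)

# Hide roughly 30% of y at random; x stays fully observed.
missing = rng.random(n) < 0.3
y_obs = y_true.copy()
y_obs[missing] = np.nan

# Single-value imputation: one constant (the observed mean) for every gap.
y_mean_filled = np.where(missing, np.nanmean(y_obs), y_obs)

# Regression imputation: fit y ~ x on complete rows, predict the gaps from x.
slope, intercept = np.polyfit(x[~missing], y_obs[~missing], deg=1)
y_reg_filled = np.where(missing, slope * x + intercept, y_obs)

def rmse(filled):
    # Error on the imputed entries only, measured against the hidden truth.
    return np.sqrt(np.mean((filled[missing] - y_true[missing]) ** 2))

rmse_mean = rmse(y_mean_filled)
rmse_reg = rmse(y_reg_filled)
print(f"RMSE, mean fill:       {rmse_mean:.2f}")
print(f"RMSE, regression fill: {rmse_reg:.2f}")
```

Because `y` is strongly related to `x` in this toy data, the fill that looks at `x` lands much closer to the hidden values than the constant does. On real data the gain depends on how informative the other columns actually are.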