Showing posts with the label Multiple Imputation

Multi-variate Imputation of Chained Equation

We have already studied many techniques for Missing Data Imputation. Most of the techniques we studied are, or can be, used in a final production-ready model. But imputation can always be improved, because we are never sure whether the imputed values are correct. To improve it, we use Multiple Imputation, i.e. predicting the missing values in more than one way and then averaging (or otherwise combining) the results to get the most suitable value. We have already seen a technique with similar logic, KNN Imputation, which uses the K-Nearest Neighbours algorithm to find the most suitable value. Techniques like this, which use the other variables to predict a missing one, are known as "Multi-Variate Imputation". Now we would like to introduce a newer and better technique, which has become a principal technique for Missing Data Imputation: MICE (Multivariate Imputation by Chained Equations). Multi-variate Imputation of Chained Equa…
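The chained-equation loop at the heart of MICE can be sketched in plain Python. This is a minimal sketch under stated assumptions: `None` marks a missing value, every column is numeric with at least one observed value, each incomplete column is modelled by ordinary linear regression on all the other columns, and the helper names (`mice_impute`, `_fit_linear`, `_solve`) are hypothetical. It performs a single deterministic run. MICE proper adds random draws and repeats the procedure to produce several completed datasets. In practice, scikit-learn's `IterativeImputer` (enabled with `from sklearn.experimental import enable_iterative_imputer`) implements a similar round-robin scheme, and its `sample_posterior=True` option combined with different `random_state` values is one way to obtain multiple imputations.

```python
# Hypothetical helper names; a sketch of chained-equation imputation, not a library API.

def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[r][n] / M[r][r] for r in range(n)]


def _fit_linear(X, y, ridge=1e-9):
    """Least-squares fit y ~ b0 + b.x via (very slightly ridged) normal equations."""
    Xa = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(Xa, y)) for i in range(p)]
    return _solve(XtX, Xty)


def mice_impute(rows, n_iter=10):
    """Chained-equation imputation; None marks a missing value."""
    n_cols = len(rows[0])
    missing = [[v is None for v in r] for r in rows]
    # Step 1: start from a simple placeholder fill (observed column means).
    means = []
    for j in range(n_cols):
        obs = [r[j] for r in rows if r[j] is not None]
        means.append(sum(obs) / len(obs))
    data = [[means[j] if v is None else float(v) for j, v in enumerate(r)]
            for r in rows]
    for _ in range(n_iter):
        # Step 2: cycle through the columns, one regression per incomplete column.
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue
            others = [c for c in range(n_cols) if c != j]
            train = [i for i in range(len(rows)) if not missing[i][j]]
            X = [[data[i][c] for c in others] for i in train]
            y = [data[i][j] for i in train]
            coef = _fit_linear(X, y)
            # Step 3: overwrite only the cells that were originally missing.
            for i in range(len(rows)):
                if missing[i][j]:
                    x = [data[i][c] for c in others]
                    data[i][j] = coef[0] + sum(b * v for b, v in zip(coef[1:], x))
    return data


# Toy data where column 1 = 2 * column 0 + 1, with one gap in each column.
rows = [[1, 3], [2, 5], [3, None], [4, 9], [None, 11]]
imputed = mice_impute(rows)
print([[round(v, 3) for v in r] for r in imputed])
```

Each column's missing cells are re-predicted from the current values of the other columns, so the imputations of one variable feed into the regression for the next. That "chaining" is what distinguishes this approach from filling each column independently.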

Multiple Imputation

Imputation seems like a simple idea: "replacing missing data". We have also learned many techniques to perform such imputation in a few lines of code. So, let me ask you a question. Do you think that in practical scenarios with very sensitive information, such as medical data, imputing missing values based on some random data would suffice? Would it affect the final analysis? Before reading ahead, think about the question above and try to answer it yourself. Coming to the answer: there is a high probability that imputing a single static value will bias the dataset. Imputation is never a simple job. It takes a lot of time and expertise to impute the correct values, and even then you can't be sure how your final model will perform or whether you have imputed the correct values. Thus, there was a need for a technique that imputes several plausible values and then combines them. As of now, all the imputation techniques we saw…
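A small illustration of why a single static value can bias the data, using only Python's standard library. The numbers are made up, and the "random draw" scheme (sampling from a normal distribution fitted to the observed values) is a simplified assumption for illustration. A full multiple-imputation method such as MICE also models the relationships between columns. The function names `mean_impute` and `random_draw_impute` are hypothetical.

```python
import random
import statistics


def mean_impute(values):
    """Replace every None with the mean of the observed values (one static value)."""
    obs = [v for v in values if v is not None]
    m = statistics.fmean(obs)
    return [m if v is None else v for v in values]


def random_draw_impute(values, seed):
    """Replace every None with a random draw from a normal fitted to the observed values.

    Simplified assumption: ignores other columns and parameter uncertainty.
    """
    rng = random.Random(seed)
    obs = [v for v in values if v is not None]
    mu, sd = statistics.fmean(obs), statistics.stdev(obs)
    return [rng.gauss(mu, sd) if v is None else v for v in values]


values = [2.0, 4.0, None, 4.0, 5.0, None, 7.0, 9.0]
observed = [v for v in values if v is not None]

# A static fill adds points with zero deviation from the mean, so the spread shrinks.
static = mean_impute(values)
print(round(statistics.pvariance(observed), 2), round(statistics.pvariance(static), 2))
# → 5.14 3.85

# The multiple-imputation idea: several plausible completed datasets instead of one.
draws = [random_draw_impute(values, seed) for seed in range(5)]
for d in draws:
    print([round(v, 2) for v in d])
```

The spread across the five completed datasets reflects how uncertain we are about the missing entries. A single static fill hides that uncertainty entirely.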

KNN Imputation

Talking about Multi-variate Imputation, one technique that is very common and familiar to every data scientist is KNN Impute. KNN Impute might be a new term, but KNN itself is not; it is familiar to everyone in this field. Even if it is new to you, don't worry: we define it in the next section. Let's define KNN and make it familiar to new aspirants.
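scikit-learn provides this technique as `sklearn.impute.KNNImputer`. The sketch below is a simplified plain-Python version of the same idea, assuming `None` marks a missing value. Distance is plain Euclidean distance over the columns observed in both rows. scikit-learn's default `nan_euclidean` metric additionally rescales that distance by the fraction of coordinates present. The function name `knn_impute` is hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest donor rows.

    Hypothetical helper; a simplified version of KNN imputation.
    Donors are rows that observe the missing column; distance uses only
    the columns observed in both rows.
    """
    n_cols = len(rows[0])
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for o, other in enumerate(rows):
                if o == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared))
                candidates.append((d, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:
                # Average the donors' values for the missing column.
                result[i][j] = sum(v for _, v in nearest) / len(nearest)
    return result


rows = [[1.0, 10.0], [1.2, 12.0], [5.0, 50.0], [5.2, 52.0], [5.1, None], [1.1, None]]
filled = knn_impute(rows, k=2)
print(filled[4][1], filled[5][1])  # → 51.0 11.0
```

The row with first feature 5.1 borrows from its two closest neighbours (5.0 and 5.2), while the row with 1.1 borrows from 1.0 and 1.2. A global mean would have given both rows the same value.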