Posts

Showing posts with the label variable discretisation

Discretisation -- Decision Tree

  Tree-based algorithms are among the most popular tools in Data Science for finding & predicting values. Trees are so popular in this field because they work over binary answers {Yes: No, 1: 0, True: False}, and when we can draw a clear line between Yes and No, it becomes easier to analyse things.  Thus, when we have far too much data flowing in, we usually prefer to-the-point answers to our questions.  So, why not use the same technique for Discretisation...!!! 

Discretisation -- Equal Frequency

  We have studied a few techniques commonly used for Discretisation or binning. Here we discuss another important binning technique -- dividing the data into groups of equal size, i.e. the total data is divided into groups/bins that each contain an equal number of observations.  The important point to note is that the widths of the bins may differ in this case, i.e. one bin can span 0-5 while another spans 70-100. 
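
  To make the idea concrete, here is a minimal sketch of equal-frequency binning in plain Python. The function name `equal_frequency_bins` and the sample data are hypothetical and chosen only for illustration; the sketch simply sorts the values and cuts them into groups with the same count, so values that are tied may end up in neighbouring bins.

```python
def equal_frequency_bins(values, n_bins):
    """Split values into n_bins groups holding (almost) the same count."""
    ordered = sorted(values)
    # If the count is not divisible by n_bins, the first `extra` bins
    # receive one additional value each.
    size, extra = divmod(len(ordered), n_bins)
    bins = []
    start = 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


# Hypothetical sample: many small values and a few large ones.
data = [0, 1, 2, 3, 4, 5, 70, 80, 85, 90, 95, 100]
bins = equal_frequency_bins(data, 2)
for b in bins:
    print(f"{b[0]}-{b[-1]}: {len(b)} values")
```

  Both bins hold six values, yet the first covers the range 0-5 and the second covers 70-100, which is exactly the difference in widths described above.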

Discretisation -- Equal Widths

  Discretisation, or Binning, is the process of dividing the data into intervals or bins. Yes, we have studied this explanation, but how can we actually do it?  This question keeps running through our minds... Sit back and relax, we have got you covered...  Here we are going to learn a simple and easy technique that we also use in our daily life -- dividing the data into equal intervals, i.e. we divide the data into N groups of the same width (gap).  [Image: Equal Width Binning Example]  Let's understand it better using the above example (image). Here we had values ranging from 0-300, quite a large range for doing any analysis and visualisation. Thus, we decided to divide the data into bins of equal width 20, i.e. 0-20, 21-40, 41-60* and so on. * We used 21, 41, 61,... because we wanted to make clear that each range is inclusive of its upper limit. Doing so, we were able to group the data into 15 bins, which not only made it easy to visualise but also helps in analysing the data better and...
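
  The 0-300 example can be sketched in a few lines of plain Python. The helper name `equal_width_bin` and its parameters are hypothetical, and the labels assume integer values, so that a bin such as 21-40 starts one above the previous upper limit.

```python
import math


def equal_width_bin(value, low=0, high=300, width=20):
    """Return (bin index, bin label) for a value; upper limits are inclusive."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range {low}-{high}")
    # 20 falls into bin 0 (0-20), while 21 falls into bin 1 (21-40).
    index = max(math.ceil((value - low) / width) - 1, 0)
    start = low if index == 0 else low + index * width + 1
    end = low + (index + 1) * width
    return index, f"{start}-{end}"


n_bins = (300 - 0) // 20  # number of bins of width 20 over the range 0-300
for v in (0, 20, 21, 300):
    print(v, equal_width_bin(v))
```

  Every bin covers the same width of 20 regardless of how many values land in it, so some bins may end up crowded and others almost empty.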

Discretisation

  The process of converting analogue or continuous variables/data into discrete variables/data is known as Discretisation.  It works by creating a set of contiguous intervals that span the range of the variable's values. Discretisation is also called binning, where a bin is an alternative name for an interval. Not sure why we are introducing and referring to this term here..?? Let's find out in the next section.