Machine Learning is an essential term in the field of Data Science. We have all read about and worked with different algorithms, and we might also have created many models for our projects. Today, I would like to introduce you to a new way of doing Machine Learning: Online Machine Learning.
What is it?
Wait! I recently deployed my model on a server. Isn't that online?
That's a fair question. You might have deployed your model, but it might not be learning online. A model is considered online when it keeps learning and training itself after it has been deployed.
So what! My model does that. I have configured it to retrain on new data periodically.
Great, I'm glad you have deployed such a model, and I hope it's working fine. But I'm sorry to say that's still not what we call online.
Then what do you call Online Machine Learning? Enough going round in circles. Get to the point.
Yes, yes, just a second. I will explain everything shortly, but first I would like to share a case with you that will help you understand it better.
A Short Story
Suppose we have to build a news aggregator website that doesn't store any news. Instead, it reads the news from various news agencies and publication sites and displays it based on each user's preferences. The ranking is done by a simple Machine Learning algorithm that combines the user's preferences with the latest trends to decide which news to show.
This might seem like a simple problem for a simple recommender algorithm that is retrained each day on the news collected from the news channels that day. But there is a catch. Suppose we trained our model on the previous day's data, and the next day a war suddenly breaks out between two nations. Our site, still trained on the previous day's data, keeps showcasing old news without any hint that a war has started. People are eagerly looking for that news, so they ignore our site because it has nothing of interest to them.
To overcome this issue, we can make a small change: train the model on fresh data, i.e. the data generated on the present day. But how frequently should we update it?
The simple answer is that we should not worry about the update frequency or the number of batches required. Instead, we should update the model at each query: as soon as a new query arrives, the system trains itself on it then and there.
Although we are talking about batches here, this per-query updating is very different from batch machine learning.
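To make "update the model at each query" concrete, here is a minimal Python sketch. It assumes the recommender is a per-article click model (a logistic regression over topic features) trained with one stochastic gradient step per query. The class `OnlineLogisticRegression`, the helper `serve_stream`, the topic features and the simulated click stream are all hypothetical illustrations. They are not taken from any library or from the site described above.

```python
import math


class OnlineLogisticRegression:
    """Hypothetical click model: logistic regression updated with one SGD step per query.

    Features are a dict mapping a feature name (e.g. "topic=war") to a numeric value.
    """

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}  # feature name -> weight, created the first time a feature is seen
        self.bias = 0.0

    def predict_proba(self, features):
        """Estimated probability that the user clicks an article with these features."""
        z = self.bias + sum(self.weights.get(name, 0.0) * value
                            for name, value in features.items())
        z = max(-35.0, min(35.0, z))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, features, clicked):
        """Take one gradient step on the log-loss of a single observed query."""
        error = self.predict_proba(features) - (1.0 if clicked else 0.0)
        self.bias -= self.learning_rate * error
        for name, value in features.items():
            old = self.weights.get(name, 0.0)
            self.weights[name] = old - self.learning_rate * error * value


def serve_stream(model, stream):
    """For each query: score the article first, then learn from the outcome right away."""
    for features, clicked in stream:
        model.predict_proba(features)  # what the site would use to rank the article
        model.learn_one(features, clicked)


sports, war = {"topic=sports": 1.0}, {"topic=war": 1.0}
model = OnlineLogisticRegression(learning_rate=0.5)

# Simulated day 1: users click sports stories and skip war stories.
serve_stream(model, [(sports, True), (war, False)] * 20)
p_sports_yesterday, p_war_yesterday = model.predict_proba(sports), model.predict_proba(war)

# Simulated day 2: a war breaks out and the clicks flip.
serve_stream(model, [(sports, False), (war, True)] * 20)
p_sports_today, p_war_today = model.predict_proba(sports), model.predict_proba(war)

print(f"yesterday: sports={p_sports_yesterday:.2f}, war={p_war_yesterday:.2f}")
print(f"today:     sports={p_sports_today:.2f}, war={p_war_today:.2f}")
```

Every query triggers one cheap gradient step. Because of that, the model starts ranking war coverage above sports during the same day, without waiting for a scheduled retrain. The learning rate of 0.5 is an arbitrary choice that keeps the demo short.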
Online Vs Batch Learning
In the batch update approach to Machine Learning, the usual process is to collect a batch of data and use it to train the model at fixed time intervals. Typically, the data is collected throughout the day and used for training when traffic on the site is low. In the Online approach, by contrast, the model is updated incrementally as each new piece of data (here, each query) arrives, so it never has to wait for the next scheduled retraining.
The basic differences between Online and Batch Machine Learning can be seen below:
[Figure: Online vs Batch Learning]
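The contrast can be reduced to a toy Python sketch in which the "model" is just an estimate of a topic's click-through rate. The class names and the click data are hypothetical. The batch version stores incoming data and only refits when `retrain()` is called. The online version folds each observation into its estimate with the incremental-mean update and never keeps the raw data.

```python
class BatchModel:
    """Batch learning: store incoming data and refit only when retrain() is called."""

    def __init__(self):
        self.buffer = []       # all data collected so far
        self.estimate = 0.0    # stays fixed between retrains

    def collect(self, x):
        self.buffer.append(x)  # data is stored, but the model does not change yet

    def retrain(self):
        # Refit from scratch on everything collected (e.g. at night, when traffic is low).
        if self.buffer:
            self.estimate = sum(self.buffer) / len(self.buffer)


class OnlineModel:
    """Online learning: update the estimate immediately, without storing past data."""

    def __init__(self):
        self.count = 0
        self.estimate = 0.0

    def update(self, x):
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.count += 1
        self.estimate += (x - self.estimate) / self.count


clicks = [0, 0, 1, 1, 1]  # 1 = user clicked a war story, 0 = did not
batch, online = BatchModel(), OnlineModel()
for x in clicks:
    batch.collect(x)
    online.update(x)

print("before retrain:", batch.estimate, round(online.estimate, 3))
batch.retrain()
print("after retrain:", round(batch.estimate, 3), round(online.estimate, 3))
```

After the five clicks, the online estimate is already 0.6, while the batch estimate stays at 0.0 until `retrain()` runs. Both estimates then agree. The online model reaches the same answer without storing the data and without a separate training step.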
Conclusion
Online Machine Learning, also known as "Stream Learning" or "Incremental Learning", has become a big name in Machine Learning as businesses grow increasingly interested in streaming systems, for reasons such as:
Businesses want to be more up to date and more responsive to time-sensitive events, so switching to streaming systems is usually the preferred option.
The data created by businesses is huge, and it can be tamed more easily because a streaming system is specifically designed to handle a particular kind of data.
Processing the data as it comes in spreads the workload evenly over time and yields consistent results.