Skip to main content

Posts

Showing posts with the label clustering

Bucketing in Hive

Today, we are dealing with a big problem of Big Data, where a huge amount of data is generated every second and minute. Thus, the issue of storing such a huge amount of data arises, which is managed using various SQL, NoSQL and now NewSQL databases. But still, a problem remains if we store the data as it is generated in our databases, it gets difficult to query such huge data. Thus, there was a need for some technique that could help in splitting the data at the time of storing, providing not only fast and easy access to data but also in easy storage. To cater for the issue of storing and managing Big Data, Hive was introduced, which further provides concepts like Partitioning and Bucketing to solve the issue of storing and querying huge datasets.