Bucketing in HIVE - Learning by Doing

We covered the theory behind bucketing in Hive in our previous article. Time to get our hands dirty.

Before the coding part, we need the following in place:

1. Hadoop installation.
2. Hive installation.

We will assume that both are installed and that Hadoop and Hive are running. As already discussed, there are two ways to perform bucketing. We will go through the code for each one separately and in detail.

We will use the "World Happiness" dataset to demonstrate bucketing.

1. Bucketing with Partitioning

A. First, we need to create a table and load data into it.

CREATE TABLE IF NOT EXISTS <Table Name> (
  <Column1 DataType>,
  <Column2 DataType>,
  <Column3 DataType>,
  ...
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS <file format>;

Figure: Creating Table in Hive

B. Now we need to LOAD the entire dataset into our table. Note that Hive expects the file path as a quoted string.

LOAD DATA LOCAL INPATH '<File Path>' INTO TABLE <Table Name>;

L