An easy-to-follow guide for installing Hive on Windows 10.
1. Prerequisites
- Hardware Requirement
* RAM: Min. 8GB; if you have an SSD in your system, 4GB RAM would also work.
* CPU: Min. quad-core, with at least 1.80GHz
- JRE 1.8: Offline installer for JRE
- Java Development Kit: 1.8
- A software for unzipping, like 7-Zip or WinRAR
* I will be using 64-bit Windows for the process; please check and download the version supported by your system (x86 or x64) for all the software.
- Hadoop
* I am using Hadoop-2.9.2; you can also use any other STABLE version of Hadoop.
* If you don’t have Hadoop, you can refer to installing it from Hadoop: How to install in 5 Steps in Windows 10.
- MySQL Query Browser
- Download Hive zip
* I am using Hive-3.1.2, you can also use any other STABLE version for Hive.
2. Unzip and Install Hive
- After downloading Hive, we need to unzip the apache-hive-3.1.2-bin.tar.gz file.
Fig 2:- Extracting Hive Step-1
- Once extracted, we get a new file, apache-hive-3.1.2-bin.tar. We need to extract this tar file once again.
Fig 3:- Extracting Hive Step-2
- Now we can organize our Hive installation: create a folder and move the final extracted folder into it. For example:
Fig 4:- Hive Directory
- Please note: while creating folders, DO NOT ADD SPACES IN THE FOLDER NAME (it can cause issues later).
- I have placed my Hive on the D: drive; you can use C: or any other drive.
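If you prefer a script over two rounds of 7-Zip, the extraction can be sketched in Python. This is a minimal sketch using only the standard library; the function name `extract_hive` and the paths in the usage note are hypothetical, and the space check simply mirrors the folder-naming advice above.

```python
import tarfile
from pathlib import Path


def extract_hive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir in a single pass.

    Refuses destination paths that contain spaces, following the
    folder-naming advice in this guide.
    """
    dest = Path(dest_dir)
    if " " in str(dest):
        raise ValueError(f"destination contains a space: {dest}")
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" un-gzips and reads the tar in one step, so the
    # intermediate .tar file never has to be written to disk.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    # Return the top-level entries now present, for a quick check.
    return sorted(p.name for p in dest.iterdir())
```

For example, `extract_hive(r"D:\Downloads\apache-hive-3.1.2-bin.tar.gz", r"D:\Hive")` (both paths are placeholders) should leave an apache-hive-3.1.2-bin folder under D:\Hive.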
3. Setting Up Environment Variables
Another important step in setting up a work environment is to set your system's environment variables.
To edit environment variables, go to Control Panel > System and click on the “Advanced system settings” link.
Alternatively, we can right-click on the This PC icon, click on Properties, and then click on the “Advanced system settings” link.
Or, the easiest way is to search for Environment Variable in the search bar and there you GO…😉
Fig. 5:- Path for Environment Variable
Fig. 6:- Advanced System Settings Screen
3.1 Setting HIVE_HOME
- Open Environment Variables and click on “New” in “User variables”.
- On clicking “New”, we get the below screen.
Fig. 8:- Adding HIVE_HOME
3.2 Setting Path Variable
- The last step in setting the environment variables is setting Path in the system variables.
- Select the Path variable in the system variables and click on “Edit”.
- Now we need to add this path to the Path variable:-
* %HIVE_HOME%\bin
- Click OK and OK, and we are done with setting the environment variables.
3.3 Verify the Paths
- Now we need to verify that what we have done is correct and has taken effect.
- Open a NEW Command Window (an already open window will not see the new variables).
- Run the following commands.
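The same check can also be sketched in Python, as a rough alternative to inspecting the variables by hand. The helper name `check_hive_env` is hypothetical; it assumes the Windows `;` separator for Path and only looks at the two settings made above.

```python
import os


def check_hive_env(env=None, path_sep=";"):
    """Return a list of problems found with HIVE_HOME and PATH."""
    env = os.environ if env is None else env
    problems = []
    hive_home = env.get("HIVE_HOME", "")
    if not hive_home:
        problems.append("HIVE_HOME is not set")
    elif " " in hive_home:
        problems.append("HIVE_HOME contains a space")
    # Compare case-insensitively and ignore trailing slashes, since
    # Windows treats D:\Hive\bin and d:\hive\bin\ as the same folder.
    expected_bin = (hive_home.rstrip("\\/") + "\\bin").lower()
    entries = [
        entry.strip().rstrip("\\/").lower()
        for entry in env.get("PATH", "").split(path_sep)
    ]
    if hive_home and expected_bin not in entries:
        problems.append("PATH does not contain %HIVE_HOME%\\bin")
    return problems
```

Calling `check_hive_env()` from a newly opened command window should return an empty list when both variables are in place.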
4. Editing Hive
4.1 Replacing bins
- Go to this GitHub Repo and download the bin folder as a zip.
- Extract the zip and copy its files into %HIVE_HOME%\bin, replacing all the files already present there.
Note:- If you are using a different version of Hive, please search for its respective bin folder and download that instead.
4.2 Creating File Hive-site.xml
(We can find the template file at Hive -> conf -> hive-default.xml.template)
We need to copy the hive-default.xml.template file, paste it in the same location, and rename the copy to hive-site.xml. This will act as our main config file for Hive.
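The copy-and-rename step can be sketched as a small Python helper. The name `create_hive_site` is hypothetical; by default it leaves an existing hive-site.xml untouched so that later edits are not overwritten by accident.

```python
import shutil
from pathlib import Path


def create_hive_site(conf_dir, overwrite=False):
    """Copy hive-default.xml.template to hive-site.xml inside conf_dir."""
    conf = Path(conf_dir)
    template = conf / "hive-default.xml.template"
    target = conf / "hive-site.xml"
    if not template.is_file():
        raise FileNotFoundError(f"template not found: {template}")
    # Keep an existing hive-site.xml unless the caller asks otherwise.
    if target.exists() and not overwrite:
        return target
    shutil.copyfile(template, target)
    return target
```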
Fig. 11:- Creating Hive-site.xml
4.3 Editing Configuration Files
4.3.1 Editing the Properties
Now open the newly created hive-site.xml; we need to edit the following properties.
Replace <Your IP Address> with the IP address of your system and replace <Your drive Folder> with the Hive folder path.
4.3.2 Removing Special Characters
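The exact characters to remove are not listed here. One likely case (an assumption on our part) is a numeric character reference such as `&#8;` in the copied template, which is not a legal XML 1.0 character and makes the XML parser reject the file. A minimal sketch that replaces such references with a space, using a hypothetical helper name:

```python
import re

# Matches numeric character references such as &#8; or &#x1F;
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _is_legal_xml_char(code):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        code in (0x9, 0xA, 0xD)
        or 0x20 <= code <= 0xD7FF
        or 0xE000 <= code <= 0xFFFD
        or 0x10000 <= code <= 0x10FFFF
    )


def strip_illegal_char_refs(xml_text):
    """Replace references to illegal XML characters with a single space."""
    def replace(match):
        ref = match.group(1)
        code = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        # Legal references (for example &#10;) are kept unchanged.
        return match.group(0) if _is_legal_xml_char(code) else " "

    return _CHAR_REF.sub(replace, xml_text)
```

To apply it, read hive-site.xml as text, pass the contents through `strip_illegal_char_refs`, and write the result back, keeping a backup of the original file.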
4.3.3 Adding few More Properties
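Which properties to add depends on your setup and is not reproduced here. As a hedged sketch, the helper below shows one way to add or update a property in hive-site.xml with Python's standard ElementTree module. The function names are hypothetical; `hive.metastore.uris` and port 9083 come from the troubleshooting discussion in the comments, and the IP address is a placeholder.

```python
import xml.etree.ElementTree as ET


def set_property(root, name, value):
    """Set <property><name>..</name><value>..</value></property> under root."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            node = prop.find("value")
            if node is None:
                node = ET.SubElement(prop, "value")
            node.text = value
            return prop
    # Property not present yet: append a new one.
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def get_property(root, name):
    """Return the value of a property, or None if it is not defined."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Example on an in-memory document; the IP address is a placeholder.
root = ET.fromstring("<configuration></configuration>")
set_property(root, "hive.metastore.uris", "thrift://127.0.0.1:9083")
```

For a real file, `ET.parse(path)` followed by `tree.write(path)` would do the same, but ElementTree drops XML comments on the way, so keep a backup of hive-site.xml first.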
4.4 Creating Hive User in MySQL
This user is used for connecting Hive to the MySQL database, so that Hive can read and write data there.
Note:- You can skip this step if you already created the hive user during the SQOOP installation.
- Firstly, we need to open MySQL Workbench and open the workspace (default or any specific one, if you want). We will be using the default workspace for now.
Fig 12:- Open MySQL Workbench
- Now open the Administration option in the workspace and select the Users and Privileges option under Management.
Fig 13:- Opening Users and Privileges
- Now select the Add Account option and create a new user with Login Name as hive, Limit to Hosts Matching as localhost, and a Password of your choice.
- Upon opening, it will ask for your root user password (created while setting up MySQL).
- Now we need to run the below command in the cmd window.
test_bigdata will be your schema name and hive@localhost will be the user name @ hostname.
4.6 Creating Metastore
Firstly, we need to create a database for the metastore in MySQL, or we can use the one used in the previous step (test_bigdata in my case).
Now navigate to the below path
hive -> scripts -> metastore -> upgrade -> mysql
and execute the file hive-schema-3.1.0.mysql in MySQL against your database.
Note:- If you are using a different database, select the folder for it in the upgrade folder and execute its hive-schema file.
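To see which schema scripts ship with your Hive build before running one, the folder can be listed with a short Python sketch. The helper names are hypothetical, and the code matches on the `hive-schema-<version>.` prefix only, so it makes no assumption about the file extension.

```python
from pathlib import Path


def find_schema_scripts(hive_home, db_type="mysql"):
    """List hive-schema-* files under scripts/metastore/upgrade/<db_type>."""
    upgrade_dir = Path(hive_home) / "scripts" / "metastore" / "upgrade" / db_type
    if not upgrade_dir.is_dir():
        raise FileNotFoundError(f"no upgrade folder for {db_type}: {upgrade_dir}")
    return sorted(p.name for p in upgrade_dir.glob("hive-schema-*"))


def pick_schema(script_names, version):
    """Return the script whose name starts with hive-schema-<version>."""
    prefix = f"hive-schema-{version}."
    for name in script_names:
        if name.startswith(prefix):
            return name
    return None
```

With Hive 3.1.2, `pick_schema(find_schema_scripts(hive_home), "3.1.0")` should point at the file named in the step above; pass a different `db_type` to look in another database's folder.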
4.7 Adding Few More Properties (Metastore-related Properties)
Open the hive-site.xml file once again and make some changes there. These are related to the Hive metastore, which is why they were not added at the start: to distinguish between the different sets of properties.
5. Starting Hive
5.1 Starting Hadoop
Fig. 19:- start-all.cmd
5.2 Starting Hive Metastore
5.3 Starting Hive
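The three start-up steps in this section use the commands start-all.cmd, hive --service metastore, and hive, as quoted in the comments below. As a rough sketch, the Python helper below records that order and checks that each executable can be found on Path before anything is launched. The names `STARTUP_STEPS` and `missing_executables` are hypothetical.

```python
import shutil

# Commands named in this guide, in the order section 5 starts them.
STARTUP_STEPS = [
    ("Hadoop", ["start-all.cmd"]),
    ("Hive metastore", ["hive", "--service", "metastore"]),
    ("Hive shell", ["hive"]),
]


def missing_executables(steps=STARTUP_STEPS, which=shutil.which):
    """Return the executables from steps that cannot be found on Path."""
    missing = []
    for _label, command in steps:
        executable = command[0]
        if which(executable) is None and executable not in missing:
            missing.append(executable)
    return missing
```

If `missing_executables()` reports hive, the command window will fail with "'hive' is not recognized as an internal or external command", which usually points back to the HIVE_HOME and Path settings from section 3.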
6. Common Issues
6.1 Unable to export or import data in hive
The first common issue that we face after starting Hive is that we are unable to import or export data.
Sol:- We need to edit the below property and set it to false.
6.2 Join not Working
7. Congratulations..!!!!🎉
Congratulations! We have successfully installed Hive.
Some of us might have faced a few issues along the way… Don’t worry, it's most likely due to a small missed step or incompatible software. If you face any such issue, please go through all the steps once again carefully and verify that you have the right software versions.
If you still can’t get Hive up and running, don’t hesitate to describe your problem below in the comment section.
Learn Hadoop Installation in 5 steps
Learn SQOOP Installation in 5 steps
Learn Pig Installation in 5 steps
Happy Learning… !!! 🙂
Hi,
I installed Hive but while opening hive shell I am getting error as : Connection timed out.(Cannot connect to metastore)
Can you please help here as to how I should correct this.
Hi,
This seems to be a configuration issue. Can you please check the IP that the Hive Metastore is using for the connection against the IP present in your hive-site.xml file?
You can find IP under this property in hive-site.xml
Property:- hive.metastore.uris
Value:- thrift://YOUR IP ADDRESS:9083
and the IP that metastore is using will be found in the STARTUP MSG
STARTUP_MSG: Starting HiveMetaStore
STARTUP_MSG: host = YOUR IP ADDRESS
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.1.2
try changing the IP in the hive-site.xml to the IP that metastore is using.. this should resolve your problem.
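The comparison described above can be sketched in Python. This is only an illustration: the helper names are hypothetical, the log text is whatever you copy from the metastore window, and the handling of a "hostname/ip" form in the host line is an assumption about how some setups print it.

```python
import re
from urllib.parse import urlparse

_HOST_LINE = re.compile(r"^STARTUP_MSG:\s*host\s*=\s*(\S+)", re.MULTILINE)


def startup_host(log_text):
    """Return the host printed in the STARTUP_MSG block, or None."""
    match = _HOST_LINE.search(log_text)
    if match is None:
        return None
    # If the line reads "hostname/ip", keep only the part after the slash.
    return match.group(1).rsplit("/", 1)[-1]


def uri_matches_startup_host(metastore_uri, log_text):
    """True if the host in hive.metastore.uris equals the startup host."""
    configured = urlparse(metastore_uri).hostname
    return configured is not None and configured == startup_host(log_text)
```

If `uri_matches_startup_host` returns False for your hive.metastore.uris value, update the IP in hive-site.xml as suggested above.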
Great! It worked. I can open hive shell now.
Thank you so much.
Thanks Ma'am/Sir
Hi Team,
I have created one table in Hive with table definition as below:
Create external table student( name string, age int, gender string) row format delimited fields terminated by '|' lines terminated by '\n' Location '/usr/shash';
And thereafter I have loaded data in above table as below:
LOAD DATA LOCAL INPATH 'F:/TestingData/External_legato.txt' INTO table student;
And data in my file is as below:
|VSP|26|M|
|PS|26|M|
|TG|26|F|
After running the command select * from student I am getting result as below:
NULL 26
NULL 26
NULL 26
Can you please help here as why am I getting result like this.
Hi Ma'am/Sir,
The result you are getting is not what you expected because of the extra delimiter at the start of each data row, i.e. a '|' symbol at the beginning.
Now, let's explain what is happening here and why we are not getting the desired result.
As the hive reads the data, it first encounters a '|' symbol and assigns the data preceding the symbol to first column(in this case since there is no data so it assigns an empty string to first column i.e. name)
On next delimiter it gets the data i.e. VSP, PS, TG but since the table has an int column for 2nd field, so it is unable to convert a string to int field and hence assigns NULL.
Similarly, for the 3rd column it gets Age i.e. 26 in our case and since the table has string column and the value we received from data is int, which can be easily converted to string so it assigns 26.
Now, since the table has only 3 columns and it has assigned all the columns with some value, it skips to next row.
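The walk-through above can be reproduced with a simplified Python model of delimited-text parsing. It is not Hive's actual reader, only an illustration of the same rules: split on the delimiter, map fields to columns by position, turn failed int conversions into NULL (None here), and ignore extra fields.

```python
def parse_row(line, columns, delimiter="|"):
    """Map delimited fields to typed columns by position."""
    fields = line.rstrip("\n").split(delimiter)
    row = []
    for index, (_name, kind) in enumerate(columns):
        raw = fields[index] if index < len(fields) else None
        if raw is None:
            row.append(None)  # no field left for this column
        elif kind == "int":
            try:
                row.append(int(raw))
            except ValueError:
                row.append(None)  # not a number, so the value is NULL
        else:
            row.append(raw)
    return row


# Column layout of the student table from the question.
STUDENT = [("name", "string"), ("age", "int"), ("gender", "string")]

with_leading_pipe = parse_row("|VSP|26|M|", STUDENT)  # ['', None, '26']
without_leading_pipe = parse_row("VSP|26|M", STUDENT)  # ['VSP', 26, 'M']
```

The first result matches the query output above (empty name, NULL age, 26 in the gender column); removing the leading '|' from the data file lines the fields up with the columns again.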
Hope, that was clear. Please let us know in case of further doubts.
Thank you for this great explanation .I tried with other datasets as well. It's working the way you explained.
Great to know that.... Thanks.
Do let us know what is the next topic you would like to read.
Hi Shashank, I installed hive when use command start-all.cmd, it works but when i start command "hive --service metastore" and "hive". It show message( 'hive' is not recognized as an internal or external command, operable program or batch file).
Hi Thanikaivel, Can you please share the screenshots and the configurations for your setup.
You can mail the details at:- quickdatascienceds@gmail.com