
Spark — How to install in 5 Steps in Windows 10



 An easy-to-follow guide for installing Spark on Windows 10.




1. Prerequisites

  1. Hardware Requirements
    * RAM — minimum 8 GB; 4 GB can also work if your system has an SSD.
    * CPU — minimum quad-core, at least 1.80 GHz
  2. JRE 1.8 (offline installer for JRE)
  3. Java Development Kit — 1.8
  4. Software for unzipping, like 7-Zip or WinRAR
    * I will be using 64-bit Windows throughout; please check and download the version (x86 or x64) supported by your system for each piece of software.
  5. Hadoop
    * I am using Hadoop 2.9.2; you can also use any other STABLE version of Hadoop.
    * If you don’t have Hadoop, you can refer to installing it from Hadoop: How to install in 5 Steps in Windows 10.
  6. MySQL Query Browser
  7. Download the Spark zip
    * I am using Spark 3.1.1; you can also use any other STABLE version of Spark.
    * The latest release of Spark is 3.1.2 (shown in the image below), released in June 2021.


Fig 1:- Download Spark-3.1.2
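Before moving on, it is worth confirming that Java 1.8 is actually installed and visible on the PATH, since Spark will not start without it. A quick check from a cmd window (a sketch; the exact version output varies by JDK build):

```shell
:: Confirm the Java runtime and compiler are installed and on the PATH
java -version
javac -version

:: Show which java.exe is being picked up
where java
```

If either command reports "is not recognized", install the JDK from Step 3 of the prerequisites before continuing.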



2. Unzip and Install Spark



After downloading Spark, we need to unzip the spark-3.1.1-bin-hadoop2.7.tgz file.

Fig 2:- Extracting Spark Step-1



Once extracted, we get a new file, spark-3.1.1-bin-hadoop2.7.tar.
Now we need to extract this tar file as well.


Fig 3:- Extracting Spark Step-2




Now we can organize our Spark installation: create a folder and move the final extracted folder into it. For example:

Fig 4:- Spark Directory



  • Please note: while creating folders, DO NOT ADD SPACES IN THE FOLDER NAMES (spaces can cause issues later).
  • I have placed Spark on the D: drive; you can use C: or any other drive as well.
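As an alternative to 7-Zip or WinRAR, Windows 10 (build 1803 and later) ships a tar command that handles both extraction steps in one go. A minimal sketch, assuming the archive sits in the Downloads folder and D:\Spark is the target; adjust both paths to your setup:

```shell
:: Create the target folder -- remember, no spaces in the name
mkdir D:\Spark

:: Extract the downloaded archive; -z handles the gzip layer,
:: so no separate .tar extraction step is needed
tar -xzf %USERPROFILE%\Downloads\spark-3.1.1-bin-hadoop2.7.tgz -C D:\Spark
```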


3. Setting Up Environment Variables


Another important step in setting up a working environment is to set your system's environment variables.

To edit environment variables, go to Control Panel > System > click on the “Advanced system settings” link.
Alternatively, we can right-click on the This PC icon, click on Properties, and then click on the “Advanced system settings” link.
Or, the easiest way is to search for “Environment Variables” in the search bar, and there you GO…😉

Fig. 5:- Path for Environment Variable


Fig. 6:- Advanced System Settings Screen


3.1 Setting SPARK_HOME


  • Open Environment Variables and click on “New” under “User variables”.


Fig. 7:- Adding Environment Variable



  • On clicking “New”, we get the below screen.



Fig. 8:- Adding SPARK_HOME



  • Now, as shown, enter SPARK_HOME as the variable name and the path to your Spark folder as the variable value.
  • Click OK, and we are half done with setting SPARK_HOME.
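The same user variable can also be set from a cmd window with setx instead of the GUI; a sketch, assuming Spark was extracted to D:\Spark (use your own path):

```shell
:: Set SPARK_HOME as a user environment variable
:: Note: setx only affects NEW command windows, not the current one
setx SPARK_HOME "D:\Spark\spark-3.1.1-bin-hadoop2.7"
```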


3.2 Setting Path Variable

  • The last step in setting up the environment variables is setting the Path in the system variables.



Fig. 9:- Setting Path Variable


  • Select Path variable in the system variables and click on “Edit”.


Fig. 10:- Adding Path


  • Now we need to add this path to the Path variable:-
    * %SPARK_HOME%\bin
  • Click OK, then OK again, and we are done with setting environment variables.


3.3 Verify the Paths 

  • Now we need to verify that what we have done is correct and has taken effect.
  • Open a NEW command window.
  • Run the following command:

echo %SPARK_HOME%



Note:- If you want the variable to be set for all users, you need to select “New” under System Variables instead.
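A second check confirms the Path entry as well; a sketch (run it in a NEW command window so the fresh variables are picked up):

```shell
:: Should print the Spark installation folder
echo %SPARK_HOME%

:: Should resolve to spark-shell.cmd inside %SPARK_HOME%\bin if the Path entry is correct
where spark-shell
```

If `where spark-shell` reports that it could not find the file, re-check the %SPARK_HOME%\bin entry added in step 3.2.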



4. Configure without Hadoop


Many of us think a Hadoop installation is necessary for running Spark, which becomes another TASK: install Hadoop first, and then Spark.


Wait... Here is a special catch for you... Notice the Hadoop part present in the name of your downloaded Spark file..!!!

spark-3.1.1-bin-hadoop2.7.tgz

We can actually run Spark without installing Hadoop by just adding one file (winutils.exe) to the bin folder of Spark.

Download winutils.exe for your respective Hadoop version and put the file in the Spark bin folder.

Please note:- winutils.exe is different for each Hadoop version, so take special care to download the winutils.exe file matching your version.
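Assuming winutils.exe was downloaded to the Downloads folder, placing it is a single copy command. Some setups additionally point HADOOP_HOME at the Spark folder so that winutils.exe is found under %HADOOP_HOME%\bin; a sketch, with example paths that you should adjust to your own install:

```shell
:: Copy winutils.exe (built for YOUR Hadoop version) into Spark's bin folder
copy %USERPROFILE%\Downloads\winutils.exe D:\Spark\spark-3.1.1-bin-hadoop2.7\bin

:: Optional: if Hadoop itself is not installed, point HADOOP_HOME at the Spark folder
setx HADOOP_HOME "D:\Spark\spark-3.1.1-bin-hadoop2.7"
```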



5. Launch Spark


Congratulations..!!!!!
We are done with setting up Spark on our system.

Now we need to check if everything works smoothly…

Open a cmd window, move to the %SPARK_HOME%\bin directory, and run the commands below to test Spark.

Changing the directory:-  

cd %SPARK_HOME%\bin


Starting Spark Shell:-

spark-shell


Fig 11:- Running Spark


Spark also offers a web UI by default, which we can access at http://localhost:4040 while the shell is running.

Fig 12:- Spark Web UI
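Beyond just starting the shell, a one-line job confirms that Spark can actually execute work. A sketch that pipes a tiny Scala expression into spark-shell from cmd and then exits; counting the numbers 0 through 999 should report 1000:

```shell
:: Run a one-line Spark job non-interactively: count the numbers 0..999
echo spark.range(1000).count() | spark-shell
```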




6. Congratulations..!!!!🎉


Congratulations! We have successfully installed Spark.

Some of us might have faced issues along the way… Don’t worry, it’s most likely due to a small miss or an incompatible software version. If you face any such issue, please revisit all the steps carefully and verify the software versions.

If you still can’t get Spark up and running, don’t hesitate to describe your problem below in the comments section.



 
