
Spark — How to install in 5 Steps in Windows 10



 An easy-to-follow guide for installing Spark on Windows 10.




1. Prerequisites

  1. Hardware Requirements
    * RAM — minimum 8 GB; 4 GB can also work if your system has an SSD.
    * CPU — minimum quad-core, at least 1.80 GHz
  2. JRE 1.8 (offline installer for JRE)
  3. Java Development Kit — 1.8
  4. Software for unzipping, like 7-Zip or WinRAR
    * I will be using 64-bit Windows throughout; please check and download the version (x86 or x64) supported by your system for each piece of software.
  5. Hadoop
    * I am using Hadoop 2.9.2; you can also use any other STABLE version of Hadoop.
    * If you don’t have Hadoop, you can refer to installing it from Hadoop: How to install in 5 Steps in Windows 10.
  6. MySQL Query Browser
  7. Download the Spark zip
    * I am using Spark 3.1.1; you can also use any other STABLE version of Spark.
    * The latest release of Spark is 3.1.2 (shown in the image below), released in June 2021.


Fig 1:- Download Spark-3.1.2
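Before moving on, it is worth confirming that Java 1.8 is actually installed and visible on the PATH, since Spark will not start without it. A quick check from a cmd window (a sketch; the exact version output varies by JDK build):

```shell
:: Confirm the Java runtime and compiler are installed and on the PATH
java -version
javac -version

:: Show which java.exe is being picked up
where java
```

If either command reports "is not recognized", install the JDK from Step 3 of the prerequisites before continuing.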



2. Unzip and Install Spark



After downloading Spark, we need to unzip the spark-3.1.1-bin-hadoop2.7.tgz file.

Fig 2:- Extracting Spark Step-1



Once extracted, we get a new file, spark-3.1.1-bin-hadoop2.7.tar.
Now we need to extract this tar file as well.


Fig 3:- Extracting Spark Step-2




Now we can organize our Spark installation: create a folder and move the final extracted folder into it. For example:

Fig 4:- Spark Directory



  • Please note: while creating folders, DO NOT ADD SPACES IN THE FOLDER NAMES (spaces can cause issues later).
  • I have placed Spark on the D: drive; you can use C: or any other drive as well.
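As an alternative to 7-Zip or WinRAR, Windows 10 (build 1803 and later) ships a tar command that handles both extraction steps in one go. A minimal sketch, assuming the archive sits in the Downloads folder and D:\Spark is the target; adjust both paths to your setup:

```shell
:: Create the target folder -- remember, no spaces in the name
mkdir D:\Spark

:: Extract the downloaded archive; -z handles the gzip layer,
:: so no separate .tar extraction step is needed
tar -xzf %USERPROFILE%\Downloads\spark-3.1.1-bin-hadoop2.7.tgz -C D:\Spark
```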


3. Setting Up Environment Variables


Another important step in setting up a working environment is to set your system's environment variables.

To edit environment variables, go to Control Panel > System > click on the “Advanced system settings” link.
Alternatively, we can right-click on the This PC icon, click on Properties, and then click on the “Advanced system settings” link.
Or, the easiest way is to search for “Environment Variables” in the search bar, and there you GO…😉

Fig. 5:- Path for Environment Variable


Fig. 6:- Advanced System Settings Screen


3.1 Setting SPARK_HOME


  • Open Environment Variables and click on “New” under “User variables”.


Fig. 7:- Adding Environment Variable



  • On clicking “New”, we get the below screen.



Fig. 8:- Adding SPARK_HOME



  • Now, as shown, enter SPARK_HOME as the variable name and the path to your Spark folder as the variable value.
  • Click OK, and we are half done with setting SPARK_HOME.
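The same user variable can also be set from a cmd window with setx instead of the GUI; a sketch, assuming Spark was extracted to D:\Spark (use your own path):

```shell
:: Set SPARK_HOME as a user environment variable
:: Note: setx only affects NEW command windows, not the current one
setx SPARK_HOME "D:\Spark\spark-3.1.1-bin-hadoop2.7"
```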


3.2 Setting Path Variable

  • The last step in setting up the environment variables is setting the Path in the system variables.



Fig. 9:- Setting Path Variable


  • Select Path variable in the system variables and click on “Edit”.


Fig. 10:- Adding Path


  • Now we need to add this path to the Path variable:-
    * %SPARK_HOME%\bin
  • Click OK, then OK again, and we are done with setting environment variables.


3.3 Verify the Paths 

  • Now we need to verify that what we have done is correct and has taken effect.
  • Open a NEW command window.
  • Run the following command:

echo %SPARK_HOME%



Note:- If you want the variable to be set for all users, you need to select “New” under System Variables instead.
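A second check confirms the Path entry as well; a sketch (run it in a NEW command window so the fresh variables are picked up):

```shell
:: Should print the Spark installation folder
echo %SPARK_HOME%

:: Should resolve to spark-shell.cmd inside %SPARK_HOME%\bin if the Path entry is correct
where spark-shell
```

If `where spark-shell` reports that it could not find the file, re-check the %SPARK_HOME%\bin entry added in step 3.2.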



4. Configure without Hadoop


Many of us think a Hadoop installation is necessary for running Spark, which becomes another TASK: install Hadoop first, and then Spark.


Wait... Here is a special catch for you... Notice the Hadoop part present in the name of your downloaded Spark file..!!!

spark-3.1.1-bin-hadoop2.7.tgz

We can actually run Spark without installing Hadoop by just adding one file (winutils.exe) to the bin folder of Spark.

Download winutils.exe for your respective Hadoop version and put the file in the Spark bin folder.

Please note:- winutils.exe is different for each Hadoop version, so take special care to download the winutils.exe file matching your version.
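Assuming winutils.exe was downloaded to the Downloads folder, placing it is a single copy command. Some setups additionally point HADOOP_HOME at the Spark folder so that winutils.exe is found under %HADOOP_HOME%\bin; a sketch, with example paths that you should adjust to your own install:

```shell
:: Copy winutils.exe (built for YOUR Hadoop version) into Spark's bin folder
copy %USERPROFILE%\Downloads\winutils.exe D:\Spark\spark-3.1.1-bin-hadoop2.7\bin

:: Optional: if Hadoop itself is not installed, point HADOOP_HOME at the Spark folder
setx HADOOP_HOME "D:\Spark\spark-3.1.1-bin-hadoop2.7"
```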



5. Launch Spark


Congratulations..!!!!!
We are done with setting up Spark on our system.

Now we need to check if everything works smoothly…

Open a cmd window, move to the %SPARK_HOME%\bin directory, and run the commands below to test Spark.

Changing the directory:-  

cd %SPARK_HOME%\bin


Starting Spark Shell:-

spark-shell


Fig 11:- Running Spark


Spark also offers a web UI by default, which we can access at http://localhost:4040 while the shell is running.

Fig 12:- Spark Web UI
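Beyond just starting the shell, a one-line job confirms that Spark can actually execute work. A sketch that pipes a tiny Scala expression into spark-shell from cmd and then exits; counting the numbers 0 through 999 should report 1000:

```shell
:: Run a one-line Spark job non-interactively: count the numbers 0..999
echo spark.range(1000).count() | spark-shell
```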




6. Congratulations..!!!!🎉


Congratulations! We have successfully installed Spark.

Some of us might have faced issues along the way… Don’t worry, it’s most likely due to a small miss or an incompatible software version. If you face any such issue, please revisit all the steps carefully and verify the software versions.

If you still can’t get Spark up and running, don’t hesitate to describe your problem below in the comments section.



 
