Logo AppDev24 Login / Sign Up
Sign Up
Have Login?
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Login
New Account?
Recovery
Go to Login
By continuing you indicate that you agree to Terms of Service and Privacy Policy of the site.
Applications

Setup Apache Spark with Jupyter Notebook on MacOS

Updated on Jul 05, 2024

Are you interested in exploring the world of big data and machine learning? Look no further! In this article, we'll take you through a quick and easy guide to installing and configuring Apache Spark with Jupyter Notebook on your MacOS device.

Before we dive into the Apache Spark & Jupyter setup process, let's make sure we have the necessary prerequisites installed. We'll need Python 3, pip and Java.

Install Python3

The first step is to install Python 3 using Homebrew, a popular package manager for MacOS. Open your terminal and run the following command:

brew install python@3.11

Once installed, verify the version by running:

python3 -V

Install pip

Next, we need to install pip, the package installer for Python. Run the following command:

python3.11 -m pip install --upgrade pip

Install Java

Install Java using Homebrew. Open your terminal and run the following command:

brew install java

Once installed, verify the version by running:

java -version

openjdk version "22.0.1" 2024-04-16

Install Apache Spark

Next, we'll install Apache Spark itself using Homebrew.

brew install apache-spark

Install Scala

brew install scala@2.12
scala -version

Scala code runner version 2.12.19

Install pyspark

pip3 install pyspark
# pip3 install pyspark --break-system-packages
pyspark --version

version 3.5.1

Setup Environment Variables:

Depending upon your shell environment zsh or bash setup the below environment variables in ~/.zshrc or ~/.bashrc

# Java, Spark, pyspark
export JAVA_HOME=$(/usr/libexec/java_home)
export SPARK_HOME=/opt/homebrew/Cellar/apache-spark/3.5.1/libexec
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3

Apply the envionment variables in the current shell via source ~/.zshrc or source ~/.bashrc

Check Spark Version:

spark-submit --version
spark-shell --version
spark-sql --version

version 3.5.1

Test Apache Spark

To test our setup, let's create a simple Spark application using Python. Create a new file called spark_demo_app.py and add the following code:

spark_demo_app.py

from pyspark.sql import SparkSession, Row
from pyspark.sql.functions import avg, sum, count, max

# Initialize a Spark session
spark = SparkSession.builder.appName("SparkDemoApp").getOrCreate()

# Create a Dummy Dataframe
df = spark.createDataFrame([
    Row(name='Robert', loc='Berlin', sal=5000), Row(name='Peter', loc='Frankfurt', sal=6000),
    Row(name='Harry', loc='Dresden', sal=4000),Row(name='Sunny', loc='Berlin', sal=4800),
    Row(name='Sam', loc='Dresden', sal=3200),Row(name='Roger', loc='Berlin', sal=4700),
    ], schema = 'name string, location string, salary integer'
)

df.groupBy('location').agg(avg('salary').alias('avg_sal'),sum('salary').alias('sum_sal'),
    max('salary').alias('max_sal'), count('salary').alias("people_count")).show()

# Stop the Spark session
spark.stop()

Run this application using:

spark-submit spark_demo_app.py

Install Jupyter Lab

Finally, let's install Jupyter Lab to create and run notebooks:

brew install jupyterlab

Test Jupyter Lab

Create a new directory called ~/spark_notebooks and navigate into it. Then, start Jupyter Lab using:

mkdir ~/spark_notebooks
cd ~/spark_notebooks

# jupyter lab
jupyter lab --notebook-dir=~/spark_notebooks --preferred-dir ~/spark_notebooks

Access the server at http://localhost:8888/lab to create and run notebooks.

Add a new Notebook and Test the environment.

Jupyter Notebook
Jupyter Notebook

Spark UI is accessible at http://127.0.0.1:4040/

In this article, we've walked you through the process of setting up Apache Spark with Jupyter Notebook on MacOS. With these steps, you should now have a fully functional Spark environment ready for use. Happy coding!

PrimeChess

PrimeChess.org

PrimeChess.org makes elite chess training accessible and affordable for everyone. For the past 6 years, we have offered free chess camps for kids in Singapore and India, and during that time, we also observed many average-rated coaches charging far too much for their services.

To change that, we assembled a team of top-rated coaches including International Masters (IM) or coaches with multiple IM or GM norms, to provide online classes starting from $50 per month (8 classes each month + 4 tournaments)

This affordability is only possible if we get more students. This is why it will be very helpful if you could please pass-on this message to others.

Exclucively For Indian Residents: 
Basic - ₹1500
Intermediate- ₹2000
Advanced - ₹2500

Top 10 Articles