Set up Apache Spark and initialize pyspark with ease

bigdata Nov 9, 2016

Recently I started working on Apache Spark and was having a hard time configuring it on my machine. Setting up the environment turned into pure trial and error. Then I came across this gist, which explains how to set up Apache Spark with IPython Notebook on macOS. I love working in IPython notebooks, but most of the time I just need a plain ipython shell, so I skipped the IPython Notebook portion of the setup.

After a bit of tweaking I got the setup I wanted. I came up with the bash alias below (drop it into your ~/.bashrc or ~/.bash_profile to make it stick) to run pyspark inside ipython:

alias ipyspark='PYSPARK_DRIVER_PYTHON=ipython /usr/local/bin/pyspark --master local[*] --driver-memory 2g'

You can read more about these flags in the documentation here. In short, --master local[*] runs Spark locally with one worker thread per logical core, and --driver-memory 2g caps the driver JVM at 2 GB; tune both to fit your machine.
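As a quick sanity check: the shell started by this alias already exposes a live SparkContext as sc, so you can confirm the flags took effect right away. A minimal sketch using the standard sc.master and sc.defaultParallelism attributes; the exact output depends on your machine and Python version (shown here for a Python 2 shell on an 8-core box):

>>> sc.master                 # reflects the --master flag from the alias
u'local[*]'
>>> sc.defaultParallelism     # with local[*], one slot per logical core
8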

Now comes the best part. Once you are inside the ipython shell you want to initialize pyspark as quickly as possible. For that I found findspark. Just install it with pip install findspark and start using pyspark with ease:

>>> import findspark
>>> findspark.init()  # locate the Spark installation and add pyspark to sys.path
>>> import pyspark
>>> sc = pyspark.SparkContext(appName="myAppName")
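With the context up, a tiny smoke test never hurts. sc.parallelize and sum are standard SparkContext/RDD methods, and sc.stop() matters because only one SparkContext can be active at a time:

>>> sc.parallelize(range(100)).sum()  # distribute 0..99 across local workers and add them up
4950
>>> sc.stop()  # shut the context down before creating a new one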

Happy hacking!

Related reads:

Getting started with Spark? Check out this repo: http://jadianes.github.io/spark-py-notebooks

MOOC on edx.org: https://www.edx.org/course/big-data-analysis-apache-spark-uc-berkeleyx-cs110x
