Apache Spark is a fast and general-purpose cluster computing system.
It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
As part of your weekend agenda, read as many of these articles as you can. You might even decide to follow one through to the end, but that's up to you.
Spark is perhaps the single most important piece of tech you'll use over the next few years. Understanding its applicability, architecture, and use will differentiate you from the many data practitioners who are not strong in it.
Scan all the articles quickly first (2-3 minutes per article), jotting down notes on what each one seems to cover. Then, looking over your notes on each article, read them in whatever order feels most natural to you. Repetition, seeing the same terms used by different authors, helps you catalog in your own mind how this network of concepts comes together to form your own expertise.
- High Level Overview of Apache Spark
- Why We Need Apache Spark
- A Neanderthal’s Guide to Apache Spark in Python
- Apache Spark and Amazon S3 — Gotchas and best practices (I learned a few things from this one. One of my personal projects right now is using lots of S3)
- Apache Spark in Python: Beginner’s Guide
- SQL at Scale with Apache Spark SQL and DataFrames — Concepts, Architecture and Examples
- On explaining technical stuff in a non-technical way — (Py)Spark
- How are Big Companies using Apache Spark
- How to export data-frame from Apache Spark
- Apache Spark and Hadoop HDFS: Hello World
- How to Install Scala and Apache Spark on MacOS (now, how does one use PySpark within this?)
- How to use PySpark on your computer
- How to Get Started with PySpark
Finally, you may want to keep pointers to all these articles handy. They may prove very useful in the weeks ahead.