Apache Spark for ZCW Data

Apache Spark is a fast and general-purpose cluster computing system.

It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
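To make "high-level APIs" concrete before you dive into the articles, here is a minimal sketch of a Spark SQL / DataFrame job in Python. The input file name (sales.csv) and its columns (region, amount) are made up for illustration; the API calls themselves (SparkSession, read.csv, groupBy) are standard PySpark.

```python
from pyspark.sql import SparkSession

# Entry point for a Spark application; the app name is arbitrary.
spark = SparkSession.builder.appName("zcw-spark-demo").getOrCreate()

# Read a CSV file into a DataFrame, inferring column types from the data.
# "sales.csv" and its columns are hypothetical, for illustration only.
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Use the high-level DataFrame API: filter rows, group, and aggregate.
totals = (df.filter(df["amount"] > 0)
            .groupBy("region")
            .sum("amount"))

totals.show()
spark.stop()
```

The point to notice is that you describe *what* you want (filter, group, sum) and Spark's optimized engine works out *how* to execute it across the cluster; that separation is a recurring theme in the articles.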

As part of your weekend agenda, read as many of these articles as you can. You might even decide to follow one through to the end. But that's up to you.

Spark is perhaps the most important single piece of tech you'll use for the next few years. Understanding its applicability, architecture, and use will set you apart from many data professionals who are not strong in it.

Scan all the articles quickly first (2-3 minutes per article), jotting down notes on what each one seems to cover. Then, looking over your notes on each article, read them in whatever order seems most natural to you. The repetition of seeing the same terms used by different authors helps you catalog, in your own mind, how the network of concepts comes together to form your own expertise.

Finally, you may want to keep pointers to all of these articles handy. They may prove to be very useful in the weeks ahead.