Advanced Apache Spark

Big data is going Spark crazy! Here’s a whopping six-hour, fast-paced, vendor-agnostic intensive on Spark Core, presented by Sameer Farooqui, a client services engineer at Databricks. The talk comes from Spark Summit East and covers the following topics:

  • History of Spark
  • RDD fundamentals
  • Spark runtime architecture and integration with resource managers (standalone, YARN)
  • GUIs
  • Memory and persistence
  • Jobs, stages, tasks
  • Broadcast variables and accumulators
  • PySpark
  • DevOps 102
  • Shuffle
  • Spark Streaming
