Big data is going Spark crazy! Here’s a whopping 6-hour, intensive, fast-paced and vendor-agnostic look at Spark Core presented by Sameer Farooqui, a client services engineer at Databricks. The talk comes from Spark Summit East and covers the following topics:
- History of Spark
- RDD fundamentals
- Spark runtime architecture and integration with resource managers (standalone, YARN)
- GUIs
- Memory and persistence
- Jobs, stages, tasks
- Broadcast variables and accumulators
- PySpark
- DevOps 102
- Shuffle
- Spark Streaming