Tag Archives for "PySpark"

Tutorial: An Introduction to Apache Spark

Overview: Building on ground-breaking work led by the UC Berkeley AMPLab, Apache Spark uses distributed, in-memory data structures to improve data processing speeds over Hadoop for most workloads. In this post, we’re going to cover the architecture of Spark and basic transformations and actions using a real dataset. If you […]
