Apache Spark vs Apache Storm

I have recently been taking the Cloud Computing Specialization MCS course on Coursera, for fun and to gain breadth in distributed systems.

One thing I recently learned about is Apache Storm, a distributed stream processing framework. At first glance, I wondered how it differs from the popular Apache Spark, so I did a little research and found the two comparison charts from here quite useful.

Comparison in different aspects
Choice of framework in specific scenarios

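So basically, the major difference comes from their fundamental architectures: Storm processes each event as it arrives, whereas Spark is a batch engine at its core that typically reads from storage such as HDFS and handles streams as a series of small batches, which means it is also suitable for batch processing.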
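To make that difference a bit more concrete for myself, here is a minimal plain-Python sketch. It uses neither Storm's nor Spark's API; the function names are hypothetical and made up for illustration. It contrasts record-at-a-time processing (the model Storm follows) with micro-batching (roughly how Spark Streaming treats a stream). One simplification to keep in mind: Spark groups records by time interval, while this sketch groups them by count.

```python
from typing import Iterable, Iterator, List


def process_per_event(events: Iterable[int]) -> Iterator[int]:
    """Record-at-a-time: emit an updated running total after every event."""
    total = 0
    for value in events:
        total += value
        yield total


def process_micro_batches(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batching: buffer events, then emit one running total per batch."""
    total = 0
    batch: List[int] = []
    for value in events:
        batch.append(value)
        if len(batch) == batch_size:
            total += sum(batch)
            yield total
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        total += sum(batch)
        yield total


events = [1, 2, 3, 4, 5]
per_event = list(process_per_event(events))
micro = list(process_micro_batches(events, batch_size=2))
print(per_event)  # one result per incoming event
print(micro)      # one result per batch
```

Both versions end at the same final total. The per-event version gives a fresh result after every record (lower latency), while the batched version gives fewer, coarser updates, and the same batch code would work unchanged on a stored dataset.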

(Interesting fact: the Coursera course was developed in 2014, and Spark only reached its 1.0 release in May 2014, so it is no coincidence that such a powerful framework was not mentioned 🙂 )