Apache Spark(-streaming) vs Apache Storm

Recently I am taking the Cloud Computing Specialization MCS course on Coursera for fun and gaining breadth on distributed systems.

One thing I recently learned about is Apache Storm, which is a Distributed Stream Processing framework. At the first glance, I wondered how it is different from the popular Apache Spark; so I did a little bit of research on this, and I found the two comparison charts from here to be quite useful.

Comparison in different aspects
Choice of framework in specific scenarios

So basically, the major difference seems to be brought out by their fundamental architectures: Spark-streaming still uses RDD , meaning it’s also suitable for batch processing; In fact, Spark-streaming itself is not exactly stream processing, but micro-batching processing, it curates data over a short span of time (300ms to 10s etc) and process it just like batches.

(Interesting fact: the Coursera course was developed in 2014 whereas Spark was released in 2015; not a coincidence that such a powerful framework was not mentioned 🙂 )

Leave a comment

Your email address will not be published. Required fields are marked *