Recently I have been taking the Cloud Computing Specialization MCS course on Coursera, for fun and to gain some breadth in distributed systems.
One thing I recently learned about is Apache Storm, which is a distributed stream processing framework. At first glance, I wondered how it differs from the popular Apache Spark, so I did a little research and found the two comparison charts from here to be quite useful.
So basically, the major difference seems to come from their fundamental architectures. Spark Streaming is still built on RDDs, meaning it is also suitable for batch processing. In fact, Spark Streaming is not exactly stream processing but micro-batch processing: it collects data over a short span of time (300 ms to 10 s, etc.) and then processes it just like a batch.
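To make the micro-batching idea concrete, here is a toy sketch in plain Python. It is not how Spark or Storm is implemented and it uses neither library; the function names (`micro_batches`, `per_event`), the `batch_interval` parameter, and the sample events are all made up for illustration. It only contrasts grouping events by a fixed time interval with handing each event downstream on its own.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into fixed-length time intervals.

    Each interval is handed downstream as one small batch. This is a toy
    model of micro-batching, not Spark's actual implementation.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Integer index of the interval this event falls into.
        buckets[int(timestamp // batch_interval)].append(value)
    # Emit the non-empty batches in time order.
    return [buckets[i] for i in sorted(buckets)]


def per_event(events):
    """Hand every event downstream on its own, one at a time."""
    return [[value] for _, value in events]


# Hypothetical events: (arrival time in seconds, payload).
events = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (1.3, "d"), (2.9, "e")]

batches = micro_batches(events, batch_interval=1.0)
singles = per_event(events)

print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
print(singles)  # [['a'], ['b'], ['c'], ['d'], ['e']]
```

With a 1-second interval, the five events collapse into three batches, so an event may wait up to one interval before anything happens to it. That waiting time is the latency cost of micro-batching compared with processing each event as it arrives.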
(Interesting fact: the Coursera course was developed in 2014, the same year Spark reached its 1.0 release, so it is no coincidence that such a powerful framework was not mentioned 🙂 )