Recent learning on (Computer) Networking from practice

Today I finished implementing a heterogeneous network simulation environment in Mininet for my most recent work on a traffic migration framework. Although I have used Mininet before, these past few weeks have really been a roller coaster, and I have learnt far more than I knew before.

My latest Mininet environment consists of simulated geo-distributed networks with several Network Functions, OSPF and BGP routers, load balancers (L2 and L3/4), and programmable switches (powered by OpenFlow). They are implemented to show the generality of our framework.

Besides the usual Mininet APIs, I also got my hands dirty by actually tuning industry routing software, specifically the Quagga software package. I learnt to explore and configure OSPF and BGP rules from scratch, as well as to configure public and private IP namespaces. For the programmable switches, I took a deep dive into the OpenFlow protocol and implemented a Ryu app that can explore the topology, do path discovery, and actively install OpenFlow entries on switches for heterogeneous packet flows (mostly IPv4 and ARP).
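To make the path discovery step a bit more concrete, here is a minimal sketch in plain Python. It is not my actual Ryu app and uses no Ryu or OpenFlow API: the link list, the switch names and the dictionary layout of the "flow entries" are all made up for illustration. It only shows the idea of finding a hop-count shortest path over a discovered topology and deriving one forwarding rule per hop.

```python
from collections import deque

def shortest_path(links, src, dst):
    """Breadth-first search over an undirected switch graph (fewest hops)."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # dst is unreachable from src

def flow_entries(path, eth_type):
    """One hypothetical match/next-hop rule per switch on the path (except the last)."""
    return [{"switch": here, "match": {"eth_type": eth_type}, "next_hop": nxt}
            for here, nxt in zip(path, path[1:])]

# Hypothetical topology: two routes from s1 to s3.
LINKS = [("s1", "s2"), ("s2", "s3"), ("s1", "s4"), ("s4", "s5"), ("s5", "s3")]
path = shortest_path(LINKS, "s1", "s3")
rules = flow_entries(path, 0x0800)  # 0x0800 is the EtherType for IPv4
print(path)
print(rules)
```

In a real controller the link list would come from topology discovery and each rule would be turned into an OpenFlow flow-mod message; ARP traffic would use EtherType 0x0806 in the match instead.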

After all this, I think the biggest impact on me was coming to truly understand what Network Functions are. They are basically software-defined modules that sit in the control plane or even the data plane (now P4 is talking!) and manipulate switch tables using the intelligence of a general- or specific-purpose computing environment (e.g. Linux).

Though it was a lot of pain figuring out how so many new things work (I can’t forget that I spent half a day figuring out where the hell the command “show ip ospf” was …), I can finally say that it was worth it! No pain, no gain!

Preliminary thoughts on Big Data System Papers

Recently, I have been reading some famous big data system papers whose systems have been in industry use for quite some time. Here is an (incomplete) list of them:

Spark RDD: https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
Delay Scheduling: http://elmeleegy.com/khaled/papers/delay_scheduling.pdf
Spark SQL: https://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf
Spark Streaming: https://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf
Mesos: https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf
YARN: https://www.cse.ust.hk/~weiwa/teaching/Fall15-COMP6611B/reading_list/YARN.pdf
ZooKeeper: https://www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf

Reading them made me think about what exactly constitutes influence for pure system papers. Specifically, how are they different from work produced as a Software Engineer?

First of all, it is important to note that some of them were published by companies such as Yahoo and Databricks. Of those that were not, a significant chunk evaluate the performance of big data systems in real-world industry settings such as Facebook and Yahoo, and many were published after years of industry deployment, perhaps as proof of improvement.

So how are they different from Software Engineering? Judging purely from the contents of the papers, I don’t think there is much difference. However, research is about proposing ideas, not engineering itself. Yet those ideas are mostly inspired by, and more importantly proven useful in, industry applications, which in turn requires strong collaboration between industry and academia. Therefore, if I were to pursue a PhD or a research-oriented role in these areas, there would need to be a great deal of collaboration between my school and industry, which unfortunately is usually dominated by the top schools 🙁

I will shift my focus to network papers soon, but I will continue populating the list above once I read more.

Cassandra Internals

Consistency level:

Tunable, as a trade-off against availability in the CAP theorem. Provides eventual consistency.
Even the highest consistency level does not guarantee perfect consistency, due to the lack of the rollback mechanism found in traditional RDBMSs (Cassandra's commit log provides durability, but a write that fails on some replicas is not rolled back on the others).
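The tunable part can be illustrated with the usual quorum arithmetic: with N replicas, a read of R replicas is guaranteed to overlap the most recent successful write to W replicas only when R + W > N. Below is a small sketch of that check. The function names are mine and not part of any Cassandra API; only three of the consistency levels (ONE, QUORUM, ALL) are modelled.

```python
# Number of replicas that must respond at each level, given N replicas in total.
LEVELS = {
    "ONE": lambda n: 1,
    "QUORUM": lambda n: n // 2 + 1,
    "ALL": lambda n: n,
}

def read_write_sets_overlap(n, read_level, write_level):
    """True if every read replica set intersects every write replica set."""
    r = LEVELS[read_level](n)
    w = LEVELS[write_level](n)
    return r + w > n

N = 3  # hypothetical replication factor
for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_write_sets_overlap(N, read_level, write_level))
```

Overlap only says that a read touches at least one replica holding the latest successful write; it says nothing about writes that partially failed, which is the caveat noted above.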

For more, read here.

Partition/Hashing mechanism:

Similar to the Chord protocol.
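As a rough sketch of the Chord-style idea, the snippet below places nodes on a hash ring and assigns each key to the first node at or after the key's position, wrapping around at the end. The class, node names and the use of SHA-256 are assumptions for illustration; this is not Cassandra's actual partitioner.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes):
        # Each node gets one position (token) on the ring, derived from its name.
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def owner(self, key):
        """Successor of the key on the ring: first node token >= key token."""
        idx = bisect.bisect_left(self._tokens, self._token(key))
        return self._ring[idx % len(self._ring)][1]  # wrap past the last token

ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(200)]
moved = [k for k in keys if ring3.owner(k) != ring4.owner(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The point of the ring is visible in the last lines: adding a node only moves the keys that now fall to the new node, and every other key keeps its owner.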

For more, read here.

Membership Protocol (Failure detection):

Similar to the Gossip membership protocol.
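Here is a toy sketch of gossip-style heartbeat membership, under several simplifying assumptions of my own: the member list is fixed and known up front, time is counted in rounds, each node gossips to one random peer per round, and a peer is suspected after a fixed number of rounds without a heartbeat increase. It is meant to show the mechanism only, not how Cassandra implements it.

```python
import random

class GossipNode:
    def __init__(self, name, members, fail_after=5):
        self.name = name
        self.fail_after = fail_after
        # member name -> (highest heartbeat seen, local round when it last increased)
        self.table = {m: (0, 0) for m in members}

    def tick(self, now):
        """Increment our own heartbeat counter."""
        beat, _ = self.table[self.name]
        self.table[self.name] = (beat + 1, now)

    def merge(self, remote_table, now):
        """Keep the higher heartbeat per member and note when we learnt it."""
        for member, (beat, _) in remote_table.items():
            if beat > self.table.get(member, (-1, now))[0]:
                self.table[member] = (beat, now)

    def suspected(self, now):
        """Members whose heartbeat has not increased for more than fail_after rounds."""
        return sorted(m for m, (_, seen) in self.table.items()
                      if m != self.name and now - seen > self.fail_after)

def gossip_round(nodes, alive, now, rng):
    for name in alive:
        nodes[name].tick(now)
        target = rng.choice([m for m in alive if m != name])
        nodes[target].merge(nodes[name].table, now)

members = ["a", "b", "c", "d"]
nodes = {m: GossipNode(m, members) for m in members}
rng = random.Random(42)
for now in range(1, 11):          # everyone is alive for 10 rounds
    gossip_round(nodes, members, now, rng)
survivors = ["a", "b", "c"]       # then "d" crashes and stops gossiping
for now in range(11, 61):
    gossip_round(nodes, survivors, now, rng)
suspicions = {m: nodes[m].suspected(60) for m in survivors}
print(suspicions)
```

Because the heartbeat of the crashed node stops increasing, every survivor eventually stops refreshing its entry and the timeout fires, without any node having to contact the crashed one directly.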

For more, read here.

Request ordering:

Does not use a causality-based method such as Lamport or vector timestamps.
Instead, it uses a “last one wins” (last-write-wins) strategy and relies on clock synchronisation.
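A minimal sketch of last-write-wins conflict resolution looks like this. The `Cell` type and the tie-break on the value are my own choices to keep the result deterministic; the example also shows why clock synchronisation matters, since a writer with a slow clock can lose even though its write happened later in real time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    value: str
    timestamp_us: int  # wall-clock timestamp attached to the write, in microseconds

def resolve(a, b):
    """Return the version with the higher timestamp; break ties on the value."""
    if a.timestamp_us != b.timestamp_us:
        return a if a.timestamp_us > b.timestamp_us else b
    return a if a.value >= b.value else b

first = Cell("v1", 1_000)
second = Cell("v2", 2_000)
print(resolve(first, second).value)

# Written after "v2" in real time, but by a machine whose clock runs behind.
skewed = Cell("newer-in-real-time", 1_500)
print(resolve(second, skewed).value)
```

Since the result depends only on the two versions and not on the order in which replicas see them, all replicas converge to the same value, which is what lets this approach avoid Lamport or vector timestamps.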

For more, read here.

More concepts to be added….

Apache Spark(-streaming) vs Apache Storm

I have recently been taking the Cloud Computing Specialization MCS course on Coursera for fun and to gain breadth in distributed systems.

One thing I recently learned about is Apache Storm, a distributed stream processing framework. At first glance, I wondered how it differs from the popular Apache Spark, so I did a little research and found the two comparison charts from here to be quite useful.

Comparison in different aspects
Choice of framework in specific scenarios

So basically, the major difference seems to come from their fundamental architectures: Spark Streaming still uses RDDs, meaning it is also suitable for batch processing. In fact, Spark Streaming itself is not exactly stream processing but micro-batch processing: it collects data over a short span of time (300 ms to 10 s, etc.) and processes it just like a batch.
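The micro-batching idea can be sketched without Spark at all. In the toy example below (the function name and the sample events are made up), timestamped events are grouped into fixed-length intervals and each group is then handed to ordinary batch logic; intervals with no events are simply skipped here.

```python
from collections import defaultdict

def micro_batches(events, interval_ms):
    """Group (timestamp_ms, payload) events into consecutive fixed-length intervals."""
    batches = defaultdict(list)
    for timestamp_ms, payload in events:
        batches[timestamp_ms // interval_ms].append(payload)
    return [batches[i] for i in sorted(batches)]

# Hypothetical event stream with a 500 ms batch interval.
events = [(0, "a"), (120, "b"), (480, "c"), (510, "d"), (1490, "e")]
batches = micro_batches(events, 500)

# Each micro-batch is processed like a small batch job (here: a count).
counts = [len(batch) for batch in batches]
print(batches)
print(counts)
```

A record-at-a-time system such as Storm would instead handle "a", "b", "c" and so on individually as they arrive, which is where the latency difference between the two models comes from.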

(Interesting fact: the Coursera course was developed in 2014, when Spark had only just reached its 1.0 release; not a coincidence that such a powerful framework was not mentioned 🙂 )