Missional Code alert('Him');


Facebook Multifeed

I've been spending some time recently looking at how Facebook has scaled to the 1 billion users it has now. Facebook is already pretty famous for a bunch of internal technologies it's responsible for - Haystack (photos), HipHop, Cassandra, Hive, Tornado, Thrift, Scribe. Most of what follows is summarized from 25hoursaday.com, but I thought I'd do a write-up for myself anyway.

Facebook's News Feed is hard to scale because instead of loading mostly static data for one user, you need to load constantly changing data for one user concerning up to 5000 other users. One way around this is to cache heavily, which Facebook apparently does - it caches the last 50 actions of each user. When you go to the home page, Facebook looks up all of your friends and reads their recent actions from their Memcached entries. Multifeed itself is the application that decides what should be rendered on the home page, based on how relevant each action is and how recently it took place. That ultimately gets cut down to 'x' items that are rendered to the page, and more are loaded as you scroll down.
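To make that flow concrete for myself, here is a minimal sketch of the aggregation step. It is not Facebook's code: `ActionCache` is an in-memory stand-in for the per-user Memcached entries, and the `relevance` field, the one-hour half-life, and the scoring formula are all assumptions I made up for illustration. The only numbers taken from above are the 50 cached actions per user and the idea of cutting the result down to 'x' items.

```python
import heapq
from collections import deque
from dataclasses import dataclass

MAX_ACTIONS_PER_USER = 50  # from the post: the last 50 actions per user are cached


@dataclass(frozen=True)
class Action:
    user_id: int
    description: str
    timestamp: float  # seconds since the epoch
    relevance: float  # hypothetical relevance weight, not a real Facebook field


class ActionCache:
    """In-memory stand-in for the per-user Memcached entries (assumption)."""

    def __init__(self):
        self._recent = {}

    def record(self, action):
        # A bounded deque silently drops the oldest action once 50 are stored.
        actions = self._recent.setdefault(
            action.user_id, deque(maxlen=MAX_ACTIONS_PER_USER)
        )
        actions.append(action)

    def recent(self, user_id):
        return list(self._recent.get(user_id, ()))


def score(action, now, half_life=3600.0):
    """Hypothetical ranking: relevance halves for every `half_life` seconds of age."""
    age = max(0.0, now - action.timestamp)
    return action.relevance * 0.5 ** (age / half_life)


def build_feed(cache, friend_ids, now, limit=10):
    """Gather every friend's cached actions and keep the `limit` best-scoring ones."""
    candidates = (a for fid in friend_ids for a in cache.recent(fid))
    return heapq.nlargest(limit, candidates, key=lambda a: score(a, now))
```

Calling `build_feed(cache, friend_ids, now, limit=x)` returns the top 'x' actions, best first. Loading more as you scroll would just be the same call with a larger limit (or an offset), which is cheap as long as every friend's actions are still sitting in the cache.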

Facebook is able to achieve such low latency, in part, because so much of the data is cached. When the home page loads, you really only want the freshest data anyway, and that will be readily available in the cache. This is what makes Timeline a different challenge, because it requires fast retrieval and ranking of stale data. A lot of Memcached tuning makes this possible - Facebook made a few changes, switching to UDP and adding multithreading, and that apparently resulted in five times higher throughput.

There was also a section about Facebook sticking with MySQL and using it primarily as a key-value store rather than for relational purposes (despite having developed Cassandra). They said they stuck with it because it is proven, has good administration tools, and makes data replication easy. Using the database as a key-value store lets developers store the data they want without necessarily having to modify table schemas, and it seems similar to this article about how Reddit is built on top of two tables: http://kev.inburke.com/kevin/reddits-database-has-two-tables/
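Here is a rough sketch of what "a relational database used as a key-value store" can look like. To keep it runnable without a server I'm using SQLite from the Python standard library as a stand-in for MySQL, and the `kv` table, the `KeyValueStore` class, and the JSON encoding are all my own assumptions, not anything Facebook or Reddit has published.

```python
import json
import sqlite3


class KeyValueStore:
    """One two-column table; every value is an opaque JSON blob (assumed layout)."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # SQLite syntax; MySQL would use REPLACE INTO or ON DUPLICATE KEY UPDATE.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._conn.commit()

    def get(self, key, default=None):
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return default if row is None else json.loads(row[0])


store = KeyValueStore(sqlite3.connect(":memory:"))
store.put("user:1", {"name": "Ada"})
# Adding a new field later needs no ALTER TABLE, only a new key in the blob.
store.put("user:1", {"name": "Ada", "hometown": "London"})
```

The trade-off is that the database can no longer index or join on the fields inside the blob, so anything you want to query by has to be part of the key or live in a separate lookup table.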




