
Installing Phusion’s Passenger on Leopard (OS X 10.5) September 24, 2009

Posted by pjin in Uncategorized.




Scribe design September 16, 2009

Posted by pjin in Scalability.

a good article covering Scribe’s design


Here at Facebook, we’re constantly facing scaling challenges because of our enormous growth. One particular problem we encountered a couple of years ago was collection of data from our servers. We were collecting a few billion messages a day (which seemed like a lot at the time) for everything from access logs to performance statistics to actions that went to News Feed. We used a variety of different technologies for the different use cases, and all of them were bursting at the seams. We decided to build a unified system (called Scribe) to handle all of these cases, and do it in a way that would scale with Facebook’s growth. The system we built turned out to be enormously useful, handling over 100 use cases and tens of billions of messages a day. It has also been battle tested by just about anything that can go wrong, so I encourage you to take a look at the newly opened Scribe source and see if it might be useful for you. To give the code some context, I’m going to go through the major design decisions we made to allow the system to scale.

The first decision we made was to not lock ourselves into a particular network topology. The Scribe servers are arranged in a directed graph, but each server only knows about the next server in the graph. This flexible topology allows for things like adding an extra layer of fan-in if the system grows too large, and batching messages before sending them between datacenters, but without having any code that explicitly needs to understand datacenter topology, only a simple configuration.
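To make the topology idea concrete, here is a minimal sketch (not Scribe's actual code). It assumes a hypothetical `Node` class in which each server holds a reference to its next hop and nothing else; the `batch_size` parameter and the server names are made up for illustration.

```python
class Node:
    """A hypothetical relay that knows only the next server in the graph."""

    def __init__(self, name, next_hop=None, batch_size=1):
        self.name = name
        self.next_hop = next_hop      # the only topology knowledge a node has
        self.batch_size = batch_size  # forward once this many messages are buffered
        self.buffer = []
        self.received = []            # data kept here when there is no next hop

    def log(self, messages):
        if self.next_hop is None:
            self.received.extend(messages)  # terminal node: keep the data
            return
        self.buffer.extend(messages)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.next_hop is not None and self.buffer:
            batch, self.buffer = self.buffer, []
            self.next_hop.log(batch)


# Configuration only: two web servers fan in to an aggregator, which batches
# three messages at a time before sending them on to a central store.
central = Node("central")
aggregator = Node("aggregator", next_hop=central, batch_size=3)
web1 = Node("web1", next_hop=aggregator)
web2 = Node("web2", next_hop=aggregator)

web1.log(["a"])
web2.log(["b"])
web1.log(["c"])  # third message fills the aggregator's batch
```

Adding another fan-in layer in this sketch means constructing one more `Node` and changing which `next_hop` the web servers point at; no node needs to know the overall layout.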

The second major design decision was about reliability. We chose a middle ground here: reliable enough that we can expect to get all of the data almost all of the time, but not so reliable as to require heavyweight protocols and disk usage. More specifically, Scribe spools data to disk on any node to handle intermittent connectivity or node failure, but it doesn’t sync a log file for every message, so there’s a possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure. Basically, this is more reliability than you get with most logging systems, but not something you should use for database transactions. As it turned out, this is a reasonable level of reliability for a lot of use cases, and has made scaling much easier. It’s also the source of a lot of the hard-learned lessons: getting the system to catch up seamlessly after a significant network problem is tricky, especially when there are tens or hundreds of gigabytes of data backed up.
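The spool-and-catch-up behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not Scribe's implementation: `SpoolingSender`, the `send` callable, and the single-line text spool file are all hypothetical, and messages are assumed to contain no newlines.

```python
import os
import tempfile


class SpoolingSender:
    """Hypothetical sender: forward when possible, otherwise append to a local file."""

    def __init__(self, send, spool_path):
        self.send = send              # callable that raises ConnectionError on failure
        self.spool_path = spool_path

    def log(self, message):
        try:
            self.replay()             # older spooled data goes first, preserving order
            self.send(message)
        except ConnectionError:
            # Buffered append with no fsync per message: cheap, but a crash
            # at the wrong moment can lose the most recent messages.
            with open(self.spool_path, "a", encoding="utf-8") as spool:
                spool.write(message + "\n")

    def replay(self):
        if not os.path.exists(self.spool_path):
            return
        with open(self.spool_path, encoding="utf-8") as spool:
            pending = spool.read().splitlines()
        for message in pending:
            # Raises if the link is still down. A failure midway leaves the
            # file in place, so already-sent messages would be sent again.
            self.send(message)
        os.remove(self.spool_path)


delivered = []
link = {"up": False}


def send(message):
    if not link["up"]:
        raise ConnectionError("downstream unavailable")
    delivered.append(message)


spool_path = os.path.join(tempfile.mkdtemp(), "spool.log")
sender = SpoolingSender(send, spool_path)
sender.log("m1")   # link down: spooled to disk
sender.log("m2")   # link down: spooled to disk
link["up"] = True
sender.log("m3")   # link up: m1 and m2 are replayed, then m3 is sent
```

Even this toy version shows where the hard part lies: replaying a large backlog without duplicating or reordering data takes more care than the happy path does.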

The final design decision was about the data model. When you’re building something that looks like a logging system there are a lot of things people expect: logging levels and rules about when they get sent, timestamping and ordering of messages, schemas for common messages, etc. We decided that this was a can of worms that shouldn’t be mixed up with the asynchronous and mostly reliable delivery of data, so we made the data model very simple. A message is two strings: a category and the actual message. The category is the description of what the message is about, and the expectation is that messages of the same category end up in the same place. The message is the actual data to be logged. We also don’t have any a priori list of categories that must be maintained. If you create a new category it shows up in a new file. This is following the Unix philosophy of doing exactly one thing and doing it well, and it has definitely paid off in ease of use and development. We started with four or five use cases in mind and now we have hundreds, but we didn’t have to modify the Scribe source for any of them.
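A rough sketch of that two-string data model, assuming a hypothetical `store` function that writes one file per category. The `.log` suffix, the directory layout, and the sample categories are assumptions for illustration, not Scribe's actual file naming.

```python
import os
import tempfile
from collections import namedtuple

# A message is just two strings: a category and the data to be logged.
LogEntry = namedtuple("LogEntry", ["category", "message"])


def store(entries, root):
    """Append each message to the file for its category, creating it on first use."""
    for entry in entries:
        path = os.path.join(root, entry.category + ".log")
        with open(path, "a", encoding="utf-8") as f:
            f.write(entry.message + "\n")


root = tempfile.mkdtemp()
store(
    [
        LogEntry("access", "GET /home"),
        LogEntry("perf", "render_ms=12"),   # a new category needs no registration
        LogEntry("access", "GET /feed"),
    ],
    root,
)
files = sorted(os.listdir(root))
with open(os.path.join(root, "access.log"), encoding="utf-8") as f:
    access_lines = f.read().splitlines()
```

Nothing here interprets the message string; levels, timestamps, and schemas would be the caller's concern, which is the separation the article describes.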

Another choice we made early on was to build Scribe using Thrift. This sped up development enormously because a lot of the hard parts were already taken care of, and it also made the resulting system much more flexible. We currently log messages to Scribe from PHP, Python, C++, and Java code, and the list of possible languages is growing all the time from the contributions of developers around the world. So Scribe has already benefitted enormously from Thrift being open, and it will be even better having Scribe open too. I hope you find it as useful as we do.

Building and running Scribe September 16, 2009

Posted by pjin in Uncategorized.

Facebook Scribe Server Documentation And Tutorials

If you hit an error while building Thrift, refer to the following post for a solution:


Unit Test, JUnit, MockObject … September 2, 2009

Posted by pjin in Uncategorized.



iBATIS August 31, 2009

Posted by pjin in Uncategorized.

a good open source object-relational mapping (ORM) layer


install ruby mysql driver on Mac OS X August 12, 2009

Posted by pjin in Uncategorized.

sudo env ARCHFLAGS="-arch i386" gem install mysql -- --with-mysql-dir=/usr/local/mysql --with-mysql-lib=/usr/local/mysql/lib --with-mysql-include=/usr/local/mysql/include

Graph algorithm with MapReduce July 31, 2009

Posted by pjin in Uncategorized.

some useful links for this topic:

1. Shortest path and Breadth First Search


2. Transitive closure




HBase Architecture July 28, 2009

Posted by pjin in Uncategorized.


Data Model

HBase uses a data model very similar to that of Bigtable. Applications store data rows in labeled tables. A data row has a sortable row key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have widely varying numbers of columns.

A column name has the form “<family>:<label>” where <family> and <label> can be arbitrary byte arrays. A table enforces its set of <family>s (called “column families”). Adjusting the set of families is done by performing administrative operations on the table. However, new <label>s can be used in any write operation without pre-announcing it. HBase stores column families physically close on disk, so the items in a given column family should have roughly the same read/write characteristics and contain similar data.
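The family/label rule can be illustrated with a small sketch. This is not the HBase API: the `Table` class and its `put` method are hypothetical, and plain strings stand in for the byte arrays mentioned above.

```python
class Table:
    """Hypothetical table that enforces its column families but not its labels."""

    def __init__(self, families):
        self.families = set(families)  # changed only by administrative operations
        self.rows = {}

    def put(self, row_key, column, value):
        family, sep, label = column.partition(":")
        if not sep:
            raise ValueError("column must look like '<family>:<label>'")
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        # Any label is accepted without being declared in advance.
        self.rows.setdefault(row_key, {})[column] = value


table = Table(["contents", "anchor"])
table.put("com.cnn.www", "anchor:cnnsi.com", "CNN")       # new label, accepted
table.put("com.cnn.www", "anchor:my.look.ca", "CNN.com")  # another new label
try:
    table.put("com.cnn.www", "mime:", "text/html")        # family was never declared
    rejected = False
except KeyError:
    rejected = True
```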

Only a single row at a time may be locked by default. Row writes are always atomic, but it is also possible to lock a single row and perform both read and write operations on that row atomically.

An extension was added recently to allow multi-row locking, but this is not the default behavior and must be explicitly enabled.

Conceptual View

Conceptually a table may be thought of as a collection of rows that are located by a row key (and optional timestamp) and where any column may not have a value for a particular row key (sparse). The following example is a slightly modified form of the one on page 2 of the Bigtable paper (it adds a new column family “mime:”).

| Row Key | Time Stamp | Column “contents:” | Column “anchor:” | Column “mime:” |
|---|---|---|---|---|
| “com.cnn.www” | t9 | | “anchor:cnnsi.com” “CNN” | |
| | t8 | | “anchor:my.look.ca” “CNN.com” | |
| | t6 | “<html>…” | | “text/html” |
| | t5 | “<html>…” | | |
| | t3 | “<html>…” | | |

Physical Storage View

Although at a conceptual level, tables may be viewed as a sparse set of rows, physically they are stored on a per-column family basis. This is an important consideration for schema and application designers to keep in mind.

Pictorially, the table shown in the conceptual view above would be stored as follows:

| Row Key | Time Stamp | Column “contents:” |
|---|---|---|
| “com.cnn.www” | t6 | “<html>…” |
| | t5 | “<html>…” |
| | t3 | “<html>…” |

| Row Key | Time Stamp | Column “anchor:” |
|---|---|---|
| “com.cnn.www” | t9 | “anchor:cnnsi.com” “CNN” |
| | t8 | “anchor:my.look.ca” “CNN.com” |

| Row Key | Time Stamp | Column “mime:” |
|---|---|---|
| “com.cnn.www” | t6 | “text/html” |

It is important to note in the diagram above that the empty cells shown in the conceptual view are not stored since they need not be in a column-oriented storage format. Thus a request for the value of the “contents:” column at time stamp t8 would return no value. Similarly, a request for an “anchor:my.look.ca” value at time stamp t9 would return no value.
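The same point can be shown with a minimal sketch that keeps one dictionary per column family and stores only the cells that exist. This models the exact-timestamp lookups described above and is not how HBase is implemented or queried; the `stores` layout, the `get` helper, and the use of integers 3 to 9 for t3 to t9 are assumptions for illustration.

```python
# One store per column family. Each maps (row key, column, timestamp) to a value.
# Empty cells from the conceptual view simply have no entry.
stores = {
    "contents": {
        ("com.cnn.www", "contents:", 6): "<html>…",
        ("com.cnn.www", "contents:", 5): "<html>…",
        ("com.cnn.www", "contents:", 3): "<html>…",
    },
    "anchor": {
        ("com.cnn.www", "anchor:cnnsi.com", 9): "CNN",
        ("com.cnn.www", "anchor:my.look.ca", 8): "CNN.com",
    },
    "mime": {
        ("com.cnn.www", "mime:", 6): "text/html",
    },
}


def get(row_key, column, timestamp):
    """Return the value stored at exactly this timestamp, or None if absent."""
    family = column.split(":", 1)[0]
    return stores[family].get((row_key, column, timestamp))


no_contents_at_t8 = get("com.cnn.www", "contents:", 8)
no_anchor_at_t9 = get("com.cnn.www", "anchor:my.look.ca", 9)
anchor_at_t8 = get("com.cnn.www", "anchor:my.look.ca", 8)
```

Both of the first two lookups come back empty because nothing was ever written at those coordinates, while the third finds the stored cell.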

Hadoop resources June 11, 2009

Posted by pjin in Hadoop.


Yahoo Hadoop tutorial:

Core Wiki:

Google Scalability Conference Videos April 16, 2009

Posted by pjin in Scalability.