Distributed File Storage with MogileFS

Where did it come from?

The folks over at Danga Interactive (creators of LiveJournal) have developed an open source distributed filesystem called MogileFS that has become quite popular in the Rails community. Danga is also responsible for the well-known memcached memory caching system.

MogileFS has been publicly available since mid-2004 and was developed by Danga for use in LiveJournal. The Rails community became aware of MogileFS when The Robot Co-op, creators of the 43-* series of sites, began to use it for image storage and released a Ruby client for it. Eric Hodel has written some helpful articles about his experiences with MogileFS and Ruby on his blog.

My experience with MogileFS has been largely positive. Like any good tool, however, MogileFS works best when used for its intended purpose. MogileFS was designed to be an application-level utility, leaving the responsibility of file management ultimately up to the application. I tried to use MogileFS as more of a distributed mirror system, serving files directly from the MogileFS storage nodes. That didn't work so well. If you're trying to do something like that, I would suggest finding another tool.

What does it do?

Let me restate the purpose of MogileFS: an application-level distributed filesystem. That means that your application should be the only entity communicating with the storage nodes, so any files going into or coming out of the filesystem go through your application first. A classic example is image storage. You have a web app that stores a lot of images (profile pictures, for instance), and you clearly don't want to store the images on disk on the web server. Local storage won't scale at all. So what to do? Put the images in a nebulous, pseudo-infinite array of storage.

To your application, MogileFS behaves kind of like a hash map: put a file in with a unique key, and you can fetch it later with the same key. Files can be accessed from anywhere, so scaling is no longer a concern. Storage nodes can be added on the fly, and files are automatically replicated, so storage space and backup are no longer concerns either.
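To make the hash map comparison concrete, here is a minimal sketch in plain Ruby. It is a toy in-memory stand-in, not the MogileFS Ruby client: the class name ToyFileStore, its put/get/delete methods, and the "user:42:avatar" key format are all made up for illustration.

```ruby
# Toy in-memory stand-in for a key/value file store.
# This is NOT the MogileFS client API; every name here is hypothetical.
class ToyFileStore
  def initialize
    @files = {} # key => file contents
  end

  # Store file contents under a unique key chosen by the application.
  def put(key, data)
    @files[key] = data.dup
    key
  end

  # Fetch the contents stored under a key, or nil if the key is unknown.
  def get(key)
    data = @files[key]
    data && data.dup
  end

  # Remove a key. Returns true if something was actually deleted.
  def delete(key)
    !@files.delete(key).nil?
  end
end

# The application picks the key, for example from a user id.
store = ToyFileStore.new
store.put("user:42:avatar", "fake image bytes")
avatar = store.get("user:42:avatar")
```

The point is only the shape of the interface: the application never deals with paths or disks, just keys and contents.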

Here's a quick overview of the MogileFS architecture. The storage nodes are where the files are actually stored (obviously). Your application communicates with the trackers, which have well-known hostnames. The trackers use a DB server (could be on the same machine) to store metadata. Other key vocabulary words to know are domains, which distinguish files for different applications, and classes, which distinguish file types.
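The pieces in that overview can be sketched as a small toy model in plain Ruby. None of this is MogileFS code: StorageNode, Tracker, and their methods are hypothetical names, a Hash stands in for the DB server, and the per-class copy count is an assumption of this sketch. It also simplifies by writing every copy immediately and routing reads through the tracker object.

```ruby
# Toy model of trackers, storage nodes, domains, and classes.
# All names are illustrative; this is not how MogileFS is implemented.
class StorageNode
  attr_reader :name

  def initialize(name)
    @name = name
    @blobs = {} # path => file contents
  end

  def write(path, data)
    @blobs[path] = data
  end

  def read(path)
    @blobs[path]
  end
end

class Tracker
  def initialize(nodes)
    @nodes = nodes
    @classes = {}  # [domain, class] => number of copies (assumed policy)
    @metadata = {} # [domain, key] => node names; stands in for the DB
  end

  # Register a class within a domain, with how many copies to keep.
  def add_class(domain, klass, copies)
    @classes[[domain, klass]] = copies
  end

  # Write the file to as many nodes as the class asks for, then record
  # where the copies live.
  def store(domain, klass, key, data)
    copies = @classes.fetch([domain, klass])
    targets = @nodes.first(copies)
    targets.each { |node| node.write("#{domain}/#{key}", data) }
    @metadata[[domain, key]] = targets.map(&:name)
  end

  # Names of the nodes holding a key; empty if the key is unknown.
  def locate(domain, key)
    @metadata.fetch([domain, key], [])
  end

  # Read the file back from the first node that holds it.
  def fetch(domain, key)
    name = locate(domain, key).first
    node = @nodes.find { |n| n.name == name }
    node && node.read("#{domain}/#{key}")
  end
end

nodes = %w[node-a node-b node-c].map { |n| StorageNode.new(n) }
tracker = Tracker.new(nodes)
tracker.add_class("photos_app", "avatar", 2)
tracker.store("photos_app", "avatar", "user:42", "fake image bytes")
locations = tracker.locate("photos_app", "user:42")
```

Because the metadata is keyed by domain as well as by key, two applications can use the same key without colliding, which is the job domains do in the description above.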

Do it.

So if you're searching for a simple, scalable file storage solution, give MogileFS a try. It's fairly easy to set up, and it works! MogileFS has seen extensive, large-scale production use, so you don't need to worry about it being unproven. The Ruby client is very easy to use, so integrating it with a Rails app is a breeze.