
How and why we moved from couchdb to mongodb

Monday, 30 August 2010 in Development Articles by Damien Mathieu (Creative Commons License)

After quite a bit of hesitation and discussion, we've decided to migrate our application from CouchDB to MongoDB.

In this article, I'll try to explain why and how we did so.

The why

The image on the left shows the search interface of the application I'm working on.

In that application, we have several records (each one representing a borehole). Each record has several attributes, such as those displayed here (name, length, and date).

The upper part of the image shows the filters. We can add as many filters as we want on the attributes. Here, for example, we have two filters: one on the serial number of the machine that drilled the borehole, and another on a date interval.

The displayed records are the ones that match all the provided filters.

How it works with CouchDB

In order to search for the appropriate records with CouchDB, we dynamically create several views.

For example, the search above calls the view by_boreholename_creation_date, which returns all the records with, as key, an array of all the provided attributes. When calling that view, we search with startkey and endkey.

However, the way startkey and endkey work is a bit special: they do an equality comparison on all the array elements except the last one.

So let's consider the following search:

  • An interval of date.
  • The borehole name.

The second element will be searched appropriately. However, for the interval, the search will be a ==. Consequently, we won't get the results we want: we will only get the records whose date is exactly the one we entered.
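To make that concrete, here's a toy Ruby simulation (invented keys, not our real data) of how CouchDB compares composite array keys. It returns every key between startkey and endkey under element-by-element comparison, so only the last element of the key behaves as a real interval; anything before it would need an exact match to filter correctly:

```ruby
# Toy simulation of CouchDB's startkey/endkey over composite keys.
# CouchDB returns every key k with startkey <= k <= endkey, compared
# element by element (roughly what Ruby's Array#<=> does on strings).
keys = [
  ["2010-07-01", "alpha"],
  ["2010-07-15", "zzz"],   # wrong name, but inside the date range
  ["2010-07-20", "alpha"],
  ["2010-09-01", "alpha"], # right name, but outside the date range
]

def range_query(keys, startkey, endkey)
  keys.select { |k| (k <=> startkey) >= 0 && (k <=> endkey) <= 0 }
end

# Trying to combine a date interval with name == "alpha":
result = range_query(keys, ["2010-07-01", "alpha"],
                           ["2010-07-31", "alpha"])
# The date interval works, but ["2010-07-15", "zzz"] slips through:
# only the last element of the key gets real interval semantics, so
# the name cannot be pinned while the date is a range (and vice versa).
```

This is exactly why stacking several interval filters in one view key cannot work: at most one of them (the last) is a genuine range.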

At the WebWorkersCamp, when I asked about this, someone suggested I do one request for every filter.

Knowing that some of our clients have more than 700 records, and that the number is growing exponentially, that wasn't an option.

Let’s see MongoDB

So we decided to migrate to MongoDB. I started working on that migration on the 26th of July, and it was deployed to production this week (the 30th of August).

MongoDB queries are much easier, mostly because we don't need to care about JavaScript views: the engine handles the queries itself, and its operators cover all of our needs.

We're using Mongoid (with Rails 2.3), and a simple call to the where method with the appropriate arguments hash is enough to perform our search.

We no longer have the problem with filter order: our searches always use an interval when there is one, or a regular expression when the value is a string.
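As a rough sketch (invented field names, and a toy matcher rather than Mongoid itself), this is the shape of query we now build: a date interval and a name regexp combined in one hash, with each condition applied independently of its position:

```ruby
# Toy illustration of combining a date interval and a regexp in one
# MongoDB-style query hash. With Mongoid the equivalent is roughly:
#   Record.where(:creation_date.gte => from, :creation_date.lte => to,
#                :name => /^alpha/)
# (field names here are made up for the example)
records = [
  { "name" => "alpha-1", "creation_date" => "2010-07-10" },
  { "name" => "beta-2",  "creation_date" => "2010-07-15" },
  { "name" => "alpha-3", "creation_date" => "2010-09-01" },
]

query = {
  "creation_date" => { "$gte" => "2010-07-01", "$lte" => "2010-07-31" },
  "name"          => /^alpha/,
}

def match?(doc, query)
  query.all? do |field, cond|
    value = doc[field]
    case cond
    when Regexp then cond =~ value               # string filter
    when Hash                                     # interval filter
      cond.all? do |op, bound|
        op == "$gte" ? value >= bound : value <= bound
      end
    else cond == value                            # plain equality
    end
  end
end

results = records.select { |doc| match?(doc, query) }
# only "alpha-1" matches: right name *and* inside the interval
```

Unlike the composite view keys, every filter here is evaluated on its own, so mixing intervals and string matches never degenerates into an equality test.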

The how

With CouchDB, we were using CouchRest. With MongoDB, we’re using Mongoid.

The "how" problem was the data migration. I didn't want to go too low-level.

So I moved the CouchDB models to the lib/ folder (so they don't get automatically loaded when launching the application in production) and gave them the following content:

module Couchdb
    class Record < CouchRest::ExtendedDocument
        def self.to_s
            'Record'
        end
    end
end
That way, my model reads the CouchDB documents typed "Record", and not "Couchdb::Record".

Then, in a rake task, I loop through all the CouchDB records and add them to MongoDB:

Couchdb::Record.all.each do |record|
    record.delete '_rev'
    record.delete '_attachments'
    record.delete 'couchrest-type'
    Record.create(record)
    puts "Record saved"
end
puts "There are now #{Record.count} records in the mongo db"

As CouchRest's models extend Hash (a quite weird decision), it's easy to get all the attributes and just remove the internal ones.

You'll note that we don't remove the _id set by CouchRest. That way, our documents keep the same ids.

That rake task was run manually when we deployed to production.


Don't give up on CouchDB because of this article! It's a really great engine, even though it didn't fit our specific search needs, and I wouldn't hesitate to recommend it.

Another piece of good news: as MongoDB stores its data as binary JSON (BSON), accessing it is much faster. We don't have real production statistics yet, but the execution time of our tests has dropped by 20%!