Saturday, August 25, 2012

MongoDB, Mongoid, MapReduce and Embedded Documents.

I am using Mongoid to store some data as documents in a MongoDB database and then running some MapReduce queries against that data. I had no trouble mapping data from normal documents or from a single embedded document, but I could not extract data from an embedded collection of documents, i.e.

class Foo
  include Mongoid::Document

  #fields
  field :custom_id, :type => String

  #relations
  embeds_many :bars

end
class Bar
  include Mongoid::Document

  #fields
  field :custom_field, :type => String

  #relations
  embedded_in :foo

end

First, it looks like we need to run the map part of the MapReduce against the parent document and not the child, i.e. Foo.map_reduce(...) will find documents but Bar.map_reduce(...) does not. That is not surprising, as it is also not possible to count all Bar documents by doing Bar.all.count in the Rails console.

A MapReduce query in MongoDB is written as a pair of JavaScript functions: the first does the map by emitting a mini-document of data, and the second aggregates the emitted data in some manner. Thinking I had a collection (array), my first attempt to map data from the embedded documents was this:

MAP:
function() {
  if (this.bars == null) return;
  for (var bar in this.bars){
    emit(bar.custom_field, { count: 1 });
  }
}

REDUCE:
function(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return { count: total };
}

This produced an unusual result: there was only a single aggregated document with a null key, and its count was the total number of child documents (summed across all the parents).
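The behaviour can be reproduced without a database. Below is a minimal sketch, assuming a hypothetical runMap helper that stands in for the server by binding `this` to a document and collecting whatever emit receives. The helper and the sample document are made up for illustration and are not part of MongoDB or Mongoid.

```javascript
// Hypothetical stand-in for the server: runs a map function against one
// document and records every emit(key, value) call it makes.
function runMap(mapFn, doc) {
  var emitted = [];
  // MongoDB supplies emit as a global to the map function; mimic that here.
  globalThis.emit = function (key, value) {
    emitted.push({ key: key, value: value });
  };
  mapFn.call(doc); // the document is available as `this`
  delete globalThis.emit;
  return emitted;
}

// The first-attempt map function, unchanged.
function firstAttemptMap() {
  if (this.bars == null) return;
  for (var bar in this.bars) {
    emit(bar.custom_field, { count: 1 });
  }
}

// Made-up sample document with two embedded bars.
var sampleFoo = {
  custom_id: "foo-1",
  bars: [{ custom_field: "a" }, { custom_field: "b" }]
};

// Two values are emitted, but both keys are undefined, because `bar` holds
// the index strings "0" and "1" rather than the embedded documents.
var firstAttemptResult = runMap(firstAttemptMap, sampleFoo);
console.log(firstAttemptResult);
```

With every emitted key undefined, all the values fall into one group, which would explain the single result whose count equals the total number of embedded documents.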

Now I could have just broken the child document out and not embedded it but I didn't want to break the model over something so trivial that must, in my eyes, be possible.

After much googling and reading of forum posts, I couldn't find any samples. I eventually noticed some 'unusual' syntax on an unrelated topic, which led me to rewrite the map script into this:

function() {
  if (this.bars == null) return;
  for (var bar in this.bars){
    emit(this.bars[bar].custom_field, { count: 1 });
  }
}

This produced the expected results. Okay, this was probably obvious to anyone who knows MongoDB and MapReduce well, but it took me a while to find out and it still isn't that intuitive. I think I now know why it is this way, so I thought I'd write it up as a bit of a reference.
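For reference, the likely explanation is that a JavaScript for...in loop over an array yields the indices (as strings) rather than the elements, so each element has to be looked up through this.bars[bar]. Below is a rough sketch that checks the corrected map script outside MongoDB. The simulateMapReduce helper and the sample documents are hypothetical, and the helper simply groups emitted values by key and reduces each group once, which is simpler than what the server really does.

```javascript
// Hypothetical in-memory stand-in for a MapReduce run: groups emitted
// values by key, then calls the reduce function once per key.
function simulateMapReduce(docs, mapFn, reduceFn) {
  var groups = {};
  globalThis.emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  docs.forEach(function (doc) { mapFn.call(doc); });
  delete globalThis.emit;

  var results = {};
  Object.keys(groups).forEach(function (key) {
    results[key] = reduceFn(key, groups[key]);
  });
  return results;
}

// for...in over an array yields the indices as strings, not the elements.
var indices = [];
for (var index in [{ custom_field: "a" }, { custom_field: "b" }]) {
  indices.push(index);
}
console.log(indices); // the strings "0" and "1"

// The corrected map script and the reduce script from the post.
function fixedMap() {
  if (this.bars == null) return;
  for (var bar in this.bars) {
    emit(this.bars[bar].custom_field, { count: 1 });
  }
}

function countReduce(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return { count: total };
}

// Made-up sample data: three parents, one of them without embedded bars.
var sampleFoos = [
  { custom_id: "foo-1", bars: [{ custom_field: "a" }, { custom_field: "b" }] },
  { custom_id: "foo-2", bars: [{ custom_field: "a" }] },
  { custom_id: "foo-3" }
];

var counts = simulateMapReduce(sampleFoos, fixedMap, countReduce);
console.log(counts); // "a" is counted twice, "b" once
```

A plain indexed loop (for (var i = 0; i < this.bars.length; i++)) should work just as well in the map script and makes the index lookup explicit.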