Working with MongoDB
I spent the last 6 months working in an office on a Facebook app. It is written in Python using Django and uWSGI, but the interesting part is that MongoDB is its primary and only database. It was not my choice, but using cutting-edge tech like this seemed attractive, so I signed up. This is a collection of small random notes on MongoDB's features, pitfalls and useful techniques.
Optimizing
Since this application was aimed at a Facebook audience, the first requirement for my work was to optimize as much as possible, and because the database is the foundation of a web application, I started looking for optimization techniques. Here is what I found for MongoDB:
- Always profile. This should be the first item on every optimizer's list. You need to know what to optimize, so you have to profile, and by "profile" I mean "profile everything": database, server-side code, client-side code, etc.
- Create indices and use hints. This was what I expected to find first. An index can be created with the ensureIndex command. Hints help the query optimizer pick the right index for a particular query. Hints are set via the hint command.
- Limit results. The less you select, the faster it works. Results can be limited with the limit command.
- Select only relevant fields. Same idea: just pass a map of fields as the second parameter to the find command.
- Rewrite some queries as map/reduce. Map/reduce is easy to parallelize, and if your server has more than one core or CPU (I believe it has) or you have several DB servers, rewriting some "heavy" queries will increase their performance, often impressively.
- Use modifier operators. I'm talking about $inc, $set, $push and others. They are always faster than a retrieve-update-save cycle.
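The index, hint, limit, field-selection and modifier items above could look roughly like this in the mongo shell. This is only a sketch: the users collection, its country, score and name fields, and the helper names topPlayers and addScore are all made up for illustration, and db is assumed to be the shell's database handle. Newer MongoDB versions expose createIndex in place of ensureIndex.

```javascript
// Sketch for the legacy mongo shell API. `db` is the shell's database handle.
// The collection and field names (users, country, score, name) are hypothetical.

function topPlayers(db, country) {
  // Create an index on the fields the query filters on.
  db.users.ensureIndex({ country: 1, score: -1 });

  return db.users
    // Second argument: fetch only the fields we actually need.
    .find({ country: country }, { name: 1, score: 1 })
    // Tell the query optimizer which index to use.
    .hint({ country: 1, score: -1 })
    // Cap the number of returned documents.
    .limit(10)
    .toArray();
}

function addScore(db, userId, points) {
  // A single update command with modifier operators, instead of
  // retrieving the document, changing it in the app and saving it back.
  db.users.update(
    { _id: userId },
    { $inc: { score: points }, $set: { active: true } }
  );
}
```

In a shell session these would be called as topPlayers(db, 'NL') or addScore(db, someId, 5); whether the hint actually helps is something to confirm by profiling.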
Now let's take a closer look at a couple of things in Mongo: indices and map/reduce.
Indices and keys
Indices in Mongo work like the ones in SQL databases. Unique and primary keys are implemented in Mongo through indices. If you're a pedantic programmer who always sets PKs and UKs on the proper fields in your DB, you can simply skip this paragraph. For everybody else: always set a unique key on unique fields! There is no other way to avoid duplicates, because MongoDB has no transactions and therefore no transactional consistency, only atomic consistency. This means your data is guaranteed to be up to date only for the duration of the current DB command.
For example, imagine you haven't set a unique key and coded some badass check in your app instead. The check counts the documents that have the same value as yours in the unique field; if the count is more than 0 it throws an error, otherwise it creates a new entry. As I said before, the data is guaranteed to be current only for a single command, so a count of 0 at check time doesn't mean there are still no matching entries at the moment you add the new one. When your app is under heavy load you'll get tons of duplicate entries in the database.
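The race described above can be reproduced without a database. The sketch below is plain JavaScript, not MongoDB code: FakeCollection is a hypothetical in-memory stand-in where each method call plays the role of one atomic command, and the two "concurrent" requests are interleaved by hand so that both checks run before either insert.

```javascript
// Hypothetical in-memory stand-in for a collection. Each method call models
// one atomic database command; nothing is atomic across two calls.
class FakeCollection {
  constructor(uniqueField = null) {
    this.docs = [];
    this.uniqueField = uniqueField; // field covered by a "unique index", if any
  }

  count(query) {
    return this.docs.filter((doc) =>
      Object.keys(query).every((key) => doc[key] === query[key])
    ).length;
  }

  insert(doc) {
    const field = this.uniqueField;
    // With a unique index, the duplicate check is part of the insert itself.
    if (field !== null && this.docs.some((d) => d[field] === doc[field])) {
      throw new Error('duplicate key: ' + doc[field]);
    }
    this.docs.push(doc);
  }
}

const email = 'user@example.com';

// Approach 1: uniqueness check in application code, no unique index.
// Requests A and B both run their check before either one inserts.
const unindexed = new FakeCollection();
const seenByA = unindexed.count({ email });
const seenByB = unindexed.count({ email });
if (seenByA === 0) unindexed.insert({ email, request: 'A' });
if (seenByB === 0) unindexed.insert({ email, request: 'B' });

// Approach 2: unique index, so the second insert is rejected.
const indexed = new FakeCollection('email');
let rejected = 0;
for (const request of ['A', 'B']) {
  try {
    indexed.insert({ email, request });
  } catch (err) {
    rejected += 1;
  }
}

console.log(unindexed.docs.length); // 2
console.log(indexed.docs.length, rejected); // 1 1
```

The application-level check lets both requests through because each saw a count of 0, while the unique field rejects the second insert no matter how the requests interleave.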
How map/reduce really works
I assume that you've already heard about the map/reduce model. It took two best practices from functional programming and combined them. MongoDB's version of M/R includes a third, optional function: finalize. Here is how these things work:
- All the data you've selected goes through the map function first, and map is called on each entry. In this step you group the data by some key.
- The groups of data from the previous step are now passed to the reduce function, which should perform some magic and return a reduced value for each group.
Here comes an important point: reduce is not called when there is nothing to reduce. I'll explain with an example.
Imagine you have a collection of cars. There are five entries in it:
{'firm': 'Porsche', 'model': '911 Boxster'}, {'firm': 'Porsche', 'model': 'Carrera GT'}, {'firm': 'BMW', 'model': 'M3'}, {'firm': 'BMW', 'model': 'X6'}, {'firm': 'Audi', 'model': 'Q5'}
Now you want to count the models for each firm and add 10 to each count. You expect these results:
('Porsche', 12), ('BMW', 12), ('Audi', 11)
You write a map function:

    function map() {
        emit(this.firm, 1);
    }

It groups your data like this:
('Porsche', 1), ('Porsche', 1), ('BMW', 1), ('BMW', 1), ('Audi', 1)
And some internal Mongo voodoo groups them again:
('Porsche', [1, 1]), ('BMW', [1, 1]), ('Audi', 1)
And passes them to the reduce function:

    function reduce(key, vals) {
        var count = 0;
        vals.forEach(function(e) { count += e; });
        return count + 10;
    }

And you get these:
('Porsche', 12), ('BMW', 12), ('Audi', 1)
Notice the difference?
Now you must be thinking "what's going on?". Look closer at the mapped results: Porsche and BMW each have an array of 1s, while Audi has only a single 1. Because there is only one value for that key, Mongo considers it already reduced and does not call the reduce function for it.
- "Well, what should I do if I want correct results?" you ask. You should use the finalize function. That is where your data is passed after reducing. After rewriting reduce and adding finalize you'll get the expected results:

    function reduce(key, vals) {
        var count = 0;
        vals.forEach(function(e) { count += e; });
        return count;
    }

    function finalize(key, val) {
        return val + 10;
    }

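To see the whole flow in one place, here is a toy model of the steps above in plain JavaScript. It is not MongoDB's implementation, only a sketch of the behaviour described in this section; runMapReduce, the global emit helper and the function names mapFirm, reducePlusTen, reduceSum and finalizePlusTen are all invented for the illustration.

```javascript
// Toy model of the map -> group -> reduce -> finalize flow (not MongoDB's code).
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

function runMapReduce(docs, map, reduce, finalize) {
  emitted.length = 0;
  // Map step: call map with `this` bound to each document.
  docs.forEach((doc) => map.call(doc));

  // Grouping step: collect the emitted values per key.
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  const result = {};
  for (const [key, values] of groups) {
    // Reduce step: skipped when a key has only one value.
    let value = values.length > 1 ? reduce(key, values) : values[0];
    // Finalize step: runs once for every key, reduced or not.
    if (finalize) value = finalize(key, value);
    result[key] = value;
  }
  return result;
}

const cars = [
  { firm: 'Porsche', model: '911 Boxster' },
  { firm: 'Porsche', model: 'Carrera GT' },
  { firm: 'BMW', model: 'M3' },
  { firm: 'BMW', model: 'X6' },
  { firm: 'Audi', model: 'Q5' },
];

function mapFirm() { emit(this.firm, 1); }
function reducePlusTen(key, vals) { return vals.reduce((a, b) => a + b, 0) + 10; }
function reduceSum(key, vals) { return vals.reduce((a, b) => a + b, 0); }
function finalizePlusTen(key, val) { return val + 10; }

const broken = runMapReduce(cars, mapFirm, reducePlusTen);
const fixed = runMapReduce(cars, mapFirm, reduceSum, finalizePlusTen);

console.log(broken); // { Porsche: 12, BMW: 12, Audi: 1 }
console.log(fixed);  // { Porsche: 12, BMW: 12, Audi: 11 }
```

There is one more reason to keep the "+ 10" out of reduce. MongoDB may call reduce more than once for the same key, feeding earlier reduce output back in, so a reduce that adds a constant could add it several times on larger data sets.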
Conclusion
MongoDB is a great tool for the right purposes. I would recommend it for fast prototyping of apps, logging and caching. It's easy, fast and quite reliable. Try it sometime!