Thursday, July 2, 2009

Benchmarking MongoDB VS. Mysql

EDIT November 2011: Please take this benchmarks with a grain of salt. They are completely non-scientific and when choosing a data store probably raw speed is not all you care about. Try to understand the trade-offs of each database, learn about how your data will be handled in case of losing a server. What happens if there's data corruption? How replication works in your specific database… and so on. After all… don't choose a database over another just because of this blogpost.

One of the projects I work for at the company has a message system with more than 200 million entries in a MySQL database. We've started to think about how can we scale this further. So in my research for alternatives look at nosql databases. In this post I would like to share some benchmarks I ran against Mongodb –a document oriented database– compared to MySQL.

To perform the tests I installed MongoDB as explained here. I also installed PHP, MySQL and Apache2 from Macports. I did no special configurations in any of them.

The hardware used for the tests is my good ol' Macbook White:

Model Name: MacBook
Model Identifier: MacBook2,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 4 GB
Bus Speed: 667 MHz
Boot ROM Version: MB21.00A5.B07
SMC Version (system): 1.13f3

Because I don't have enough space to store the MongoDB database in my local hardrive, I launched the server with this command:
./bin/mongod --dbpath '/Volumes/alvaro/data/db/'

which tells MongoDB to use my USB hardrive. YES, my USB hardrive :-P

The MySQL server stored the data in the local hardrive.

What was the test?

I loaded in both databases 2 million records from our real data of the message system. Every record has 28 columns, holding informatin about the sender of the message and the recipient, plus the subject, date, etc. For MySQL I used mysqldump. For MongoDB I used the following:

$query = "SELECT * from messsage";
$result = mysql_query($query);
while($row = mysql_fetch_assoc($result))
{
$collection->insert($row);
}

Of course that for the real data loading I added some paginations, I didn't retrieved 2M records at once. And there was some code to initialize the MySQL connection and Mongo to get that $collection object.
The MySQL databases had index on the sid and tid fields (sender id and target id), so I added them to the MongoDB database.
$m = new Mongo();
$collection = $m->selectDB("msg")->selectCollection("msg_old");
$collection->ensureIndex( array("sid", "tid") );

Then I wrote some simple code that will select a limit of 20 records filtered by sid. In the real application this means I'm watching the first 20 messages of my outbox.

EDIT - 2009/07/03

Due to some confusion I have to make something clear. What I'm benchmarking is not the data loading into both databases, nor traversing the data, etc., but the code that you can find here .

This is a similar case of what a user message outbox (or inbox) could be in a production website. The users access his inbox and we retrieve up to 20 messages of his inbox, which are then displayed in an html table. What siege accessed was a script serving that html generated out of the query results.

So the idea is, if MongoDB or MySQL are the backends of this message system, which one will be faster for this specific use case. This benchmark is not about if MongoDB is better than MySQL for every use case out there. We use MySQL a lot in production and we will keep using it as far as I can tell. And yes, I know that MySQL and MongoDB are two totally different technologies that probably only share the word database in their descriptions.

END EDIT - 2009/07/03

I did the code for Mongodb and MySQL. Then my idea was to launch siege and pick some random user ids from a text file and do the stress tests.

Here's an extract from the url textfile:
http://mongo.al/index.php?id=96
http://mongo.al/index.php?id=105
http://mongo.al/index.php?id=108
http://mongo.al/index.php?id=113
http://mongo.al/index.php?id=116
http://mongo.al/index.php?id=117
http://mongo.al/index.php?id=127
http://mongo.al/index.php?id=129
http://mongo.al/index.php?id=130
http://mongo.al/index.php?id=134

This means that siege will pick a random url and hit the server, requesting the outbox of that user id.

Then I increased the ulimit to be able to run this test:
siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

With that command I launch siege, telling it to load the urls to visit form the text file. It will simulate 300 concurrent users and will do 10 repetitions with a random delay between 0 and 1. The last option tells siege to work in internet mode, so it will pick urls randomly from the text file.

When I launched the test wit MongoDB as backend it worked without problems. With the MySQL it crashed quite often. Below I add the results I obtained for both of them.

MongoDB test results:

siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

Transactions: 2994 hits
Availability: 99.80 %
Elapsed time: 11.95 secs
Data transferred: 3.19 MB
Response time: 0.26 secs
Transaction rate: 250.54 trans/sec
Throughput: 0.27 MB/sec
Concurrency: 65.03
Successful transactions: 2994
Failed transactions: 6
Longest transaction: 1.47
Shortest transaction: 0.00

MySQL tets results:
siege -f ./stress_urls_mysql.txt -c 300 -r 10 -d1 -i

Transactions: 2832 hits
Availability: 94.40 %
Elapsed time: 23.53 secs
Data transferred: 2.59 MB
Response time: 0.74 secs
Transaction rate: 120.36 trans/sec
Throughput: 0.11 MB/sec
Concurrency: 89.43
Successful transactions: 2832
Failed transactions: 168
Longest transaction: 16.36
Shortest transaction: 0.00

As we can see, MongoDB performed more than 2X better than MySQL for this specific case. And remember, MongoDB was reading the data from my USB hardrive ;-).

6 comments:

Leon Mergen said...

Just wanting to let you know that the fact that MongoDB uses an USB drive likely had no real effect -- my guess is that everything is cached into memory anyway (you have 2 million records and 4gb ram -- if every record (including indexes) is less than 2kb in size, which is very likely, everything will be read from cache.

Олександр said...

Example of records:

{
"_id" : "User-afb0c0f7-7266-4ab9-9f0b-350ec4067279",
"updated" : "Mon Nov 16 2009 08:56:56 GMT-0800 (PST)",
"roles" : [
{
"name" : "USER",
"permissions" : {
}
}
],
"firstname" : "Gera",
"created" : "Mon Nov 16 2009 08:56:56 GMT-0800 (PST)",
"lastname" : "Af",
"company" : {
"updated" : "Mon Nov 16 2009 08:56:55 GMT-0800 (PST)",
"name" : "Daubamif Inc.",
"created" : "Mon Nov 16 2009 08:56:55 GMT-0800 (PST)"
},
"password" : "5f4dcc3b5aa765d61d8327deb882cf99",
"email" : "gera.af@daubamif.com"
}



Size of database:
> db.users.count()
580841

Running:
> var d = db.users.distinct("company.name");

Takes 30 seconds without index on company. And Around 10 seconds with index.

Hardware: EC2 small instance.

Mr Julien Ballestracci said...

This is EXACTLY what I was looking for, Thank you very much!

adam said...

Hey i am doing something similar. What tools did you use to get all that profiling information? Thanks.

Rodrigo Gregorio said...

I begginer test MongoDB into small VPS 512mb and see this MongoDB not in use consume 150mb of 512mb and MySql not in use 1.5mb , i think this result is from cached memory

Using:

CentOS 5.5
VPS 512mb
PRC 2.4Ghz

Mesut ÇAKIR said...

MangoDb > 3 column > localde 100000 data entry
MsSql > 3 column > localde 100000 data entry > 06 : 51 saniye


Çok hızlı > Very very faster:D