Obvious Hints: Benchmarking MongoDB VS. Mysql

EDIT November 2011: Please take this benchmarks with a grain of salt. They are completely non-scientific and when choosing a data store probably raw speed is not all you care about. Try to understand the trade-offs of each database, learn about how your data will be handled in case of losing a server. What happens if there's data corruption? How replication works in your specific database… and so on. After all… don't choose a database over another just because of this blogpost.

One of the projects I work for at the company has a message system with more than 200 million entries in a MySQL database. We've started to think about how can we scale this further. So in my research for alternatives look at nosql databases. In this post I would like to share some benchmarks I ran against Mongodb –a document oriented database– compared to MySQL.

To perform the tests I installed MongoDB as explained here. I also installed PHP, MySQL and Apache2 from Macports. I did no special configurations in any of them.

The hardware used for the tests is my good ol' Macbook White:

Model Name: MacBook

Model Identifier: MacBook2,1

Processor Name: Intel Core 2 Duo

Processor Speed: 2 GHz

Number Of Processors: 1

Total Number Of Cores: 2

L2 Cache: 4 MB

Memory: 4 GB

Bus Speed: 667 MHz

Boot ROM Version: MB21.00A5.B07

SMC Version (system): 1.13f3

Because I don't have enough space to store the MongoDB database in my local hardrive, I launched the server with this command:

./bin/mongod --dbpath '/Volumes/alvaro/data/db/'

which tells MongoDB to use my USB hardrive. YES, my USB hardrive :-P

The MySQL server stored the data in the local hardrive.

What was the test?

I loaded in both databases 2 million records from our real data of the message system. Every record has 28 columns, holding informatin about the sender of the message and the recipient, plus the subject, date, etc. For MySQL I used mysqldump. For MongoDB I used the following:

$query = "SELECT * from messsage";

$result = mysql_query($query);

while($row = mysql_fetch_assoc($result))

{

$collection->insert($row);

}

Of course that for the real data loading I added some paginations, I didn't retrieved 2M records at once. And there was some code to initialize the MySQL connection and Mongo to get that $collection object.

The MySQL databases had index on the sid and tid fields (sender id and target id), so I added them to the MongoDB database.

$m = new Mongo();

$collection = $m->selectDB("msg")->selectCollection("msg_old");

$collection->ensureIndex( array("sid", "tid") );

Then I wrote some simple code that will select a limit of 20 records filtered by sid. In the real application this means I'm watching the first 20 messages of my outbox.

EDIT - 2009/07/03

Due to some confusion I have to make something clear. What I'm benchmarking is not the data loading into both databases, nor traversing the data, etc., but the code that you can find here .

This is a similar case of what a user message outbox (or inbox) could be in a production website. The users access his inbox and we retrieve up to 20 messages of his inbox, which are then displayed in an html table. What siege accessed was a script serving that html generated out of the query results.

So the idea is, if MongoDB or MySQL are the backends of this message system, which one will be faster for this specific use case. This benchmark is not about if MongoDB is better than MySQL for every use case out there. We use MySQL a lot in production and we will keep using it as far as I can tell. And yes, I know that MySQL and MongoDB are two totally different technologies that probably only share the word database in their descriptions.

END EDIT - 2009/07/03

I did the code for Mongodb and MySQL. Then my idea was to launch siege and pick some random user ids from a text file and do the stress tests.

Here's an extract from the url textfile:

http://mongo.al/index.php?id=96

http://mongo.al/index.php?id=105

http://mongo.al/index.php?id=108

http://mongo.al/index.php?id=113

http://mongo.al/index.php?id=116

http://mongo.al/index.php?id=117

http://mongo.al/index.php?id=127

http://mongo.al/index.php?id=129

http://mongo.al/index.php?id=130

http://mongo.al/index.php?id=134

This means that siege will pick a random url and hit the server, requesting the outbox of that user id.

Then I increased the ulimit to be able to run this test:

siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

With that command I launch siege, telling it to load the urls to visit form the text file. It will simulate 300 concurrent users and will do 10 repetitions with a random delay between 0 and 1. The last option tells siege to work in internet mode, so it will pick urls randomly from the text file.

When I launched the test wit MongoDB as backend it worked without problems. With the MySQL it crashed quite often. Below I add the results I obtained for both of them.

MongoDB test results:

siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

Transactions: 2994 hits

Availability: 99.80 %

Elapsed time: 11.95 secs

Data transferred: 3.19 MB

Response time: 0.26 secs

Transaction rate: 250.54 trans/sec

Throughput: 0.27 MB/sec

Concurrency: 65.03

Successful transactions: 2994

Failed transactions: 6

Longest transaction: 1.47

Shortest transaction: 0.00

MySQL tets results:

siege -f ./stress_urls_mysql.txt -c 300 -r 10 -d1 -i

Transactions: 2832 hits

Availability: 94.40 %

Elapsed time: 23.53 secs

Data transferred: 2.59 MB

Response time: 0.74 secs

Transaction rate: 120.36 trans/sec

Throughput: 0.11 MB/sec

Concurrency: 89.43

Successful transactions: 2832

Failed transactions: 168

Longest transaction: 16.36

Shortest transaction: 0.00

As we can see, MongoDB performed more than 2X better than MySQL for this specific case. And remember, MongoDB was reading the data from my USB hardrive ;-).

6 comments:

Leon Mergen said...: Just wanting to let you know that the fact that MongoDB uses an USB drive likely had no real effect -- my guess is that everything is cached into memory anyway (you have 2 million records and 4gb ram -- if every record (including indexes) is less than 2kb in size, which is very likely, everything will be read from cache.; July 3, 2009 at 11:17 AM
Олександр said...: Example of records:

{
"_id" : "User-afb0c0f7-7266-4ab9-9f0b-350ec4067279",
"updated" : "Mon Nov 16 2009 08:56:56 GMT-0800 (PST)",
"roles" : [
{
"name" : "USER",
"permissions" : {
}
}
],
"firstname" : "Gera",
"created" : "Mon Nov 16 2009 08:56:56 GMT-0800 (PST)",
"lastname" : "Af",
"company" : {
"updated" : "Mon Nov 16 2009 08:56:55 GMT-0800 (PST)",
"name" : "Daubamif Inc.",
"created" : "Mon Nov 16 2009 08:56:55 GMT-0800 (PST)"
},
"password" : "5f4dcc3b5aa765d61d8327deb882cf99",
"email" : "gera.af@daubamif.com"
}

Size of database:
> db.users.count()
580841

Running:
> var d = db.users.distinct("company.name");

Takes 30 seconds without index on company. And Around 10 seconds with index.

Hardware: EC2 small instance.; November 16, 2009 at 3:05 PM
Anonymous said...: This is EXACTLY what I was looking for, Thank you very much!; August 4, 2010 at 2:29 PM
adam said...: Hey i am doing something similar. What tools did you use to get all that profiling information? Thanks.; August 5, 2010 at 2:04 PM
Rodrigo Gregorio said...: I begginer test MongoDB into small VPS 512mb and see this MongoDB not in use consume 150mb of 512mb and MySql not in use 1.5mb , i think this result is from cached memory

Using:

CentOS 5.5
VPS 512mb
PRC 2.4Ghz; November 12, 2010 at 4:37 PM
Mesut ÇAKIR said...: MangoDb > 3 column > localde 100000 data entry
MsSql > 3 column > localde 100000 data entry > 06 : 51 saniye

Çok hızlı > Very very faster:D; February 25, 2011 at 5:18 PM

Obvious Hints

Thursday, July 2, 2009

Benchmarking MongoDB VS. Mysql

6 comments:

About Me

My Twitter

My Twitter

Labels

What I read

Blog Archive

Obvious Hints

Thursday, July 2, 2009

Benchmarking MongoDB VS. Mysql

6 comments:

About Me

My Twitter

My Twitter

Labels

What I read

Subscribe To

Blog Archive