Thursday, July 2, 2009

Benchmarking MongoDB vs. MySQL

One of the projects I work on at the company has a message system with more than 200 million entries in a MySQL database. We've started to think about how we can scale this further, so in my research for alternatives I looked at NoSQL databases. In this post I would like to share some benchmarks I ran against MongoDB, a document-oriented database, compared to MySQL.

To perform the tests I installed MongoDB as explained here. I also installed PHP, MySQL and Apache2 from MacPorts. I did not apply any special configuration to any of them.

The hardware used for the tests is my good ol' Macbook White:

Model Name: MacBook
Model Identifier: MacBook2,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 4 GB
Bus Speed: 667 MHz
Boot ROM Version: MB21.00A5.B07
SMC Version (system): 1.13f3

Because I don't have enough space to store the MongoDB database on my local hard drive, I launched the server with this command:
./bin/mongod --dbpath '/Volumes/alvaro/data/db/'

which tells MongoDB to use my USB hard drive. YES, my USB hard drive :-P

The MySQL server stored the data on the local hard drive.

What was the test?

I loaded 2 million records of real data from our message system into both databases. Every record has 28 columns, holding information about the sender and the recipient of the message, plus the subject, date, etc. For MySQL I used mysqldump. For MongoDB I used the following:

$query = "SELECT * from message";
$result = mysql_query($query);
while($row = mysql_fetch_assoc($result))
{
$collection->insert($row);
}

Of course, for the real data loading I added some pagination; I didn't retrieve all 2M records at once. There was also some code to initialize the MySQL connection and the Mongo connection that provides the $collection object.
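
The pagination I mention could look roughly like the sketch below. It is not the script I actually ran: copyInPages() and both callbacks are hypothetical names, and in-memory arrays stand in for the MySQL query (a LIMIT/OFFSET select) and for $collection->insert(), so the example runs on its own.

```php
<?php
// Hypothetical helper: copy rows from a source to a target in fixed-size pages.
// $fetchPage stands in for a MySQL LIMIT/OFFSET query,
// $insert stands in for $collection->insert($row).
function copyInPages(callable $fetchPage, callable $insert, int $pageSize): int
{
    $copied = 0;
    $offset = 0;
    do {
        $rows = $fetchPage($offset, $pageSize);
        foreach ($rows as $row) {
            $insert($row);
            $copied++;
        }
        $offset += $pageSize;
        // A short (or empty) page means the source is exhausted.
    } while (count($rows) === $pageSize);

    return $copied;
}

// In-memory stand-ins so the sketch runs without MySQL or MongoDB.
$source = [];
for ($i = 1; $i <= 25; $i++) {
    $source[] = ['id' => $i, 'sid' => $i % 5, 'subject' => "msg $i"];
}
$target = [];

$copied = copyInPages(
    function (int $offset, int $limit) use ($source): array {
        return array_slice($source, $offset, $limit);
    },
    function (array $row) use (&$target): void {
        $target[] = $row;
    },
    10
);

echo $copied, " rows copied\n";
```

With a real table, the page query should have a stable ORDER BY so that pages neither overlap nor skip rows.
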
The MySQL database had indexes on the sid and tid fields (sender id and target id), so I added them to the MongoDB database as well:
$m = new Mongo();
$collection = $m->selectDB("msg")->selectCollection("msg_old");
$collection->ensureIndex( array("sid", "tid") );

Then I wrote some simple code that selects up to 20 records filtered by sid. In the real application this corresponds to viewing the first 20 messages of my outbox.

EDIT - 2009/07/03

Due to some confusion I have to make something clear. What I'm benchmarking is not the data loading into both databases, nor traversing the data, etc., but the code that you can find here.

This is similar to what a user's message outbox (or inbox) could be on a production website. The user accesses his inbox and we retrieve up to 20 of his messages, which are then displayed in an HTML table. What siege accessed was a script serving the HTML generated from the query results.

So the idea is: if MongoDB or MySQL is the backend of this message system, which one will be faster for this specific use case? This benchmark is not about whether MongoDB is better than MySQL for every use case out there. We use MySQL a lot in production and we will keep using it as far as I can tell. And yes, I know that MySQL and MongoDB are two totally different technologies that probably only share the word database in their descriptions.

END EDIT - 2009/07/03

I wrote the code for both MongoDB and MySQL. Then my idea was to launch siege, have it pick random user ids from a text file, and run the stress tests.

Here's an extract from the URL text file:
http://mongo.al/index.php?id=96
http://mongo.al/index.php?id=105
http://mongo.al/index.php?id=108
http://mongo.al/index.php?id=113
http://mongo.al/index.php?id=116
http://mongo.al/index.php?id=117
http://mongo.al/index.php?id=127
http://mongo.al/index.php?id=129
http://mongo.al/index.php?id=130
http://mongo.al/index.php?id=134

This means that siege will pick a random URL and hit the server, requesting the outbox of that user id.

Then I increased the ulimit to be able to run this test:
siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

With that command I launch siege, telling it to load the URLs to visit from the text file. It will simulate 300 concurrent users and do 10 repetitions, with a random delay between 0 and 1 second. The last option tells siege to work in internet mode, so it picks URLs randomly from the text file.

When I launched the test with MongoDB as the backend it worked without problems. With MySQL it crashed quite often. Below are the results I obtained for both of them.

MongoDB test results:

siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

Transactions: 2994 hits
Availability: 99.80 %
Elapsed time: 11.95 secs
Data transferred: 3.19 MB
Response time: 0.26 secs
Transaction rate: 250.54 trans/sec
Throughput: 0.27 MB/sec
Concurrency: 65.03
Successful transactions: 2994
Failed transactions: 6
Longest transaction: 1.47
Shortest transaction: 0.00

MySQL test results:

siege -f ./stress_urls_mysql.txt -c 300 -r 10 -d1 -i

Transactions: 2832 hits
Availability: 94.40 %
Elapsed time: 23.53 secs
Data transferred: 2.59 MB
Response time: 0.74 secs
Transaction rate: 120.36 trans/sec
Throughput: 0.11 MB/sec
Concurrency: 89.43
Successful transactions: 2832
Failed transactions: 168
Longest transaction: 16.36
Shortest transaction: 0.00

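As we can see, MongoDB performed more than 2X better than MySQL for this specific case: it handled 250.54 transactions per second against 120.36, with 6 failed transactions against 168. And remember, MongoDB was reading the data from my USB hard drive ;-).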
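
To make the comparison explicit, here is a small sketch that only does arithmetic on the figures copied from the two siege reports above. The variable names are mine, and nothing here re-runs the benchmark.

```php
<?php
// Figures copied from the siege reports above.
$mongo = ['rate' => 250.54, 'response' => 0.26, 'failed' => 6,   'total' => 3000];
$mysql = ['rate' => 120.36, 'response' => 0.74, 'failed' => 168, 'total' => 3000];

// How many times more transactions per second MongoDB served.
$rateRatio = $mongo['rate'] / $mysql['rate'];

// How many times longer the average MySQL response took.
$responseRatio = $mysql['response'] / $mongo['response'];

// Share of requests that failed, as a percentage.
$mongoFailurePct = 100 * $mongo['failed'] / $mongo['total'];
$mysqlFailurePct = 100 * $mysql['failed'] / $mysql['total'];

printf("Transaction rate ratio: %.2f\n", $rateRatio);
printf("Response time ratio:    %.2f\n", $responseRatio);
printf("Failed requests:        %.2f%% vs %.2f%%\n", $mongoFailurePct, $mysqlFailurePct);
```

The total of 3000 is 300 concurrent users times 10 repetitions, which matches successful plus failed transactions in both reports.
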

Thursday, June 4, 2009

New Firesymfony Release

I'm pleased to announce the release of version 1.1 of Firesymfony. This time it has a new design that we believe improves the user experience. The design was made by my colleague Jacqueline Wan and the logo by Olaf Horstmann.

Below you can see some screenshots, and here are the URLs to update the symfony plugin and the Firebug extension:

Symfony Plugin

FB Extension

About Panel:

Configuration and Variables Panel:


Logs Panel:

Cache Panel:

Database Panel:

Timers Panel:

Information Panel:

Monday, March 30, 2009

How I would like to use Propel and Memcache

In this article I would like to share some ideas that have been wandering around my mind but that I haven't implemented yet. So beware!

I was thinking of a way to reduce database load by caching some results in Memcache (an idea that is nothing special or revolutionary these days). This article gave me some ideas that I would like to see implemented in some of the projects I work on.

The code is mostly meant for use inside a Symfony/Propel project, but it could easily be adapted to a different one.

Propel Peer classes come packed with the following method:

BaseUserPeer::retrieveByPK($pk, $con=null);

I would like to override it in the following way:

public static function retrieveByPK($pk, $con = null)
{
  $cacheKey = sprintf('user:id:%d', $pk);
  $asArray = $memcache->get($cacheKey);
  
  if($asArray === null)
  {
    $obj = parent::retrieveByPK($pk, $con);
    if($obj !== null)
    {
      $memcache->set($cacheKey, $obj->toArray(BasePeer::TYPE_FIELDNAME));
    }
  }
  else
  {
    $obj = new User();
    $obj->fromArray($asArray, BasePeer::TYPE_FIELDNAME);
  }
  
  return $obj;
}

Note that I've left out the memcache connection code. I assume that there is a memcache class that abstracts the process and that its get method returns null on a cache miss. I've also left out the details of how this class is instantiated.

As you can see from the code, I store the object in memcache as an associative array. I like to do so because I don't want to store a serialized version of the Propel object, which would take more space. Also, by using a native data type I can warm up the cache from, let's say, a batch script without needing Propel at all. And if I have code that doesn't require symfony or Propel, I can still use the cached data.

The fromArray and toArray methods are built into Propel objects as a convenient way of populating them, so there's no extra effort on our side to get their benefits.

For the cache key I use the table name as prefix, followed by a colon, the primary key column name, another colon and the primary key value, i.e.: table_name:primary_key:value
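
As a small illustration of that key scheme, a helper could build the keys in one place. cacheKey() is a hypothetical function of mine, not something Propel or symfony provides.

```php
<?php
// Hypothetical helper, not part of Propel: builds keys of the form
// table_name:column_name:value.
function cacheKey(string $table, string $column, $value): string
{
    return sprintf('%s:%s:%s', $table, $column, $value);
}

echo cacheKey('user', 'id', 42), "\n";             // user:id:42
echo cacheKey('user', 'nickname', 'alvaro'), "\n"; // user:nickname:alvaro
```

Keeping the format in a single function avoids small spelling differences between the code that writes a key and the code that reads it.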

Then what is left for this to actually work is to override the User::save($con = null) method, so every time the database row is updated the changes will be reflected in the cache.

I came up with the following code:

 
public function save($con = null)
{
  $affectedRows = parent::save($con);
  $memcache->set(sprintf('user:id:%d', $this->getId()), $this->toArray(BasePeer::TYPE_FIELDNAME));
  return $affectedRows;
}

By updating the entry right after it's saved to the database, I use a traditional memcache pattern to keep the cache warm.

So let's say that in a normal login form, the user submits his nickname and password to be checked against the database. In this case we have to do a SELECT query using the nickname and password as WHERE parameters. Instead of issuing a query every time, I would like to do the following inside UserPeer::retrieveByNickname($nickname).

public static function retrieveByNickname($nickname)
{
  $nicknameCacheKey = sprintf('user:nickname:%s', $nickname);
  $userId = $memcache->get($nicknameCacheKey);
  
  if($userId !== null)
  {
    return UserPeer::retrieveByPK($userId);
  }
  else
  {
    $c = new Criteria();
    $c->add(UserPeer::NICKNAME, $nickname);
    $user = UserPeer::doSelectOne($c);
    if($user !== null)
    {
      $memcache->set(sprintf('user:id:%d', $user->getId()), $user->toArray(BasePeer::TYPE_FIELDNAME));
      $memcache->set($nicknameCacheKey, $user->getId());
    }
    
    return $user;
  }
}

In the last example I first check if there is a memcache entry with the following key: user:nickname:somenickname. The value stored will be the user id, which I assume is the primary key of the table. If the user id is not null I delegate the call to UserPeer::retrieveByPK to do the job. Otherwise I fetch the user from the database using the nickname as the Criteria condition. If the record exists I store the user data in memcache as an array, and I also store the id under the user:nickname:somenickname key. Now it should be clear why in the previous example I used user:id:someid as the key.

Some improvements to make in the retrieveByPK and save methods would be to also store the respective values for the user:nickname:somenickname key once the object has been populated. This way we increase the chances that retrieveByNickname will hit the cache.
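
That improvement could be sketched as follows. This is plain PHP rather than Propel code: ArrayCache is an in-memory stand-in for the memcache wrapper I assume throughout the article (get returns null on a miss), and cacheUser() and findCachedByNickname() are hypothetical helpers.

```php
<?php
// In-memory stand-in for the assumed memcache wrapper.
class ArrayCache
{
    private $data = [];

    public function get(string $key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : null;
    }

    public function set(string $key, $value): void
    {
        $this->data[$key] = $value;
    }
}

// Hypothetical helper: writes both cache entries for one user row, as
// retrieveByPK and save would do after populating or saving the object.
function cacheUser(ArrayCache $cache, array $user): void
{
    $cache->set(sprintf('user:id:%d', $user['id']), $user);
    $cache->set(sprintf('user:nickname:%s', $user['nickname']), $user['id']);
}

// Resolves a nickname through the cache only; null means "go to the database".
function findCachedByNickname(ArrayCache $cache, string $nickname): ?array
{
    $id = $cache->get(sprintf('user:nickname:%s', $nickname));
    if ($id === null) {
        return null;
    }

    return $cache->get(sprintf('user:id:%d', $id));
}

$cache = new ArrayCache();
cacheUser($cache, ['id' => 7, 'nickname' => 'alvaro', 'email' => 'a@example.com']);

$hit  = findCachedByNickname($cache, 'alvaro');
$miss = findCachedByNickname($cache, 'nobody');
var_dump($hit['id'], $miss);
```

One thing this sketch does not handle: if a user changes nickname, the old user:nickname key would need to be removed as well.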

I hope you find this article useful, and thanks for reading.

NOTE: I know that the code is pretty ugly, some of it needs to be refactored out into its own methods, and maybe Propel classes are not the best place for this caching logic to reside, but I think it is a nice example to build upon.

Tuesday, March 3, 2009

Symfony Speed and Hello World Benchmarks

After reading some posts showing that my-blah-blah framework is way faster than symfony for a Hello World application, I decided to explain why: because symfony is extensible and can adapt to your needs. That’s easy to say, you may think; in fact, every framework out there claims that. So what makes symfony so special?


The following list names some of the features provided by symfony.

  • Factories
  • The Filter Chain 
  • The Configuration Cascade
  • The Plugins System
  • Controller adaptability
  • View adaptability


Factories


Since version 1.0 symfony provides a configuration file called factories.yml. This file configures the application's core classes. There you can replace symfony's default classes with your own. This means you can set up a custom Front Controller, Web Request, Cache classes, Session storage, etc.


But this comes at a price: when a symfony application bootstraps, it reads the configuration file from the filesystem. If the yml file was parsed before, it loads a PHP file (which can also be cached with APC); if not, it parses the yml file, stores the parsed result in the cache folder and loads the configuration from there.


Why would I need that flexibility, you may ask? In a project I’m involved with we needed Memcached. This meant replacing all of symfony's default cache mechanism with our own custom classes. How? By setting up our classes in an easy-to-read yml file. So for sure when you benchmark your Hello World application symfony will be slower.


Filter Chain


One of the patterns from the Core J2EE Patterns book that impressed me the most is the Intercepting Filter. This pattern shows how to modify request processing without changing Controller or Model code. The idea is that in a configuration file you plug in a class that takes care of pre- or post-processing the request. These classes are called filters. As an example, you can add a filter that checks whether the user has the proper credentials to execute the action she wants. Another filter can cache the response, etc.


Symfony has the filters.yml file, which can be specified per application or per module. This means that we can set filters to be executed for the whole application and then disable them for specific modules, let’s say for Ajax actions. Does your framework provide this flexibility without resorting to some kind of monkey patching? No? Well, symfony does. Say hello to the Filter Chain. So for sure when you benchmark your Hello World application symfony will be slower, because it adds flexibility to the process. You want to get rid of this behavior? Sure: set up a custom controller in the factories.yml file, and in your new class override the loadFilters method.
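
The Intercepting Filter idea can be sketched in a few lines of plain PHP. This is a generic illustration and not symfony's implementation: the FilterChain class, its methods and the log messages are all made up for the example.

```php
<?php
// Generic Intercepting Filter sketch; names are illustrative only.
class FilterChain
{
    public $log = [];
    private $filters = [];
    private $index = -1;

    public function register(callable $filter): void
    {
        $this->filters[] = $filter;
    }

    // Runs the next filter; each filter decides whether to continue the chain.
    public function execute(): void
    {
        $this->index++;
        if ($this->index < count($this->filters)) {
            $filter = $this->filters[$this->index];
            $filter($this);
        }
    }
}

$chain = new FilterChain();

// Pre-processing only: a credentials check that runs before the action.
$chain->register(function (FilterChain $chain): void {
    $chain->log[] = 'security: credentials ok';
    $chain->execute();
});

// Pre- and post-processing wrapped around the rest of the chain.
$chain->register(function (FilterChain $chain): void {
    $chain->log[] = 'cache: miss';
    $chain->execute();
    $chain->log[] = 'cache: response stored';
});

// The last filter stands in for executing the action itself.
$chain->register(function (FilterChain $chain): void {
    $chain->log[] = 'action: Hello World';
});

$chain->execute();
echo implode("\n", $chain->log), "\n";
```

Each filter wraps everything registered after it, which is why the cache filter can act both before and after the action without the action knowing about it. That extra indirection is part of what a Hello World benchmark ends up measuring.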


Configuration Cascade


As explained in the configuration chapter of the symfony book, symfony allows you to modify its behavior through some yml files. For example, we have view.yml, which tells symfony which CSS and JavaScript files it should load for the current request. We can have an application-level view.yml configuration and then override the settings per module. When symfony processes the request it checks all these files; that’s why it is slow in the Hello World benchmark.


Plugins System


Symfony has a very powerful and easy to use Plugin System. More than 400 plugins from 200+ developers speak for its success.


The plugins can contain modules of their own and also a config.php file, similar to the project config.php file or the module config.php. When symfony processes a request it checks for the settings in those files, which means that they will be read from disk. So in a plugin we can provide a custom logger that is fired up when the application bootstraps. The plugin user doesn’t need to care how the logger is activated; she just knows that it will work.


The same applies to plugin modules. How do you think symfony knows that a certain module/action should be called from a plugin? If you enable the plugin module in the settings.yml file, symfony will check inside the plugin module to see if the requested action should be executed there.


Controller adaptability


For each action that the user calls, symfony executes a page controller. Inside our modules symfony lets us use either a generic myModuleActions class that extends sfActions, or a class specific to the requested action, which as an example could be called indexAction and extends the sfAction class. When a request is processed symfony first checks for the existence of the latter. If it doesn’t exist, it tries to load the generic actions class for that module. Of course you don’t need this kind of flexibility for Hello World apps.


View Adaptability


To render the response symfony uses the sfPHPView class by default. If a certain module in your application requires a different view, there are at least three ways to accomplish this, as explained here.


Conclusion


Symfony is a Professional Web Application Framework built to cope with real world needs. In a large project with more than a simple salutation feature, sooner or later you will need the flexibility provided by the framework. This will save you time and prevent headaches, because when you have built a whole system with a framework and the business needs start to push in a direction where you have to extend the framework, you will thank yourself for having chosen symfony in the first place.

In case your client requires a Hello World! application, you can use the following hyper fast framework code: die("Hello World!") ;-)

Wednesday, January 28, 2009

Integrating Facebook Hive with Symfony

A symfony feature that has been really helpful while working on projects is the debugging capabilities of the framework. Almost every day I find myself going to the command line and running tail -f log/frontend_dev.log in my symfony project. This helps me see what is going on behind the scenes, spot bugs, find places where I can improve the code, etc. If something goes wrong there is always a developer on the team who shouts “check the symfony logs”, showing that they have become an essential tool for development.


But not everything shines under the sun. Sometimes we would like to have a tool that allows us to filter the logs according to specific criteria, a tool that goes beyond a simple cat log/frontend_dev.log | grep SELECT. We wanted to perform some analysis on the website usage, basically to help us improve its performance.


A colleague talked about Facebook Scribe, which was not actually related to this dream tool but later led me to learn about the existence of the Apache Hive project (which was started by Facebook).


The Hive project allows you to load a log file and then filter it with SQL-like commands. Hive is more than this raw description; you can read more about it here.


My idea was to adapt the file logging format of symfony to make it easy to import those files into a Hive database. Because Hive supports table partitioning by date, it should be easier to load the data from the logs and then perform the analysis with the SQL-like syntax provided by Hive.


After some fights with Ant and Java 1.6 I was able to get Hive running on my Mac. Then I just created a shameless copy of sfFileLogger and renamed it to sfHiveLogger. I made small changes here and there and got it ready to log in a format that is easy to load into Hive later. I browsed through my test symfony project to generate some logs and then moved to the command line to start the fun with Hive.


There I created a table to hold the logs with the following command:


CREATE TABLE sflogs(
  logTime STRING,
  priorityName STRING,
  message STRING)
  COMMENT 'This is the sflogs table'
  PARTITIONED BY(dt STRING)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\011'
    LINES TERMINATED BY '\012';


Everything was working smoothly. The next step was to load the data from the log file into the sflogs table to start issuing queries against it. I did it with this command:


LOAD DATA LOCAL INPATH '/path/to/myproject/log/frontend_dev.log.2009-01-28' INTO TABLE sflogs PARTITION (dt='2009-01-28');


With the data loaded I started to issue some commands in the Hive console, like:


SELECT * FROM sflogs WHERE message LIKE "{sfRequest}%";


SELECT DISTINCT priorityname FROM sflogs;


SELECT COUNT(1) FROM sflogs;


The results were similar to what we get when working with a mysql client, which was awesome. What amazed me the most was how, with some easy changes, it's possible to adapt symfony to our needs. But what were these changes? Here they are:

First we need to enable the sfHiveLogger in the logging.yml file of the symfony project, under the sf_file_debug entry.


Then do the shameless copy of the symfony class to your lib folder and rename it to sfHiveLogger.


Then change the code of the initialize method to look like this:


    if (!isset($options['file']))
    {
      throw new sfConfigurationException('File option is mandatory for a file logger');
    }

    $dir = dirname($options['file']);

    if (!is_dir($dir))
    {
      mkdir($dir, 0777, 1);
    }

    $logFileName = $options['file'] . '.' . date('Y-m-d');

    $fileExists = file_exists($logFileName);
    if (!is_writable($dir) || ($fileExists && !is_writable($logFileName)))
    {
      throw new sfFileException(sprintf('Unable to open the log file "%s" for writing', $logFileName));
    }

    $this->fp = fopen($logFileName, 'a');
    if (!$fileExists)
    {
      chmod($logFileName, 0666);
    }


Basically the change there is to append the current date in “Y-m-d” format to the end of the log file name. This makes it easier to import the logs into Hive, which only matters if you want them partitioned by date.


Then on the log method change the line with:


$line = sprintf("%s %s [%s] %s%s", strftime('%b %d %H:%M:%S'), 'symfony', $priorityName, $message, DIRECTORY_SEPARATOR == '\\' ? "\r\n" : "\n");


to:


$line = sprintf("%s\t%s\t%s%s", strftime('%b %d %H:%M:%S'), $priorityName, $message, "\n");


What we do here is apply a little formatting. The most important part is to have the tab and newline characters as delimiters, because this is what we specified in the CREATE TABLE command above.
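
To see that the format really yields three columns, here is a small standalone sketch. formatLogLine() is a hypothetical helper and not a method of sfFileLogger, and replacing tabs and newlines inside the message is an extra precaution of mine that the logger change above does not include.

```php
<?php
// Hypothetical helper that mirrors the tab-delimited format described above.
function formatLogLine(string $time, string $priorityName, string $message): string
{
    // A tab or newline inside the message would shift or split the columns
    // when the file is loaded, so they are replaced with spaces here.
    $message = str_replace(["\t", "\r", "\n"], ' ', $message);

    // Tab between columns ('\011'), newline at the end of the row ('\012').
    return sprintf("%s\t%s\t%s\n", $time, $priorityName, $message);
}

$line = formatLogLine('Jan 28 10:15:00', 'info', "{sfRequest} request parameters\tfoo");

// Split the row the same way the table definition does.
$columns = explode("\t", rtrim($line, "\n"));
print_r($columns);
```

The three resulting columns map to logTime, priorityName and message in the sflogs table.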


With these easy steps we can have a Hive-enabled log file. If you want to learn more about Hive, the supported commands and the theory behind it, please refer to the wiki.


Conclusion:


Although I’m still comparing Hive with other solutions to parse and analyze the logs, I think that this tool has a lot of potential to help with debugging and profiling symfony applications. If we polish the log format and refine the table structure, it shouldn’t be hard to set up some cron jobs that generate reports of the website usage, improving the usability of the symfony logs.

Monday, January 26, 2009

Custom Views in Symfony 1.0

After seeing this feature request for symfony 1.3 I decided to write a tutorial explaining how to use custom views with symfony 1.0. Yes, symfony 1.0 has this feature; it is a little bit hidden inside, but it has been there since the good ol’ days.

So, there are three ways to have a custom View class for your actions. To show how this works, we will need a project with a module called example.

First example:

After you have your module ready, add an action called firstExample with this code:

public function executeFirstExample() {}

Then in the example module add a folder called views. Inside it we will add our custom view class. The name will be firstExampleSuccessView. Create a firstExampleSuccessView.class.php file with the following code inside:

class firstExampleSuccessView extends sfPHPView
{
  
}

As you can see, it extends the sfPHPView class provided by symfony to inherit all of its functionality. Please note that custom views in symfony need to extend sfView, which is the parent class of sfPHPView.

At this point I should tell you that we can customize a lot of features of the symfony views. For this tutorial I will just show you how to add new shortcut variables to the templates, like the $sf_request or $sf_user already provided by symfony. Feel free to comment about your experience extending sfView.

To add a new shortcut to the template we need to override the following method: sfPHPView::getGlobalVars(). We will add a new shortcut smartly called my_view_data, with a dummy array inside. This will be our code:

protected function getGlobalVars()
{
  $context = $this->getContext();

  $shortcuts = array(
    'sf_context'   => $context,
    'sf_params'    => $context->getRequest()->getParameterHolder(),
    'sf_request'   => $context->getRequest(),
    'sf_user'      => $context->getUser(),
    'sf_view'      => $this,
    'my_view_data' => array('foo' => 'this data came from firstExampleSuccessView')
  );

  if (sfConfig::get('sf_use_flash'))
  {
    $sf_flash = new sfParameterHolder();
    $sf_flash->add($context->getUser()->getAttributeHolder()->getAll('symfony/flash'));
    $shortcuts['sf_flash'] = $sf_flash;
  }

  return $shortcuts;
}

As you can see there is a new my_view_data key with a one element array.

To see this in action we add in the example/templates folder a file called firstExampleSuccess.php with the following content:

<h1>First Example View:</h1>
<?php var_dump($my_view_data); ?>

Clear the cache and point your browser to http://yourapp.com/example/firstExample

There you will see the output of our new shortcut.

As long as our action returns sfView::SUCCESS it will use our new custom view class. Who said that symfony had no custom views? :-)

Remember to follow the naming convention: <actionName><viewName>View. In our example we had firstExampleSuccessView: firstExample is the action, Success is one of the possible values that our actions can return, and View is the default suffix.
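
Spelled out as code, the convention is just string concatenation. viewClassName() is a hypothetical helper for illustration, not a symfony function.

```php
<?php
// Hypothetical helper that spells out the <actionName><viewName>View convention.
function viewClassName(string $actionName, string $viewName = 'Success'): string
{
    return $actionName . $viewName . 'View';
}

echo viewClassName('firstExample'), "\n";          // firstExampleSuccessView
echo viewClassName('firstExample', 'Error'), "\n"; // firstExampleErrorView
```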

Second Example

In this case we will create a class called secondExampleView inside a file secondExampleView.class.php located in the lib folder of our application. The contents of this class will be almost the same as for our first view. We will change the class name to secondExampleView and the my_view_data entry will be declared like this:

'my_view_data' => array('foo' => 'this data came from secondExampleView')

Then we create a new action for our module with the following code:

public function executeSecondExample(){}

Then we add the template secondExampleSuccess.php with this code inside:

<h1>Second Example View:</h1>
<?php var_dump($my_view_data); ?>

Now, how do we use our new view with this action? In case we don’t have one, we add a module.yml file in the config folder of the example module. There we add the following content:

all:
  view_class: secondExample

This will tell symfony to use our custom view class for this module. As you can see here, we can specify different views per environment :-) Yes, symfony is configurable :-)

If we point our browsers to the secondExample action we will see the contents of our shortcut.

Third Example:

For this example we will create a class called thirdExampleView inside the lib folder of our application. As with secondExampleView, the code will be the same except for the class name and the my_view_data shortcut, which will contain the following:

array('foo' => 'this data came from thirdExampleView')

We add a template called thirdExampleSuccess.php with this content:

<h1>Third Example View:</h1>
<?php var_dump($my_view_data); ?>

and we create an action like this:

public function executeThirdExample()
{
    $module = $this->getRequestParameter('module');
    $action = $this->getRequestParameter('action');
    $this->getRequest()->setAttribute($module.'_'.$action.'_view_name', 'thirdExample', 'symfony/action/view');
}

What we do here is set, in the 'symfony/action/view' namespace of the sfRequest attribute holder, a value with the name of our view class without the View suffix. The key of the attribute must be composed as <module_name>_<action_name>_view_name, as you can see from the code above.

Then we can point our browsers to the thirdExample action to see the results.

With these three approaches we can set up our own views and customize the rendering process of our applications.

As a final note, I want to explain how symfony knows which view class to pick.

  • It checks inside the module/views folder to see if it can find our custom class, as explained in the first example.
  • If there is no custom class there, it will check if we have set one up in the sfRequest, as explained in the third example.
  • Then it will try to see if we defined a view in the module.yml file.
  • As last resort it will load the default sfPHPView class.
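
The lookup order in the list above can be sketched like this. It is only a model of what I described and not symfony's code: resolveViewClass() is hypothetical, and plain arguments stand in for the filesystem check, the sfRequest attribute holder and the module.yml setting.

```php
<?php
// Sketch of the lookup order listed above, using plain values as stand-ins.
function resolveViewClass(
    string $action,
    string $viewName,
    array $moduleViewClasses,    // classes found in the module's views folder
    ?string $requestViewName,    // value set in the 'symfony/action/view' namespace
    ?string $moduleYmlViewClass  // view_class entry from module.yml
): string {
    // 1. Custom class in module/views following <actionName><viewName>View.
    $custom = $action . $viewName . 'View';
    if (in_array($custom, $moduleViewClasses, true)) {
        return $custom;
    }
    // 2. View name set on the request by the action.
    if ($requestViewName !== null) {
        return $requestViewName . 'View';
    }
    // 3. view_class configured in module.yml.
    if ($moduleYmlViewClass !== null) {
        return $moduleYmlViewClass . 'View';
    }
    // 4. Default view class.
    return 'sfPHPView';
}

echo resolveViewClass('firstExample', 'Success', ['firstExampleSuccessView'], null, null), "\n";
echo resolveViewClass('thirdExample', 'Success', [], 'thirdExample', 'secondExample'), "\n";
echo resolveViewClass('secondExample', 'Success', [], null, 'secondExample'), "\n";
echo resolveViewClass('index', 'Success', [], null, null), "\n";
```

The second call shows the case where both the request attribute and module.yml are set: following the order in the list, the request attribute is picked first.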

Thanks for reading this far and I hope this will help you on your projects.

Thursday, January 15, 2009

Firebug Framework: CSS helper functions

In a previous post I started talking about the functionality that is already implemented in Firebug. While I was doing some reworking on FireSymfony I had the chance to use some of the CSS methods that are part of the Firebug lib.js source file:

hasClass = function(node, name)

This function expects a node, which can be obtained for example with Firebug's "$()" function, and the CSS class name that we are searching for. It returns a boolean, as expected.

setClass = function(node, name)

Adds the CSS class provided by name to the node

removeClass = function(node, name)

Removes the class specified by name from node

toggleClass = function(elt, name)

If elt has class name then it will remove it from the element, otherwise it will add the class to the element.

setClassTimed = function(elt, name, context, timeout)

This will add the class name to the element. When the timeout (in milliseconds) has expired, the class will be removed from the element and the timeout will be cleared. The timeout is added to the provided Firebug context.

cancelClassTimed = function(elt, name, context) 

This method will remove a class added with the previous method and will clear the timeout.

All these methods can be accessed from inside our Firebug extensions if we declare them as explained here by Jan Odvarko. Another way is to call them like this: FBL.methodName(...)

There are more utilities inside the Firebug libraries that can help us while we develop our custom extensions, so we don't have to reinvent the wheel. I plan to keep documenting the Firebug code, so stay tuned.