Monday, July 6, 2009

My Guess on Symfony 2

After reading this tweet from Fabien I was left wondering what Symfony 2 will look like. Then I remembered this presentation from Fabien where he talks about the new framework. I took a look at it and decided to glue the pieces together to get something working out of the code given in slide 28.

To get started I needed an application to build. For the past few days I've been playing with MongoDB, a document-oriented database. To learn how to use it I built a centralized logger for symfony applications. The idea is that if we have twenty machines serving symfony in production and we need to parse the logs to find possible errors, it would be nice to have a tool that centralizes the logs in one place. Since this database is lightweight and fast, I wrote a simple logger that stores the messages in a MongoDB database instead of using the normal file logger.

After I got the logger working I needed a way to display and search through the logs. Initially I built a symfony application that was able to filter the logs by priority and by words in the log message. The feeling I got was that a full symfony 1.2 project was too much for such a simple web app. This was the perfect excuse to experiment with Symfony 2.

The Logger

The MongoDB logger is just a simple symfony logger that stores, for every symfony log entry, an array with this structure:

 

$log = array(
    'type'     => $type,
    'message'  => $message,
    'time'     => time(),
    'priority' => $priority,
);

The idea is to provide a form that issues queries to the database to filter the logs by any of those fields. For example, if I type sfRouting I should see only the logs that contain that word in their messages.
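To make the filtering idea concrete, here is a minimal sketch that applies the same kind of word filter to in-memory arrays shaped like the $log structure above. It does not talk to MongoDB at all: the filterLogsByWord function and the sample entries are made up for illustration, and the real application runs the query in the database instead.

```php
<?php
// Hypothetical helper, not part of the original application: it filters
// in-memory log arrays the way the search form is meant to filter stored logs.
function filterLogsByWord(array $logs, $word)
{
    $matches = array_filter($logs, function (array $log) use ($word) {
        // Case-insensitive substring match on the message field.
        return stripos($log['message'], $word) !== false;
    });

    // Re-index so the result is a plain list.
    return array_values($matches);
}

// Sample entries invented for this example.
$logs = array(
    array('type' => 'sfRouting', 'message' => 'sfRouting matched the homepage route', 'time' => 1246838400, 'priority' => 6),
    array('type' => 'sfView', 'message' => 'Rendering indexSuccess.php', 'time' => 1246838401, 'priority' => 6),
);

$matches = filterLogsByWord($logs, 'sfRouting');
echo count($matches), "\n";
```

Only the first sample entry mentions sfRouting in its message, so it is the only one kept.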

Here's a screen shot of the final application:

So, how to build that using Symfony 2?

First we create the folder structure like this:

 
-/
--/ apps
--/ config
--/ lib
--/ web

Inside web we place the index.php file which has the following content:


define('ROOT_PATH', dirname(__FILE__).'/..');

require_once ROOT_PATH . '/config/sf_requires.php';
require_once ROOT_PATH . '/config/app_requires.php';

$app = new LogAnalyzer();
$app->run()->send();

There we define the root path, and then we include two configuration files, one which will take care of requiring the Symfony libraries and the other that will require the application files.

Then we instantiate the application class and we run it.

Now let's check what's inside the sf_requires.php file:

define('SF_LIB_PATH', ROOT_PATH . '/lib/vendor/symfony/lib');

require_once SF_LIB_PATH . '/utils/sfToolkit.class.php';
require_once SF_LIB_PATH . '/utils/sfParameterHolder.class.php';
require_once SF_LIB_PATH . '/event_dispatcher/sfEventDispatcher.php';
require_once SF_LIB_PATH . '/request/sfRequestHandler.class.php';
require_once SF_LIB_PATH . '/request/sfRequest.class.php';
require_once SF_LIB_PATH . '/request/sfWebRequest.class.php';
require_once SF_LIB_PATH . '/response/sfResponse.class.php';
require_once SF_LIB_PATH . '/response/sfWebResponse.class.php';

First we define the location of the Symfony libraries and then we proceed to include the required files.

The only new class here is sfRequestHandler, which Fabien describes in his presentation. The other ones I just took from a symfony 1.3 distribution.

With those files included we are done with the symfony side, so next we have to include the application files. The contents of app_requires.php will be:

require_once ROOT_PATH . '/lib/dba/MongoLogReader.class.php';
require_once ROOT_PATH . '/lib/dba/CollectionModel.class.php';
require_once ROOT_PATH . '/apps/LogAnalyzer.class.php';

Besides the LogAnalyzer class we include two classes that will take care of querying the log database. As we can see, the LogAnalyzer class resides under the apps folder and the others under lib/dba.

So now let's check what's inside the LogAnalyzer class.

 
public function __construct()
{
    $this->dispatcher = new sfEventDispatcher();
    $this->dispatcher->connect('application.load_controller', array($this, 'loadController'));
}

On instantiation we create a new instance of sfEventDispatcher and connect our application to the application.load_controller event, which will be fired by the sfRequestHandler::handleRaw() method. There we tell it that the loadController method of our application will process the request.
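The connect-then-notify pattern can be sketched without any symfony code. TinyEvent and TinyDispatcher below are hypothetical stand-ins written only for this illustration. They are not the real sfEvent and sfEventDispatcher classes, and they only mimic the idea that listeners are called in order until one of them returns true.

```php
<?php
// Hypothetical stand-in for an event object: it carries parameters in
// and a return value out.
class TinyEvent
{
    public $parameters;
    public $returnValue = null;

    public function __construct(array $parameters)
    {
        $this->parameters = $parameters;
    }
}

// Hypothetical stand-in for a dispatcher (not sfEventDispatcher itself).
class TinyDispatcher
{
    private $listeners = array();

    // Register a callable for a named event.
    public function connect($name, $listener)
    {
        $this->listeners[$name][] = $listener;
    }

    // Call the listeners in order and stop at the first one that returns true.
    public function notifyUntil($name, TinyEvent $event)
    {
        $listeners = isset($this->listeners[$name]) ? $this->listeners[$name] : array();
        foreach ($listeners as $listener) {
            if (call_user_func($listener, $event) === true) {
                break;
            }
        }

        return $event;
    }
}

$dispatcher = new TinyDispatcher();
$dispatcher->connect('application.load_controller', function (TinyEvent $event) {
    // The listener answers the question "who handles this request?".
    $event->returnValue = 'controller for ' . $event->parameters['request'];
    return true;
});

$event = $dispatcher->notifyUntil('application.load_controller', new TinyEvent(array('request' => '/logs')));
echo $event->returnValue, "\n";
```

The point of the design is that the code firing the event does not need to know which class will answer it. Whoever connected a listener first and returns true decides.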

Then we have the run method:

public function run()
{
    $request = new sfWebRequest($this->dispatcher);
    $handler = new sfRequestHandler($this->dispatcher);
    $response = $handler->handle($request);
    return $response;
}

Here we initialize an sfWebRequest object, which parses the request parameters. Then we instantiate our sfRequestHandler and call its handle method. The handle method returns a response object, on which we call send() in the index.php file to output the response to the browser.

When the sfRequestHandler starts to do its job it will fire the application.load_controller event, for which we set up the following listener:

public function loadController(sfEvent $event)
{
    $event->setReturnValue(array(array($this, 'execute'), array($this->dispatcher, $event['request'])));
    return true;
}

There we say that the execute method of the LogAnalyzer class will take care of generating the response data out of the request.

And finally the code for the execute method:

public function execute($dispatcher, $request)
{
    $response = new sfWebResponse($dispatcher);
    $response->setContent($this->render($this->getTemplateValues($request)));
    return $response;
}

There we instantiate an sfWebResponse. Its content will be the result of the protected method render. (It is possible to create our own View class to handle this part, but for this example I prefer to build it like this.)

protected function render($values)
{
    extract($values);
    ob_start();
    ob_implicit_flush(0);
    require(ROOT_PATH . '/apps/template.php');
    return ob_get_clean();
}

This method expects an array with all the variables that will be used in the template. These values are extracted from the array and inserted into the current scope by calling the extract function. As we can see, the template is a simple PHP file that is required from the apps folder. This file is plain PHP code embedded into HTML.
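The extract plus output buffering technique can be tried in isolation. The sketch below is a standalone variation, not the application's own code: renderTemplate is a hypothetical function, and the template is written to a temporary file only so the example runs on its own.

```php
<?php
// Hypothetical standalone version of the render idea described above.
function renderTemplate($templatePath, array $values)
{
    // EXTR_SKIP keeps template variables from overwriting $templatePath or $values.
    extract($values, EXTR_SKIP);

    // Capture everything the template prints instead of sending it to the browser.
    ob_start();
    require $templatePath;

    return ob_get_clean();
}

// A throwaway template file, created only for this example.
$templatePath = tempnam(sys_get_temp_dir(), 'tpl');
file_put_contents($templatePath, 'Page <?php echo $pageNumber; ?> of <?php echo $title; ?>');

$html = renderTemplate($templatePath, array('pageNumber' => 2, 'title' => 'Logs'));
unlink($templatePath);

echo $html, "\n";
```

This prints "Page 2 of Logs". The template sees $pageNumber and $title as ordinary local variables, which is what makes plain PHP files usable as templates.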

The getTemplateValues method interprets the request and gets the data out of the database:

protected function getTemplateValues($request)
{
    $values = array();
    $values['sf_request'] = $request;
    $values['collections'] = $this->getCollections();
    $values['priorities'] = $this->getPriorities();
    $values['cursor'] = $this->getLogs($request);
    $values['pageNumber'] = $request->getParameter('page', 1);
    $values['cursor']->skip(($values['pageNumber'] - 1) * $this->maxPerPage)->limit($this->maxPerPage);
    $values['hasMore'] = $values['cursor']->count() > ($this->maxPerPage * $values['pageNumber']);
    $values['filterParams'] = $this->buildFilterParams($request);
    return $values;
}
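The paging arithmetic in getTemplateValues is easy to check on its own. The pageWindow helper below is hypothetical, and the page size of 20 and the total of 65 logs are made-up numbers, since the value of maxPerPage is not shown here.

```php
<?php
// Hypothetical helper that mirrors the skip/limit/hasMore arithmetic.
function pageWindow($pageNumber, $maxPerPage, $totalCount)
{
    return array(
        // Entries to jump over before the current page starts.
        'skip'    => ($pageNumber - 1) * $maxPerPage,
        // Maximum entries shown on the current page.
        'limit'   => $maxPerPage,
        // True when entries remain after the current page.
        'hasMore' => $totalCount > $maxPerPage * $pageNumber,
    );
}

// Assumed numbers: 65 matching logs, 20 per page.
$third  = pageWindow(3, 20, 65);
$fourth = pageWindow(4, 20, 65);

echo $third['skip'], ' ', var_export($third['hasMore'], true), "\n";
echo $fourth['skip'], ' ', var_export($fourth['hasMore'], true), "\n";
```

With those numbers, page 3 skips 40 entries and still has more to show, while page 4 skips 60 and is the last page.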

And that's it! With such a simple structure we can leverage the power of the sfRequestHandler class, which will be at the core of the new Symfony version. We know that symfony does very well for complex projects, but sometimes I felt it was too big for a simple application like this one. With this new component I think that distinction will be gone.

THE CODE:

The application code and the sfMongoDBLogger class can be found here.

You will need to set up a virtual host in order to run this application.

RESOURCES/REQUIREMENTS:

To learn more on MongoDB refer to this website: http://www.mongodb.org
If you set up MongoDB as explained here http://www.mongodb.org/display/DOCS/Getting+Started, you should be able to run this project without problems. Give it a try: MongoDB is pretty easy to set up and the documentation is very good.
For the installation instructions of the PHP native driver go here

IMPORTANT:

Even if this should be implicit, keep in mind that these are my personal views on the subject. This is by no means an official statement from the Symfony project. It is just what I believe this new component will be, based on Fabien's presentation.

Thursday, July 2, 2009

Benchmarking MongoDB vs. MySQL

EDIT November 2011: Please take these benchmarks with a grain of salt. They are completely non-scientific, and when choosing a data store raw speed is probably not all you care about. Try to understand the trade-offs of each database and learn how your data will be handled if you lose a server. What happens if there's data corruption? How does replication work in your specific database? And so on. After all, don't choose one database over another just because of this blog post.

One of the projects I work on at the company has a message system with more than 200 million entries in a MySQL database. We've started to think about how we can scale this further, so in my research for alternatives I looked at NoSQL databases. In this post I would like to share some benchmarks I ran against MongoDB (a document-oriented database) compared to MySQL.

To perform the tests I installed MongoDB as explained here. I also installed PHP, MySQL and Apache2 from MacPorts. I did not apply any special configuration to any of them.

The hardware used for the tests is my good ol' Macbook White:

Model Name: MacBook
Model Identifier: MacBook2,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 4 GB
Bus Speed: 667 MHz
Boot ROM Version: MB21.00A5.B07
SMC Version (system): 1.13f3

Because I don't have enough space to store the MongoDB database on my local hard drive, I launched the server with this command:
./bin/mongod --dbpath '/Volumes/alvaro/data/db/'

which tells MongoDB to use my USB hard drive. YES, my USB hard drive :-P

The MySQL server stored the data on the local hard drive.

What was the test?

I loaded 2 million records of real data from our message system into both databases. Every record has 28 columns, holding information about the sender of the message and the recipient, plus the subject, date, etc. For MySQL I used mysqldump. For MongoDB I used the following:

$query = "SELECT * FROM message";
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result))
{
    // Each MySQL row becomes one MongoDB document.
    $collection->insert($row);
}

Of course, for the real data loading I added some pagination; I didn't retrieve 2M records at once. There was also some code to initialize the MySQL connection and the Mongo connection that provides the $collection object.
The MySQL database had indexes on the sid and tid fields (sender id and target id), so I added them to the MongoDB database as well:
$m = new Mongo();
$collection = $m->selectDB("msg")->selectCollection("msg_old");
$collection->ensureIndex( array("sid", "tid") );

Then I wrote some simple code that selects up to 20 records filtered by sid. In the real application this means I'm looking at the first 20 messages of my outbox.

EDIT - 2009/07/03

Due to some confusion I have to make something clear. What I'm benchmarking is not the data loading into both databases, nor traversing the data, etc., but the code that you can find here.

This is similar to what a user's message outbox (or inbox) could be on a production website. The user accesses the inbox and we retrieve up to 20 of its messages, which are then displayed in an HTML table. What siege accessed was a script serving the HTML generated from the query results.

So the question is: if MongoDB or MySQL is the backend of this message system, which one will be faster for this specific use case? This benchmark is not about whether MongoDB is better than MySQL for every use case out there. We use MySQL a lot in production and we will keep using it as far as I can tell. And yes, I know that MySQL and MongoDB are two totally different technologies that probably only share the word database in their descriptions.

END EDIT - 2009/07/03

I wrote the code for both MongoDB and MySQL. My idea was then to launch siege, have it pick random user ids from a text file, and run the stress tests.

Here's an extract from the URL text file:
http://mongo.al/index.php?id=96
http://mongo.al/index.php?id=105
http://mongo.al/index.php?id=108
http://mongo.al/index.php?id=113
http://mongo.al/index.php?id=116
http://mongo.al/index.php?id=117
http://mongo.al/index.php?id=127
http://mongo.al/index.php?id=129
http://mongo.al/index.php?id=130
http://mongo.al/index.php?id=134

This means that siege will pick a random url and hit the server, requesting the outbox of that user id.

Then I increased the ulimit to be able to run this test:
siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

With that command I launch siege, telling it to load the URLs to visit from the text file. It will simulate 300 concurrent users and do 10 repetitions with a random delay between 0 and 1 seconds. The last option tells siege to work in internet mode, so it picks URLs randomly from the text file.

When I launched the test with MongoDB as the backend it worked without problems. With MySQL it crashed quite often. Below are the results I obtained for both of them.

MongoDB test results:

siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

Transactions: 2994 hits
Availability: 99.80 %
Elapsed time: 11.95 secs
Data transferred: 3.19 MB
Response time: 0.26 secs
Transaction rate: 250.54 trans/sec
Throughput: 0.27 MB/sec
Concurrency: 65.03
Successful transactions: 2994
Failed transactions: 6
Longest transaction: 1.47
Shortest transaction: 0.00

MySQL test results:
siege -f ./stress_urls_mysql.txt -c 300 -r 10 -d1 -i

Transactions: 2832 hits
Availability: 94.40 %
Elapsed time: 23.53 secs
Data transferred: 2.59 MB
Response time: 0.74 secs
Transaction rate: 120.36 trans/sec
Throughput: 0.11 MB/sec
Concurrency: 89.43
Successful transactions: 2832
Failed transactions: 168
Longest transaction: 16.36
Shortest transaction: 0.00

As we can see, MongoDB handled roughly twice as many transactions per second as MySQL (250.54 vs. 120.36) for this specific case, with far fewer failed transactions (6 vs. 168). And remember, MongoDB was reading the data from my USB hard drive ;-).
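For anyone who wants to check the comparison, the headline figures can be recomputed from the raw counts in the two reports. This is only a sanity-check sketch: the summarize function is hypothetical, and it assumes that siege's availability is successful transactions over total attempts and that its transaction rate is successful transactions over elapsed time.

```php
<?php
// Hypothetical helper; the input figures are copied from the siege reports above.
function summarize($successful, $failed, $elapsedSeconds)
{
    $total = $successful + $failed;

    return array(
        // 300 concurrent users x 10 repetitions should give 3000 attempts.
        'total'        => $total,
        // Percentage of attempts that succeeded.
        'availability' => round(100 * $successful / $total, 2),
        // Successful transactions per second.
        'rate'         => round($successful / $elapsedSeconds, 2),
    );
}

$mongo = summarize(2994, 6, 11.95);
$mysql = summarize(2832, 168, 23.53);

// How many times more transactions per second MongoDB served.
$speedup = round($mongo['rate'] / $mysql['rate'], 2);

printf("MongoDB: %d attempts, %.2f%% available, %.2f trans/sec\n", $mongo['total'], $mongo['availability'], $mongo['rate']);
printf("MySQL:   %d attempts, %.2f%% available, %.2f trans/sec\n", $mysql['total'], $mysql['availability'], $mysql['rate']);
printf("Speedup: %.2fx\n", $speedup);
```

Under those assumptions the recomputed values match the reports (99.80 % and 250.54 trans/sec for MongoDB, 94.40 % and 120.36 trans/sec for MySQL), and the ratio of the two rates comes out at about 2.08.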