Thursday, October 29, 2009

Writing an Erlang PubSub Client with Exmpp

I've been working on a system to monitor and debug web applications remotely. One of its goals is to receive notifications when something goes wrong, e.g. the load went wild, the database is overloaded, etc. While I won't be describing the whole system in this post, I'd like to present one component of it: a PubSub client that connects to an XMPP server to be notified of events.

In my case I've already set up an Ejabberd server, where I've configured a couple of PubSub nodes for testing purposes. In this tutorial we will see how to create an Erlang client to receive such notifications.

Regarding the PubSub service, you can read about it here. Basically you create a PubSub node and then send notifications to it. Users who are interested in those notifications can subscribe to the node and receive them. It works similarly to Twitter, where you follow someone and then receive their messages in your timeline.

Our client will be written in Erlang and will use the Exmpp library. Follow the instructions on their site to see how to install it.

NOTE: when I first started with Exmpp the code was constantly failing. After some research I found that the problem was caused by the compilation on Snow Leopard. In case you run into the same problem, run the configure command like this:

CC='gcc -m32' CFLAGS=-m32 LDFLAGS=-m32 ./configure

I got that trick from here, and it also works with Exmpp.

First we create the folder pubsub_client and inside it we set up the structure for our project. We will follow the recommendations found here.

It will look like this:

pubsub_client/
- ebin/
- include/
- priv/
- src/

At the root of the project we will add a Makefile –the code for it is provided at the end of the post– and this shell script that will launch the Erlang Console:

#!/bin/sh
cd `dirname $0`
exec erl -pa $PWD/ebin -boot start_sasl -s exmpp

There we tell the Erlang environment to add the compiled files found inside ebin to the code load path and to start the exmpp application, which is required in order to use the functions from that library.

Our client will be built around the gen_server behavior. When launched it will connect to the Ejabberd server and wait for notifications. When they arrive we will print them to the tty. Also we will add a function that will let us subscribe to PubSub nodes.

To start, let's create a file called pubsub_client.erl inside the src folder. Then we add the following content to it:

-module(pubsub_client).

-behaviour(gen_server).

-export([start/4, start_link/4, stop/0]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
terminate/2, code_change/3]).

-include_lib("exmpp/include/exmpp.hrl").
-include_lib("exmpp/include/exmpp_client.hrl").

-record(state, {session, jid}).

There we defined our module name, pubsub_client, and declared that it implements the gen_server behaviour. Then we export some functions to start and stop the client. Below those are the exports of the required gen_server callbacks, and we include the exmpp.hrl and exmpp_client.hrl headers because we will need some records and macros that are defined there.

The last line defines a record called state, which will hold our XMPP session and our JID.

Let's now write the init/1 function, where we will initialize our connection to the server.

init({Host, Port, User, Password}) ->
{ok, {MySession, MyJID}} = pubsub_utils:connect(Host, Port, User, Password),
{ok, #state{session=MySession, jid=MyJID}}.

This function expects a four-element tuple with the parameters to use for the connection, that is, the server Host and Port, the User name and the Password. Those parameters are passed to the pubsub_utils:connect/4 function, which will return the term {ok, {Session, JID}} on success. If everything is OK, our init function will return {ok, State}, where State is the record that we defined before, holding our Session and our JID.

Then we have to add the functions that will take care of starting and stopping the system:

start(Host, Port, User, Password) ->
gen_server:start({local, ?MODULE}, ?MODULE, {Host, Port, User, Password}, []).

start_link(Host, Port, User, Password) ->
gen_server:start_link({local, ?MODULE}, ?MODULE, {Host, Port, User, Password}, []).

stop() ->
gen_server:cast(?MODULE, stop).

The functions start/4 and start_link/4 both expect four parameters, which will be passed to the init function. The only difference is that start_link/4 will link the client to our current process. You can read more about them and the whole gen_server behavior here.

Then with stop/0 we send an asynchronous request to our process to tell it to stop. Asynchronous requests are sent using gen_server:cast/2 and are processed by the handle_cast/2 callback of the module. These are our handle_cast/2 implementations:

handle_cast(stop, State) -> {stop, normal, State};
handle_cast(_Msg, State) -> {noreply, State}.

In the first one we expect only the stop message, while the second one is a catch-all handler. Because the first one returns {stop, normal, State}, the gen_server will call our terminate/2 callback before the process exits. We implement it as follows:

terminate(_Reason, #state{session=MySession}) ->
pubsub_utils:disconnect(MySession),
ok.

As you can see there, first we call pubsub_utils:disconnect/1, passing our session identifier as a parameter to close our XMPP connection, and then we return ok.

NOTE: In the link provided at the end, we can find the source of the pubsub_utils module along with the whole project. You can read more about the implementation of the connection method here.

So far we can connect and disconnect from the Ejabberd server. Now let's write the function that will handle the notifications.

When a notification is received, the Exmpp library will send a message with it to our gen_server process. When a gen_server process receives a message that was not generated by gen_server:cast or gen_server:call –or their similar functions– it will be processed by the handle_info/2 handler. This handler expects two parameters, Info and State, where Info is the message received, which in our case will be the XMPP packet.

Here's our implementation:

handle_info(#received_packet{packet_type='message'}=Packet, State) ->
process_received_packet(Packet, Fun),
{noreply, State};
handle_info(_Info, State) ->
{noreply, State}.

NOTE: We will talk later about the unbound variable Fun that you see there.

The first one expects a received_packet record as message. That record is defined inside the exmpp_client.hrl and is used by the Exmpp library to deliver messages. The second handle_info/2 handler will work as a catch all handler.

In our case the packet must be of the message type, which we specify in our pattern matching declaration. If so we delegate the processing to the process_received_packet/2 function.

Now it is time to process the notification, which in our case means printing it to the tty. Here's the code:

process_received_packet(#received_packet{raw_packet=Raw}, Fun) ->
Event = exmpp_xml:get_element(Raw, ?NS_PUBSUB_EVENT, 'event'),
Items = exmpp_xml:get_element(Event, 'items'),
exmpp_xml:foreach(Fun, Items),
ok.

To accomplish this we have to parse the incoming packet. Our process_received_packet/2 function will take care of that. In its declaration we extract the Raw packet via pattern matching. First we use the exmpp_xml:get_element/3 function to get the received PubSub event. This helper function from the Exmpp library expects the XML element from which we will extract the child node, the namespace the child should have –?NS_PUBSUB_EVENT in our case– and the name of the child XML element. In our example we want an event node.

Then from that XML element we extract the Items, which is what we are interested in. The items node will have one or several children containing the notification we expect inside an item node –note the plural difference–.

Once we have the Items we pass them to the exmpp_xml:foreach/2 function along with the Fun parameter. The exmpp_xml:foreach/2 function will iterate over the child elements of Items and apply to each of them the anonymous function contained in Fun. In case you need it, the first argument passed to the anonymous function will be the original XML element. For this to work we have to pass this Fun to process_received_packet/2.
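To make the event / items / item nesting concrete, here is a minimal Python sketch (not part of the Erlang client) that walks a notification the same way. The namespace is the one XEP-0060 defines for events; the sample stanza, its addresses, the item ids and the second log text are made-up values for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace that XEP-0060 defines for PubSub event notifications.
NS_PUBSUB_EVENT = "http://jabber.org/protocol/pubsub#event"

# Hypothetical notification; addresses, ids and texts are illustrative.
RAW_PACKET = """
<message from="pubsub.localhost" to="tutorial@localhost">
  <event xmlns="http://jabber.org/protocol/pubsub#event">
    <items node="logs">
      <item id="item-1"><log>some notification</log></item>
      <item id="item-2"><log>the load went wild</log></item>
    </items>
  </event>
</message>
"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return element.tag.rsplit("}", 1)[-1]


def extract_logs(raw_packet):
    """Return the text of every <log> payload found inside <items>."""
    message = ET.fromstring(raw_packet.strip())
    items = message.find("ev:event/ev:items", {"ev": NS_PUBSUB_EVENT})
    if items is None:
        return []  # not a PubSub notification
    logs = []
    for item in items:            # one <item> per published notification
        for payload in item:      # the payload lives inside each <item>
            if local_name(payload) == "log":
                logs.append(payload.text or "")
    return logs


print(extract_logs(RAW_PACKET))
```

The outer loop plays the role of exmpp_xml:foreach/2, and the inner check corresponds to looking for the log child of each item.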

Let's declare that fun inside the init/1 function and add it to the state of the process. This will be our new init/1 function:

init({Host, Port, User, Password}) ->
{ok, {MySession, MyJID}} = pubsub_utils:connect(Host, Port, User, Password),
Fun = fun(_XML_Element, Child) ->
case exmpp_xml:get_element(Child, 'log') of
undefined -> not_a_notification;
Log ->
io:format("Notification: ~s~n", [exmpp_xml:get_cdata_as_list(Log)])
end
end,
{ok, #state{session=MySession, jid=MyJID, on_message=Fun}}.

Our fun will take two parameters, as we discussed, and it will print the log to the tty using io:format/2. There's some XML processing in place there to extract our log text. This fun will be added to our process State inside the new on_message field of our state record, which we will have to modify for this to work:

-record(state, {session, jid, on_message}).

Once we have that code in place we change the header of our handle_info/2 function so we can extract the Fun out of the state record:

handle_info(#received_packet{packet_type='message'}=Packet, #state{on_message=Fun}=State) ->

The final piece of our puzzle is to actually subscribe to a node. We will add a function to our process API that will do this for us:

subscribe(Service, Node) ->
gen_server:call(?MODULE, {subscribe, Service, Node}).

And we export it adding the following line after the gen_server export callbacks:

-export([subscribe/2]).

Our function subscribe/2 will send a synchronous call to our process, passing two parameters: the PubSub Service name and the Node. The next step is to implement the handle_call/3 callback:

handle_call({subscribe, Service, Node}, _From, #state{session=MySession, jid=MyJID}=State) ->
IQ = exmpp_client_pubsub:subscribe(exmpp_jid:to_list(MyJID), Service, Node),
PacketId = exmpp_session:send_packet(MySession, exmpp_stanza:set_sender(IQ, MyJID)),
PacketId2 = erlang:binary_to_list(PacketId),
Reply =
receive
#received_packet{id=PacketId2, raw_packet=Raw} ->
case exmpp_iq:is_error(Raw) of
true -> error;
_ -> ok
end
end,
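{reply, Reply, State};
%% Catch-all clause, added here so that the snippet compiles on its own.
handle_call(_Request, _From, State) ->
{reply, ignored, State}.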

There we use the helper function from the Exmpp library to generate an IQ stanza that will be sent to the XMPP server to subscribe our user. The interesting part of that code is the receive block. Let's review it.

When we send an IQ packet to the server, it will reply to us with another IQ packet. In order to track IQ requests and responses, the XMPP protocol adds an id attribute to the stanza. That's why we only wait for messages sent to our process that contain that packet id.

You may be wondering why we have the receive block here. We want to know whether our subscription succeeded or failed. If we didn't have the receive block there, the reply would be handled by the handle_info/2 function, where it would be quite hard to track the IQ ids and match them with their requests. In our simple case it could be easy, but in a more complex scenario we may run into problems when our process is receiving several messages concurrently.
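The way the receive block picks out only the reply carrying our packet id, leaving everything else in the mailbox, can be imitated in a few lines of Python. This is only an illustration of the matching idea: the dictionaries, the "type" field and the ids are hypothetical, and unlike an Erlang receive this sketch does not block when no reply is available.

```python
def selective_receive(mailbox, packet_id):
    """Pop the first message whose id matches packet_id; leave the others queued.

    Returns None when no matching reply is in the mailbox yet (an Erlang
    receive would block at this point instead).
    """
    for index, message in enumerate(mailbox):
        if message.get("id") == packet_id:
            return mailbox.pop(index)
    return None


def subscription_result(reply):
    """Mirror the case expression: an error reply maps to 'error', anything else to 'ok'."""
    return "error" if reply["type"] == "error" else "ok"


# Hypothetical mailbox contents: a notification arrived before the IQ reply.
mailbox = [
    {"id": "note-1", "type": "message"},
    {"id": "iq-42", "type": "result"},
]
reply = selective_receive(mailbox, "iq-42")
print(subscription_result(reply))  # ok
print(mailbox)                     # the notification is still waiting
```

The notification stays queued, which is what later lets handle_info/2 process it as usual.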

To try our module we can run the following sequence at the command line:

cd /path/to/our/project
make
./start-dev.sh

There we compile the code and launch the Erlang console. Then inside the console we input the following:

pubsub_publisher:start("localhost", 5222, "publisher", "password").
pubsub_publisher:create_node("pubsub.localhost", "logs").

Then we can open a new terminal window and do the following:

cd /path/to/our/project
./start-dev.sh
pubsub_client:start("localhost", 5222, "tutorial", "password").
pubsub_client:subscribe("pubsub.localhost", "logs").

Then we can go back to the first window and issue this command:

pubsub_publisher:send_message("pubsub.localhost", "some notification", "logs").

If everything worked well, then we should see the message being displayed where the pubsub_client is running.

BONUS: Display the log message as a Growl notification

You may wonder why, in the code above, we pass the anonymous function around in the process state. I did that because later we can hook in a new function to process the notifications, say, to send them directly to Growl :) –for non-Mac users, Growl is a free application for Mac that can display notifications on the screen; more info here–.

If you review the code provided at the end of this post, there's a function pubsub_client:use_growl/0 that when called will swap the log handler and instead of displaying the notifications in the tty, it will forward them to Growl. Since the implementation is easy to understand I won't explain it here.

Thanks for reading this far. I hope this tutorial will push you to create some nice Erlang applications using Exmpp and PubSub.

PubSub client Code.

Tuesday, September 1, 2009

Running Haskell GHC on Snow Leopard

After spending so much time trying to get a working GHC on my Mac with Snow Leopard, I finally arrived at a solution. Here it is, pretty simple:

1 - Download Haskell Platform for Mac: http://hackage.haskell.org/platform/
2 - Install GHC and the Haskell Platform that are inside the .dmg.
3 - Patch your ghc compiler to produce 32-bit code by adding the following options:
-optc-m32 -opta-m32 -optl-m32
The ghc script that you have to patch is located here: /usr/bin/ghc
Open it with your favorite text editor and add those options. This will at least let you install new packages and compile your code with no problems until the Haskell team releases a new version of GHC for Mac.
I got most of the tricks for this from here: http://www.nabble.com/Snow-Leopard-Breaks-GHC-td25198347.html

Monday, July 6, 2009

My Guess on Symfony 2

After I read this tweet from Fabien I was left thinking about what Symfony 2 will be like. Then I remembered this presentation from Fabien where he talks about the new framework. I took a look at it and then decided to glue the pieces together to get something working out of the code given in slide 28.

So to start with, I needed an application to build. For a few days I have been playing with MongoDB, a document-oriented database. To learn how to use it I built a centralized logger for symfony applications. The idea is that if we have twenty machines serving symfony in production and we need to parse the logs to find possible errors, it would be nice to have a tool that centralizes the logs in one place. Since this database is lightweight and fast, I wrote a simple logger to store the messages in a MongoDB database instead of using the normal file logger.

After I got the logger working I needed a way to display and search through the logs. Initially I built a symfony application that was able to filter the logs by priority and by words in the log message. The feeling I got was that a full symfony 1.2 project was too much for such a simple web app. This was the perfect excuse to experiment with Symfony 2.

The Logger

The MongoDB logger is just a simple symfony logger that stores for every symfony log an array with this structure:

 

$log = array('type' => $type, 'message' => $message,
             'time' => time(), 'priority' => $priority);

The idea is to provide a form to issue queries to the database and filter the logs by any of those fields. For example, if I type sfRouting I should see only those logs that contain that word in their messages.

Here's a screen shot of the final application:

So, how to build that using Symfony 2?

First we create the folder structure like this:

 
-/
--/ apps
--/ config
--/ lib
--/ web

Inside web we place the index.php file which has the following content:


define('ROOT_PATH', dirname(__FILE__).'/..');

require_once ROOT_PATH . '/config/sf_requires.php';
require_once ROOT_PATH . '/config/app_requires.php';

$app = new LogAnalyzer();
$app->run()->send();

There we define the root path, and then we include two configuration files, one which will take care of requiring the Symfony libraries and the other that will require the application files.

Then we instantiate the application class and we run it.

Now let's check what's inside the sf_requires.php file:

define('SF_LIB_PATH', ROOT_PATH . '/lib/vendor/symfony/lib');

require_once SF_LIB_PATH . '/utils/sfToolkit.class.php';
require_once SF_LIB_PATH . '/utils/sfParameterHolder.class.php';
require_once SF_LIB_PATH . '/event_dispatcher/sfEventDispatcher.php';
require_once SF_LIB_PATH . '/request/sfRequestHandler.class.php';
require_once SF_LIB_PATH . '/request/sfRequest.class.php';
require_once SF_LIB_PATH . '/request/sfWebRequest.class.php';
require_once SF_LIB_PATH . '/response/sfResponse.class.php';
require_once SF_LIB_PATH . '/response/sfWebResponse.class.php';

First we define the location of the Symfony libraries and then we proceed to include the required files.

The only new class here is sfRequestHandler, which Fabien describes in his presentation. I just took the other ones from a symfony 1.3 distribution.

With those files included, we are done with the symfony side. Next we have to include the application files, so the contents of app_requires.php will be:

require_once ROOT_PATH . '/lib/dba/MongoLogReader.class.php';
require_once ROOT_PATH . '/lib/dba/CollectionModel.class.php';
require_once ROOT_PATH . '/apps/LogAnalyzer.class.php';

Besides the LogAnalyzer class we include two classes that will take care of querying the log database. As we can see, the LogAnalyzer class will reside under the apps folder and the others under lib/dba.

So now let's check what's inside the LogAnalyzer class.

 
public function __construct()
{
$this->dispatcher = new sfEventDispatcher();
$this->dispatcher->connect('application.load_controller', array($this, 'loadController'));
}

On instantiation we create a new instance of sfEventDispatcher and connect our application to the application.load_controller event, which will be fired by the sfRequestHandler::handleRaw() method. There we tell it that the loadController method of our application will process the request.

Then we have the run method:

public function run()
{
$request = new sfWebRequest($this->dispatcher);
$handler = new sfRequestHandler($this->dispatcher);
$response = $handler->handle($request);
return $response;
}

Here we initialize an sfWebRequest object to start parsing the request parameters. Then we instantiate our sfRequestHandler and call its handle method. The handle method returns a response object, which is the one on which we call send() in the index.php file to output the response to the browser.

When the sfRequestHandler starts to do its job it will fire the application.load_controller event, for which we set up the following listener:

public function loadController(sfEvent $event)
{
$event->setReturnValue(array(array($this, 'execute'), array($this->dispatcher, $event['request'])));
return true;
}

There we say that the execute method of the LogAnalyzer class will take care of generating the response data out of the request.
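The connect / fire / execute flow can be pictured with a small Python sketch. This is only my guess at the mechanics, in the same spirit as the rest of this post: the class and method names (EventDispatcher, notify_until, RequestHandler) are stand-ins written for illustration and are not the real sfEventDispatcher or sfRequestHandler code.

```python
class Event:
    def __init__(self, name, parameters):
        self.name = name
        self.parameters = parameters
        self.return_value = None


class EventDispatcher:
    """Keeps a list of listeners per event name."""

    def __init__(self):
        self._listeners = {}

    def connect(self, name, listener):
        self._listeners.setdefault(name, []).append(listener)

    def notify_until(self, event):
        # Stop at the first listener that reports it handled the event.
        for listener in self._listeners.get(event.name, []):
            if listener(event):
                break
        return event


class RequestHandler:
    """Assumed behaviour: ask listeners for a controller, then call it."""

    def __init__(self, dispatcher):
        self.dispatcher = dispatcher

    def handle(self, request):
        event = Event("application.load_controller", {"request": request})
        self.dispatcher.notify_until(event)
        if event.return_value is None:
            raise LookupError("no controller was provided for the request")
        controller, arguments = event.return_value
        return controller(*arguments)


class LogAnalyzer:
    def __init__(self):
        self.dispatcher = EventDispatcher()
        self.dispatcher.connect("application.load_controller", self.load_controller)

    def load_controller(self, event):
        # Same shape as the PHP listener: (callable, arguments).
        event.return_value = (self.execute, (self.dispatcher, event.parameters["request"]))
        return True

    def execute(self, dispatcher, request):
        return "logs matching %r" % request["query"]

    def run(self, request):
        return RequestHandler(self.dispatcher).handle(request)


print(LogAnalyzer().run({"query": "sfRouting"}))  # logs matching 'sfRouting'
```

The point of the indirection is that the handler never needs to know about LogAnalyzer; it only calls whatever the listener hands back.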

And finally the code for the execute method:

public function execute($dispatcher, $request)
{
$response = new sfWebResponse($dispatcher);
$response->setContent($this->render($this->getTemplateValues($request)));
return $response;
}

There we instantiate an sfWebResponse. Its content will be the result of the protected method render. –Here I must say that it is possible to create our own View class to handle this part, but for this example I prefer to build it like this–.

protected function render($values)
{
extract($values);
ob_start();
ob_implicit_flush(0);
require(ROOT_PATH . '/apps/template.php');
return ob_get_clean();
}

This method expects an array with all the variables that will be used in the template. These values are extracted from the array and inserted into the current scope by calling the extract function. As we can see, the template is a simple PHP file that is required from the apps folder. This file is plain PHP code embedded into HTML.

The getTemplateValues method interprets the request and gets the data out of the database:

protected function getTemplateValues($request)
{
$values = array();
$values['sf_request'] = $request;
$values['collections'] = $this->getCollections();
$values['priorities'] = $this->getPriorities();
$values['cursor'] = $this->getLogs($request);
$values['pageNumber'] = $request->getParameter('page', 1);
$values['cursor']->skip(($values['pageNumber'] - 1) * $this->maxPerPage)->limit($this->maxPerPage);
$values['hasMore'] = $values['cursor']->count() > ($this->maxPerPage * $values['pageNumber']);
$values['filterParams'] = $this->buildFilterParams($request);
return $values;
}
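The paging arithmetic in that method is easy to get wrong by one, so here is a small Python sketch of just the numbers. The page size of 20 and the total of 65 logs are made-up values, and the sketch assumes the count reported for the cursor is the total number of matching logs rather than the size of the current page.

```python
def page_window(page_number, max_per_page, total_count):
    """Return (skip, limit, has_more) for a 1-based page number."""
    skip = (page_number - 1) * max_per_page
    has_more = total_count > max_per_page * page_number
    return skip, max_per_page, has_more


# Hypothetical data set: 65 matching logs, 20 per page.
print(page_window(1, 20, 65))  # (0, 20, True)
print(page_window(3, 20, 65))  # (40, 20, True)
print(page_window(4, 20, 65))  # (60, 20, False)
```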

And that's it! With such a simple structure we can leverage the power of the sfRequestHandler class, which will be at the core of the new Symfony version. We know that symfony does very well for complex projects, but sometimes I felt it was too big for a simple application like this one. With this new component I think that this distinction will be gone.

THE CODE:

The application code and the sfMongoDBLogger class can be found here.

You will need to set up a virtual host in order to run this application.

RESOURCES/REQUIREMENTS:

To learn more on MongoDB refer to this website: http://www.mongodb.org
If you set up MongoDB as explained here http://www.mongodb.org/display/DOCS/Getting+Started, you should be able to run this project without problems –give it a try, MongoDB is pretty easy to set up and the documentation is very good–.
For the installation instructions of the PHP native driver go here.

IMPORTANT:

Even if this should be implicit, keep in mind that these are my personal views on the subject. This is by no means an official statement from the Symfony project. It is just what I believe this new component will be, based on Fabien's presentation.

Thursday, July 2, 2009

Benchmarking MongoDB vs. MySQL

One of the projects I work on at the company has a message system with more than 200 million entries in a MySQL database. We've started to think about how we can scale this further, so in my research for alternatives I looked at NoSQL databases. In this post I would like to share some benchmarks I ran against MongoDB –a document-oriented database– compared to MySQL.

To perform the tests I installed MongoDB as explained here. I also installed PHP, MySQL and Apache2 from MacPorts. I did no special configuration in any of them.

The hardware used for the tests is my good ol' Macbook White:

Model Name: MacBook
Model Identifier: MacBook2,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache: 4 MB
Memory: 4 GB
Bus Speed: 667 MHz
Boot ROM Version: MB21.00A5.B07
SMC Version (system): 1.13f3

Because I don't have enough space to store the MongoDB database on my local hard drive, I launched the server with this command:
./bin/mongod --dbpath '/Volumes/alvaro/data/db/'

which tells MongoDB to use my USB hard drive. YES, my USB hard drive :-P

The MySQL server stored the data on the local hard drive.

What was the test?

I loaded 2 million records from the real data of our message system into both databases. Every record has 28 columns, holding information about the sender and the recipient of the message, plus the subject, date, etc. For MySQL I used mysqldump. For MongoDB I used the following:

$query = "SELECT * from message";
$result = mysql_query($query);
while($row = mysql_fetch_assoc($result))
{
$collection->insert($row);
}

Of course, for the real data loading I added some pagination; I didn't retrieve 2M records at once. There was also some code to initialize the MySQL connection and Mongo to get that $collection object.

The MySQL database had indexes on the sid and tid fields (sender id and target id), so I added them to the MongoDB database:

$m = new Mongo();
$collection = $m->selectDB("msg")->selectCollection("msg_old");
$collection->ensureIndex( array("sid", "tid") );

Then I wrote some simple code that will select a limit of 20 records filtered by sid. In the real application this means I'm watching the first 20 messages of my outbox.

EDIT - 2009/07/03

Due to some confusion I have to make something clear. What I'm benchmarking is not the data loading into both databases, nor traversing the data, etc., but the code that you can find here.

This is similar to what a user's message outbox (or inbox) could be in a production website. The user accesses their inbox and we retrieve up to 20 of its messages, which are then displayed in an HTML table. What siege accessed was a script serving the HTML generated from the query results.

So the idea is: if MongoDB or MySQL is the backend of this message system, which one will be faster for this specific use case? This benchmark is not about whether MongoDB is better than MySQL for every use case out there. We use MySQL a lot in production and we will keep using it as far as I can tell. And yes, I know that MySQL and MongoDB are two totally different technologies that probably only share the word database in their descriptions.

END EDIT - 2009/07/03

I wrote the code for both MongoDB and MySQL. Then my idea was to launch siege, have it pick random user ids from a text file and run the stress tests.

Here's an extract from the URL text file:
http://mongo.al/index.php?id=96
http://mongo.al/index.php?id=105
http://mongo.al/index.php?id=108
http://mongo.al/index.php?id=113
http://mongo.al/index.php?id=116
http://mongo.al/index.php?id=117
http://mongo.al/index.php?id=127
http://mongo.al/index.php?id=129
http://mongo.al/index.php?id=130
http://mongo.al/index.php?id=134

This means that siege will pick a random url and hit the server, requesting the outbox of that user id.

Then I increased the ulimit to be able to run this test:
siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

With that command I launch siege, telling it to load the URLs to visit from the text file. It will simulate 300 concurrent users and will do 10 repetitions with a random delay between 0 and 1 seconds. The last option tells siege to work in internet mode, so it will pick URLs randomly from the text file.

When I launched the test with MongoDB as the backend it worked without problems. With MySQL it crashed quite often. Below are the results I obtained for both of them.
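As a quick sanity check on those options, the number of requests siege should issue follows directly from the -c and -r values. The Python helper below is just that arithmetic; the function name is made up for the example.

```python
def expected_requests(concurrent_users, repetitions):
    """Total requests when every simulated user completes every repetition."""
    return concurrent_users * repetitions


print(expected_requests(300, 10))  # 3000
```

That total is the figure to compare against the sum of successful and failed transactions that siege reports for a run.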

MongoDB test results:

siege -f ./stress_urls.txt -c 300 -r 10 -d1 -i

Transactions: 2994 hits
Availability: 99.80 %
Elapsed time: 11.95 secs
Data transferred: 3.19 MB
Response time: 0.26 secs
Transaction rate: 250.54 trans/sec
Throughput: 0.27 MB/sec
Concurrency: 65.03
Successful transactions: 2994
Failed transactions: 6
Longest transaction: 1.47
Shortest transaction: 0.00

MySQL test results:
siege -f ./stress_urls_mysql.txt -c 300 -r 10 -d1 -i

Transactions: 2832 hits
Availability: 94.40 %
Elapsed time: 23.53 secs
Data transferred: 2.59 MB
Response time: 0.74 secs
Transaction rate: 120.36 trans/sec
Throughput: 0.11 MB/sec
Concurrency: 89.43
Successful transactions: 2832
Failed transactions: 168
Longest transaction: 16.36
Shortest transaction: 0.00

As we can see, MongoDB handled roughly twice the transaction rate of MySQL for this specific case (250.54 vs. 120.36 trans/sec), with far fewer failed transactions (6 vs. 168). And remember, MongoDB was reading the data from my USB hard drive ;-).
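For anyone who wants to re-derive the headline figures from the two reports, here is a short Python sketch. It only uses the numbers printed above; the formulas (availability as successful over total, rate as transactions over elapsed time) are my reading of how those columns relate, and they do reproduce the reported values.

```python
def availability(successful, failed):
    """Percentage of requests that succeeded."""
    return 100.0 * successful / (successful + failed)


def transaction_rate(transactions, elapsed_seconds):
    """Transactions per second over the whole run."""
    return transactions / elapsed_seconds


# Figures copied from the two siege reports above.
mongo = {"ok": 2994, "failed": 6, "elapsed": 11.95}
mysql = {"ok": 2832, "failed": 168, "elapsed": 23.53}

mongo_rate = transaction_rate(mongo["ok"], mongo["elapsed"])
mysql_rate = transaction_rate(mysql["ok"], mysql["elapsed"])

print("%.2f" % availability(mongo["ok"], mongo["failed"]))  # 99.80
print("%.2f" % availability(mysql["ok"], mysql["failed"]))  # 94.40
print("%.2f" % mongo_rate)                                  # 250.54
print("%.2f" % mysql_rate)                                  # 120.36
print("%.2f" % (mongo_rate / mysql_rate))                   # 2.08
```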

Thursday, June 4, 2009

New Firesymfony Release

I'm pleased to announce the release of version 1.1 of Firesymfony. This time it has a new design that we believe improves the user experience. The design was made by my colleague Jacqueline Wan and the logo by Olaf Horstmann.

Below you can see some screenshots, and here are the URLs to update the symfony plugin and the Firebug extension:

Symfony Plugin

FB Extension

About Panel:

Configuration and Variables Panel:


Logs Panel:

Cache Panel:

Database Panel:

Timers Panel:

Information Panel:

Monday, March 30, 2009

How I would like to use Propel and Memcache

In this article I would like to share some ideas that have been wandering around in my mind but that I haven't implemented yet. So beware!

I was thinking of a way to reduce database load by caching some results inside Memcache (an idea that is nothing special or revolutionary these days). This article gave me some ideas that I would like to see implemented in some of the projects I work on.

The code is meant mostly for use inside a Symfony/Propel project, but it could be adapted to a different one with ease.

Propel Peer classes come packed with the following method:

BaseUserPeer::retrieveByPK($pk, $con=null);

I would like to override it in the following way:

public static function retrieveByPK($pk, $con = null)
{
  $cacheKey = sprintf('user:id:%d', $pk);
  $asArray = $memcache->get($cacheKey);
  
  if($asArray === null)
  {
    $obj = parent::retrieveByPK($pk, $con);
    if($obj !== null)
    {
      $memcache->set($cacheKey, $obj->toArray(BasePeer::TYPE_FIELDNAME));
    }
  }
  else
  {
    $obj = new User();
    $obj->fromArray($asArray, BasePeer::TYPE_FIELDNAME);
  }
  
  return $obj;
}

Note that I've omitted the memcache connection code. I assume that there is a memcache class that abstracts the process (and whose get method returns null on a miss, which is what the code checks for). I've also omitted the details of this class's instantiation.

As you can see from the code, I store the object in memcache as an associative array. I like to do so because I don't want to store a serialized version of the Propel object, which would take more space. Also, by using a native data type I can warm up the cache from, let's say, a batch script without needing Propel at all. And if I have code that doesn't require symfony or Propel, I can still use the cached data.

The fromArray and toArray methods are built into Propel objects as a convenient way of populating them, so there's no extra effort on our side to get their benefits.

As the cache key I'm using the name of the table as a prefix, followed by a colon, the primary key column name, another colon and the primary key value, i.e. table_name:primary_key:value
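The key scheme is simple enough to capture in a one-line helper. The Python sketch below is only an illustration of the naming convention; the function name is made up.

```python
def cache_key(table_name, column_name, value):
    """Build a key of the form table_name:column_name:value."""
    return "%s:%s:%s" % (table_name, column_name, value)


print(cache_key("user", "id", 42))                    # user:id:42
print(cache_key("user", "nickname", "somenickname"))  # user:nickname:somenickname
```

Keeping the column name in the key is what allows the same table to be indexed in the cache by more than one column.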

Then what is left for this to actually work is to override the User::save($con = null) method, so every time the database row is updated the changes will be reflected in the cache.

I came up with the following code:

 
public function save($con = null)
{
  $affectedRows = parent::save($con);
  $memcache->set(sprintf('user:id:%d', $this->getId()), $this->toArray(BasePeer::TYPE_FIELDNAME));
  return $affectedRows;
}

There, by updating the cache entry right after the row is saved to the database, I use a traditional memcache pattern to keep the cache warm.

So let's say that in a normal login form the user submits a nickname and password to be checked against the database. In this case we have to do a SELECT query using the nickname and password as WHERE parameters. Instead of issuing that query every time, I would like to do the following inside UserPeer::retrieveByNickname($nickname):

public static function retrieveByNickname($nickname)
{
  $nicknameCacheKey = sprintf('user:nickname:%s', $nickname);
  $userId = $memcache->get($nicknameCacheKey);
  
  if($userId !== null)
  {
    return UserPeer::retrieveByPK($userId);
  }
  else
  {
    $c = new Criteria();
    $c->add(UserPeer::NICKNAME, $nickname);
    $user = UserPeer::doSelectOne($c);
    if($user !== null)
    {
      $memcache->set(sprintf('user:id:%d', $user->getId()), $user->toArray(BasePeer::TYPE_FIELDNAME));
      $memcache->set($nicknameCacheKey, $user->getId());
    }
    
    return $user;
  }
}

In the last example I first check whether there is a memcache entry with the following key: user:nickname:somenickname. The value stored is the user id, which I assume
is the primary key of the table. If a user id is found, I delegate the call to UserPeer::retrieveByPK to do the job. Otherwise I fetch the user from
the database using the nickname as the Criteria condition. If the record exists, I store the user data in memcache as an array and I also store the id under the
user:nickname:somenickname key. Now it should be clear why in the previous example I used user:id:someid as the key.

One improvement to the retrieveByPK and save methods would be to also store the value for the user:nickname:somenickname key once
the object has been populated. That way we increase the chances that retrieveByNickname will hit the cache.
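
The improvement above can be sketched as follows. To keep the example runnable without a memcache server or Propel, it uses an in-memory stand-in; ArrayCache and cacheUser are hypothetical names, and the row is a plain array like the one toArray would return.

```php
<?php
// Stand-in for a memcache client so the sketch runs without a server.
// Like Memcache::get(), get() returns false on a miss.
class ArrayCache
{
    private $data = array();

    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }
}

// Hypothetical helper: writes the row under user:id:<id> and the
// nickname-to-id pointer under user:nickname:<nickname>.
function cacheUser($cache, array $row)
{
    $cache->set(sprintf('user:id:%d', $row['id']), $row);
    $cache->set(sprintf('user:nickname:%s', $row['nickname']), $row['id']);
}

$cache = new ArrayCache();
cacheUser($cache, array('id' => 7, 'nickname' => 'alice'));

var_dump($cache->get('user:nickname:alice')); // int(7)
var_dump($cache->get('user:nickname:bob'));   // bool(false)
```

If nicknames can change, the old user:nickname key would also have to be removed on save; the sketch does not handle that.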

I hope you find this article useful, and thanks for reading.

NOTE: I know that the code is pretty ugly, some code needs to be refactored out to its own methods, and maybe Propel classes are not the best place for
this caching logic to reside, but I think it is a nice example to build upon.

Tuesday, March 3, 2009

Symfony Speed and Hello World Benchmarks

After reading some posts showing that "my blah blah framework" is way faster than symfony for a Hello World application, I decided to explain why: because symfony is extensible and can adapt to your needs. That’s easy to say, you may think; in fact, every framework out there claims that. So what makes symfony so special?


The following list names some of the features provided by symfony.

  • Factories
  • The Filter Chain 
  • The Configuration Cascade
  • The Plugins System
  • Controller adaptability
  • View adaptability


Factories


Since version 1.0 symfony provides a configuration file called factories.yml. This file controls the configuration of the application's core classes. There you can replace symfony's default classes with your own. This means you can set up a custom Front Controller, Web Request, Cache classes, Session storage, etc.


But this comes at a price: when a symfony application bootstraps, it reads the configuration file from the filesystem. If the yml file was parsed before, it loads a PHP file (which can also be cached with APC); if not, it parses the yml file, stores the parsed result in the cache folder and loads the configuration from there.
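
The parse-once, load-compiled-PHP-afterwards idea can be sketched like this. It is not symfony's actual code: loadConfig and parseSimpleConfig are hypothetical names, the parser only understands flat "key: value" lines instead of real YAML, and the class names in the sample file are made up.

```php
<?php
// Stand-in parser for flat "key: value" lines; real YAML is far richer.
function parseSimpleConfig($path)
{
    $config = array();
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        list($key, $value) = array_map('trim', explode(':', $line, 2));
        $config[$key] = $value;
    }
    return $config;
}

// Hypothetical helper: compiles the config to a PHP file on first use,
// then simply includes that file on later calls.
function loadConfig($source, $cacheFile)
{
    if (!is_file($cacheFile)) {
        $config = parseSimpleConfig($source);
        file_put_contents($cacheFile, '<?php return ' . var_export($config, true) . ';');
    }
    return require $cacheFile;
}

$source = tempnam(sys_get_temp_dir(), 'cfg');
$cacheFile = $source . '.cache.php';
file_put_contents($source, "request: myWebRequest\nstorage: mySessionStorage\n");

$first = loadConfig($source, $cacheFile);  // parses the source and writes the cache file
$second = loadConfig($source, $cacheFile); // only includes the compiled PHP file
```

Even on the fast path there is still a file check and an include per request, which is the cost a bare Hello World script never pays.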


Why would I need that flexibility, you may ask? In a project I’m involved with we needed Memcached, so we replaced all of symfony’s default cache mechanisms with our own custom classes. How? By setting up our classes in an easy-to-read yml file. So for sure when you benchmark your Hello World application symfony will be slower.


Filter Chain


One of the patterns from the Core J2EE Patterns book that impressed me the most is the Intercepting Filter. This pattern teaches how to modify request processing without changing Controller or Model code. The idea is that in a configuration file you plug in a class that takes care of pre- or post-processing the request. These classes are called filters. As an example, you can add a filter that checks whether the user has the proper credentials to execute the action she wants. Another filter can cache the response, etc.
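
A minimal sketch of the pattern, independent of symfony: the class names below are hypothetical and the filters only print what a real one would do. Each filter decides whether its work happens before or after the rest of the chain runs.

```php
<?php
// Minimal chain: every filter receives the chain and calls execute()
// on it to hand control to the next filter.
class FilterChain
{
    private $filters = array();
    private $index = -1;

    public function register($filter)
    {
        $this->filters[] = $filter;
    }

    public function execute()
    {
        $this->index++;
        if ($this->index < count($this->filters)) {
            $this->filters[$this->index]->execute($this);
        }
    }
}

class SecurityFilter
{
    public function execute($chain)
    {
        echo "check credentials\n"; // pre-processing
        $chain->execute();
    }
}

class CacheFilter
{
    public function execute($chain)
    {
        $chain->execute();
        echo "cache response\n";    // post-processing
    }
}

class ActionFilter
{
    public function execute($chain)
    {
        echo "Hello World!\n";      // stands in for the controller action
        $chain->execute();
    }
}

$chain = new FilterChain();
$chain->register(new SecurityFilter());
$chain->register(new CacheFilter());
$chain->register(new ActionFilter());
$chain->execute();
// Prints:
// check credentials
// Hello World!
// cache response
```

Adding or removing a filter only changes the registration lines, never the action itself.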


Symfony has the filters.yml file, which can be specified per application or per module. This means that we can set filters to be executed for the whole application and then disable them for specific modules (say, for Ajax actions). Does your framework provide this flexibility without resorting to some kind of monkey patching technique? No? Well, symfony does. Say hello to the Filter Chain. So for sure when you benchmark your Hello World application symfony will be slower, because it adds flexibility to the process. You want to get rid of this behavior? Sure, set up a custom controller in the factories.yml file, and in your new class override the loadFilters method.


Configuration Cascade


As explained in the configuration chapter of the symfony book, symfony lets you modify its behavior through some yml files. For example, we have the view.yml file that tells symfony which CSS and JavaScript files it should load for the current request. We can have an application-level view.yml configuration and then override the settings per module. When symfony processes the request it checks all these files; that’s why it is slower in the Hello World benchmark.
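
The override idea can be illustrated with two plain arrays. This is only a sketch with made-up values; symfony's real merging rules are more involved than a single array_replace call.

```php
<?php
// Illustrative settings, as they might look once the yml files are parsed.
$applicationView = array(   // application-level view.yml
    'stylesheets' => array('main.css'),
    'javascripts' => array('app.js'),
    'has_layout'  => true,
);
$moduleView = array(        // module-level view.yml
    'has_layout'  => false,
);

// Module settings win over application settings for the same key.
$effective = array_replace($applicationView, $moduleView);

var_dump($effective['has_layout']); // bool(false)
```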


Plugins System


Symfony has a very powerful and easy-to-use Plugin System. Its more than 400 plugins, written by 200+ developers, speak for themselves.


The plugins can contain modules of their own and also a config.php file, similar to the project config.php file or the module config.php. When symfony processes a request it checks for the settings in those files, which means they will be read from disk. So in a plugin we can provide a custom logger that is fired up when bootstrapping the application. The plugin user doesn’t need to care how the logger is activated; she just knows that it will work.


The same applies to plugin modules. How do you think symfony knows that a certain module/action should be called from a plugin? If you enable the plugin module in the settings.yml file, symfony will check inside the plugin module to see if the requested action should be executed there.


Controller adaptability


For each action that the user calls, symfony executes a page controller. Inside our modules symfony lets us use either a generic myModuleActions class that extends sfActions, or a class specific to the requested action, which as an example could be called indexAction and extends the sfAction class. When a request is processed symfony first checks for the existence of the latter. If it doesn’t exist, it tries to load the generic actions class for that module. Of course you don’t need this kind of flexibility for Hello World apps.
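
The lookup order described above can be sketched as follows. This is not symfony's implementation: resolveController is a hypothetical function, and showAction and blogActions are stand-in classes that do not extend the real sfAction or sfActions.

```php
<?php
// Stand-in controller classes for a hypothetical "blog" module.
class showAction {}   // class dedicated to the "show" action
class blogActions {}  // generic class holding the module's other actions

// Hypothetical resolver: prefer the action-specific class,
// fall back to the module's generic actions class.
function resolveController($module, $action)
{
    $specific = $action . 'Action';
    if (class_exists($specific)) {
        return $specific;
    }

    $generic = $module . 'Actions';
    if (class_exists($generic)) {
        return $generic;
    }

    throw new RuntimeException("No controller for $module/$action");
}

echo resolveController('blog', 'show'), PHP_EOL; // showAction
echo resolveController('blog', 'list'), PHP_EOL; // blogActions
```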


View Adaptability


For rendering the response symfony uses the sfPHPView class by default. If a certain module in your application requires a different view, there are at least three ways to accomplish this, as explained here.


Conclusion


Symfony is a Professional Web Application Framework built to cope with real world needs. In a large project with more than a simple salutation feature, sooner or later you will need the flexibility provided by the framework. This will save you time and prevent headaches, because when you have built a whole system with a framework and the business needs start to push in a direction where you have to extend it, you will thank yourself for having chosen symfony in the first place.

In case your client requires a Hello World! application, you can use the following hyper fast framework code: die("Hello World!") ;-)