Monkey patching in PHP

Tuesday, June 22. 2010
Comments

I haven't really had the chance or time to play with PHP 5.3 until recently when Ubuntu 10.04 upgraded my local installations and kind of forced me to dive into it a little. And I'm also probably the last person on the planet to notice, but namespaces in PHP 5.3 allow you to monkey-patch core PHP code.

What's monkey patching?

So monkey patching is a technique to replace functions at runtime. One of the more common applications is stubbing (or mocking) code in unit tests. So for example mocking the response from a server allows you to run a unit test in absence of another external service. Thus making your test suite both more robust and possible bugs easier to squash.

Up until 5.3 monkey patching was not available in PHP — unless you used the runkit extension.

Other use cases are changing the behavior of code without directly forking it and maintaining a local copy, e.g. to add a feature or so or even to apply bug fixes without modifying the original code.

Example

Here's some example code.

<?php
namespace monkeypatch;

$str = 'your mom';

echo "This should eight, but it's not: " . strlen($str) . "\n"; // 6
echo "Now this should be really eight: " . \strlen($str) . "\n";

function strlen($str) {
    return 6;
}

The difference between strlen() and \strlen() is, that the first call uses the function we defined in the current namespace. Since using the namespace operator requires it to be the first thing in a file, all consecutive functions and classes are part of this namespace. If an equivalent is not available in the current namespace, it'll fall back to the parent namespace or root.

Other applications that come to mind would be fixing the parameter order in strstr() and in_array(), and similar! But of course I'm kidding and wouldn't suggest that really. :-)

Fin

That's all kids!

Defined tags for this entry: , ,

PHP, APC and sessions

Wednesday, May 26. 2010
Comments

Playing with redis/Rediska and sessions, I wanted to get more numbers to compare this solution to a traditional MySQL-based approach which also made me revisit the idea of a CouchDB-based session handler for Zend_Session.

Implementing this handler, I ran into a weird issue:

Fatal error: Undefined class constant 'ALLOW_ALL' in /usr/home/till/foo/trunk/library/Zend/Uri/Http.php on line 447
Call Stack
#   Time    Memory  Function    Location
1   0.7357  3914816 Foo_Session_SaveHandler_Couchdb->write( )   ../Couchdb.php:0
2   0.7358  3916600 Foo_Couchdb->query( )   ../Couchdb.php:94
3   0.7361  3969464 Zend_Http_Client->__construct( )    ../Couchdb.php:368
4   0.7361  3969544 Zend_Http_Client->setUri( ) ../Client.php:250
5   0.7362  3976568 Zend_Uri::factory( )    ../Client.php:267
6   0.7365  4003352 Zend_Uri_Http->__construct( )   ../Uri.php:130
7   0.7367  4006216 Zend_Uri_Http->valid( ) ../Http.php:154
8   0.7368  4006216 Zend_Uri_Http->validateHost( )  ../Http.php:281

The funny thing is that that APC was added (for apc_store() and apc_fetch()) at the same time to the game (to cache the configuration) and when I disabled it, the error disappeared.

Talking to to one of the leads of APCGopal (Btw, cheers for helping!) — on IRC (#pecl@efnet) I thought at first that the issue was autoload related and we thought the order in which the extensions are loaded might make a difference. From Rasmus' comment, I later discovered bug #16745 with a proposed workaround to use session_write_close().

On a sidenote: I'm still not sure why the error is expected behavior for some people but yet it works with some PHP and APC versions and breaks with others. From what I gathered it broke for me with 5.2.6, 5.2.11 and 5.3.2. Tried all with the latest version of APC (3.1.3p1).

Here's how I fixed it for myself

I have a Lagged_Application class to bootstrap my application. Lagged_Application is kind of like Zend_Application sans a lot of safety nets and magic. Since it does a lot less, it's also quiet a bit faster. To get an idea, check out my Google Code repository (for an alas rather outdated version of it).

I added the following function to it:

<?php
// (...)
public function shutdown()
{
    session_write_close();
}

My index.php looks like the following:

<?php
include 'library/Lagged/Application.php';
$app = new Lagged_Application;
$app->setEnvironment('production');
$app->bootstrap();

register_shutdown_function(array($app, 'shutdown'));

Somewhat related — shutdown() could be a good start to tear down other objects as well, when needed.

More?

Now that this issue is fixed, I think also the infamous Fatal error: Exception thrown without a stack frame in Unknown on line 0 originates from the same issue. That is, when sessions and APC are around — but I should dig a little deeper to verify this.

All in all, it's a pretty weird issue and IM(very)HO, objects shouldn't be torn down or some sort of before hook should be executed to avoid this. But that's especially easy to say if you don't do C. :-)

Fin

That's all. I sure hope this saves someone else some time.

Defined tags for this entry: , , , , ,

PHP: So you'd like to migrate from MySQL to CouchDB? - Part III

Monday, May 17. 2010
Comments

This is part three of a beginner series for people with a MySQL/PHP background. Apologies for the delay, this blog entry has been in draft since the 13th December of last year (2009).

Follow these links for the previous parts:

Recap

Part I introduced the CouchDB basics which included basic requests using PHP and cURL. Part II focused on create, read, update and delete operations in CouchDB. I also introduced my nifty PHP CouchDB called ArmChair!

ArmChair is my own very simple and (hopefully) easy-to-use approach to accessing CouchDB from PHP. The objective is to develop it with each part of this series to make it a more comprehensive solution.

Part III

Part three will target basic view functions in CouchDB — think of views as a WHERE-clause in MySQL. They are similar, but also not. :-)

Map-Reduce-Thingy

If you read up on CouchDB before coming to this blog, you will probably heard of map-reduce. There, or maybe elsewhere. A lot of people attribute Google's success to map-reduce. Because they are able to process a lot of data in parallel (across multiple cores and/or machines) in relatively little time.

I guess the PageRank in Google Search or Google Analytics are examples of where it could be used.

In the following, I'll try to explain what map-reduce is. For people without a science degree. (And that includes me!)

Map

Generally, map-reduce is a way to process data. It's made off two things, map and reduce.

The idea is that the map-function is very robust and it allows data to be broken up into smaller pieces so it can be processed in parallel. In most cases the order data is processed in doesn't really matter. What generally counts is that it is processed at all. And since map allows us to run the processing in parallel, it's easier to scale out. (That's the secret sauce!)

And when I write scale-out, I don't suggest to built a cluster of 1000 servers in order to process a couple thousand documents. It's already sufficient in this case to utilize all cores in my own computer when the map task is run in parallel.

In CouchDB, the result of map is a list of keys and values.

Reduce

Reduce is called once the map-part is done. It's an optional step in terms of CouchDB — not every map requires a reduce to follow.

Real world example

  • take a simple photo application (such as flickr) with comments
  • use map to sort through the comments and emit the names of users who left one
  • use reduce to only get unique references and see how many comments were left by these user

In SQL:

SELECT user, count(*) FROM comments GROUP BY user

Why the fuzz?

Just so people don't feel offended. Map-reduce is slightly more complicated than my example SQL-query but it's also not some secret-super-duper thing. Its strength is really parallelization which requires the ability to break data into chunks to process them. The end.


Continue reading "PHP: So you'd like to migrate from MySQL to CouchDB? - Part III"

start-stop-daemon, Gearman and a little PHP

Thursday, April 22. 2010
Comments

The scope of this blog entry is to give you a quick and dirty demo for start-stop-daemon together with a short use case on Gearman (all on Ubuntu). In this example, I'm using the start-stop-daemon to handle my Gearman workers through an init.d script.

Gearman

Gearman is a queue! But unlike for example most of the backends to Zend_Queue, Gearman provides a little more than just a message queue to send — well — messages from sender to receiver. With Gearman it's trivial to register functions (tasks) on the server to make in order to start a job and to get stuff done.

For me the biggest advantages of Gearman are that it's easy to scale (add a server, start more workers) and that you can get work done in another language without building an API of some sort in between. Gearman is that API.

Back to start-stop-daemon

start-stop-daemon is a facility to start and stop programs on system start and shutdown. On recent Ubuntus most of the scripts located in /etc/init.d/ make use of it already. It provides a simple high-level API to system calls — such as stopping a process, starting it in a background, running it under a user and the glue, such as writing a pid file.

My gearman start script

Once adjusted, register it with the rc-system: update-rc.d script defaults. This will take care of the script being run during the boot process and before shutdown is completed.

A little more detail

The script may be called with /etc/init.d/script start|stop|restart (the pipes designated "or").

Upon start, we write a pidfile to /var/run and start the process. The same pidfile is used on stop — simple as that. The rest of it is hidden behind start-stop-daemon which takes care of the ugly rest for us.


Continue reading "start-stop-daemon, Gearman and a little PHP"

Caching for dummies

Tuesday, April 6. 2010
Comments

Caching is one of the things recommended whenever something is slow — "Your [database, website, server]? Well, duh! You need a cache!".

All things aside, it's not always easy to cache stuff. I find myself often in situations where I can't cache at all or where a caching solution is complex as hell to implement. All of the sudden you need to dissect your page and lazy load half of it with Ajax — bah. And that's just because those users never want to wait for a cache to expire or invalidate. They just expect the most recent ever! :-)

Sometimes however, caching can be so trivial, it even hurts.

Bypass the application server

There are lots of different techniques and strategies to employ when you start to think about caching. The fastest albeit not always available method is to bypass your app server stack completely. And here's how. :-)

An example

My example is a pet project of mine where I display screenshots of different news outlets which are taken every 3 (three) hours — 0:00, 3:00 AM, 6:00 AM, 9:00 AM, 12:00 PM, 3:00 PM, 6:00 PM, 9:00 PM and so on. In between those fixed dates, the page basically never changes and why would I need PHP to render a page if it didn't change since the last request?

Correct, I don't! :)

And here's what I did to setup my cache:

  • Apache 1.3.x (!) and mod_php5
  • my homepage: docroot/home.php
  • httpd.conf: DirectoryIndex index.html home.php
  • a cronjob: */10 * * * * fetch -q -o docroot/index.html http://example.org/home.php

In greater detail

Homepage

The home.php does all the PHP funkyness whenever I need a fresh rendered version of my website to track an error, or adjust something.

DirectoryIndex

If I ever delete my cache (index.html), my website will still be available. The DirectoryIndex will use home.php next and the request will be a little slower and also more expensive on my side, but the website will continue to work.

Cronjob

The cronjob will issue a HTTP request (GET) using fetch against my homepage and save the result to my index.html. It's really so simple, it hurts. Currently, this cronjob is executed every 10 minutes so I can fiddle with the design and deploy a change more easily, but I could run that cronjob every hour or every few hours as well.

If you don't have fetch, try the following wget command:

wget -q -O docroot/index.html http://example.org/home.php

Fin

That's all, a really simple cache which bypasses your application server. Enjoy!

If you're in for another nifty solution, I suggest you read Brandon Savage's article on caching with the Zend Framework and/or take a look at nginx+Memcached.

Defined tags for this entry: , ,