Adding Support for Asynchronous Status Requests for Net_Gearman

I spent the evening yesterday playing around a bit more with Gearman, a system for farming out tasks to workers across several servers. As my workstation at home still runs Windows, the only PHP library available is the Net_Gearman in PEAR. Net_Gearman supports tasks (something to do), sets (a collection of tasks), workers (the processes that performs the task) and clients (which requests tasks to be performed). The gearman protocol supports retrieving the current status of a task from the gearman server (which contains information about how the worker is progressing, reported by the worker itself), but Net_Gearman did not.

The reason for ‘did not’ is that I’ve created a small patchset to add the functionality to Net_Gearman. All internal methods and properties are still used as they were before, but I’ve added two helper methods for retrieving the socket connection for a particular gearman server (Net_Gearman usually just picks a random server, but we need to contact the server that’s responsible for the task) and a getStatus(server, handle) method to the Gearman Client. I’ve also added a property keeping the address of the server which were assigned the task to the Task class.

After submitting a task to be performed in the background (you do not need this to get the status for foreground tasks, as you can provide a callback to handle that), your Task object will have its handle and server properties set. These can be used to retrieve status information about the task later. You’ll still need to provide the possible servers to the Gearman client when creating the client (through the constructor).

Example of creating a task and retrieving the server / handle pair after starting the task:

require_once 'Net/Gearman/Client.php';

$client = new Net_Gearman_Client(array('host:4730'));

$task = new Net_Gearman_Task('Reverse', range(1,5));
$task->type = Net_Gearman_Task::JOB_BACKGROUND;

$set = new Net_Gearman_Set();
$set->addTask($task);

$client->runSet($set);

print("Status information: \n");
print($task->handle . "\n");
print($task->server . "\n");

Retrieving the status:

require_once 'Net/Gearman/Client.php';

$client = new Net_Gearman_Client(array('host:4730'));
$status = $client->getStatus('host:4730', 'H:mats-ubuntu:1');

The array returned from the getStatus() method is the same array as returned from the gearman server and contains information about the current status (numerator, denominator, finished, etc, var_dump it to get the current structure). I’ve also added the patchset to the Issue tracker for Net_Gearman at github.

The patchset (created from the current master branch at github) can be downloaded here: GearmanGetStatusSupport.tar.gz.

UPDATE: I’ve finally gotten around to creating my own fork of NET_Gearman on github too. This fork features the patch mentioned above.

Adding SVN Revision to a Configuration File

After a while you realize that the best way to serve almost-never-changing content is to give the content an expire date way ahead in the future. The allows your server and your network pipes to do more sensible stuff than delivering the same old versions of files again and again and again and again.

A problem does however surface when you want to update the files and make the visiting user request the new version instead of the old. The trick here is to change the URL for the resource, so that the browser requests the new file. You can do this by appending a version number to the file and either rewriting it behind the scenes to the original file, or by appending a timestamp (or some other item) to the URL as a GET value. The web server ignores this for regular files, but as it identifies a new unique resource, the web browser has to request it again and use the new and improved ™ file.

Using the timestamp of the file is a bit cumbersome and requires you to hit the disk one additional time each time you’re going to show an URL to one of the almost-static resources, but luckily we already have an identifier describing which version the file is in: the SVN revision number (.. if you use subversion, that is). You could use the SVN revision for each file by itself, but we usually decide that the global version number for SVN is good enough. This means that each time you update the live code base through svn up or something like that (remember to block .svn directories and their files if you run your production directory from a SVN branch. This can be discussed over and over, but I’m growing more and more fond of actually doing just that..). To avoid having to call svnversion each time, it’s useful to be able to insert the current revision number into the configuration file for the application (or a header file / bootstrap file).

Here’s an example of how you can insert the current SVN revision into a config file for a PHP application.

  1. Create a backup of the current configuration file.
  2. Update the current revision through svn up.
  3. Retrieve the current revision number from svnversion.
  4. Insert the revision number using sed into a temporary copy of the configuration file.
  5. Move the new configuration file into place as the current configuration file.
  6. Party like it’s 1999!

This assumes that you use an array named $config in your configuration file. I suggest that you name it something else, but for simplicity I’m going with that here. First, create a $config[‘svn’] entry in your config file. If you have some other naming scheme, you’re going to have to change the relevant parts below.

#!/bin/bash
cp ./config/config.php ./config/config.backup.php
svn up
VERSION=`svnversion .`
echo $VERSION
sed "s/config\['svn'\] = '[0-9M]*';/config\['svn'\] = '$VERSION';/" < ./config/config.php > ./config/config.fixed.php
mv ./config/config.fixed.php ./config/config.php

Save this into a file named upgrade.sh, make it executable by doing chmod u+x upgrade.sh and run it by typing ./upgrade.sh.

And this is where you put your hands above your head and wave them about. When you’re done with that, you can refer to your current SVN revision using $config[‘svn’] in your PHP application (preferrably in your template or where you build the URLs to your static resources). Simply append ?v=$config[‘svn’] to your current filenames. When you have a new version available, run ./upgrade.sh (or whatever name you gave the script) again and let your users enjoy the new experience.

Randy Shoup on Lessons of Scaling eBay

InfoQ has an article about scaling ebay online written by Randy Shoup about some of the lessons they’ve learned over at eBay during the years when it comes to scaling one of the largest internet sites in the world. The article is a very interesting read, and I’ll sum up the seven main points:

  1. Partition by Function
  2. Split Horizontally
  3. Avoid Distributed Transactions
  4. Decouple Functions Asynchronously
  5. Move Processing To Asynchronous Flows
  6. Virtualize At All Levels
  7. Cache Appropriately