Patching The PHP Gearman Extension

Apollo 10 Capsule

Update: it seems that this behaviour in libgearman changed from 0.8 to 0.10, and according to Eric Day (eday), the behaviour will change back to the old one with 0.11.

After upgrading to the most recent version of the Gearman-extension from PHP and libgearman, Suhosin started complaining about a heap overwrite problem. The error only popped up on certain response sizes, which made me guess that it could be a buffer overrun or something strange going on in the code handling the response.

Seeing this as an excellent opportunity to get more familiar with the Gearman code, I dug into the whole shebang yesterday and continued my quest for cleansing today. After quite a few hours of getting to know the code and attempting to understand the general flow, I was finally able to find – and fix – the problem.

The first symptom of the issue was that the Gearman extension at certain times failed to return the complete response from the gearman server. I created a small application that returned responses of different sizes, showing that the problem was all over the place. While n worked, n+1 returned only n bytes, and n+2 resulted in a heap overflow.

The issue was caused by an invalid efree, where the code in question was:

void _php_task_free(gearman_task_st *task, void *context) {
	gearman_task_obj *obj= (gearman_task_obj *)context;
    TSRMLS_FETCH();

	if (obj->flags & GEARMAN_TASK_OBJ_DEAD) {
		GEARMAN_ZVAL_DONE(obj->zdata)
		GEARMAN_ZVAL_DONE(obj->zworkload)
		efree(obj);
	}
	else 
	  obj->flags&= ~GEARMAN_TASK_OBJ_CREATED;
}

This seems innocent enough, and I really had trouble seeing how this could lead to the observed behaviour. This meant going for a wild goose chase around the Gearman code, trying to piece together how things worked. And after a few proper debug rounds, I finally discovered the issue: the context variable was not a gearman_task_obj struct under certain criteria. The gearman_task_obj struct is allocated by php_gearman and then assigned to the task in question. This makes it possible for the extension to tag an internal structure together with the task in libgearman. Under certain conditions this struct is not created, and by default, libgearman assigns the client struct to the context instead (this is also available as task->client). So instead of the gearman_task_obj that was assumed to be present, we actually got a gearman_client struct.

That provides a reason why things went sour, but why exactly did I see the behaviour I saw? Well, to answer that, we’ll have to take a look at the actual contents of the struct. The client struct contains a value keeping the number of bytes in the response, while the task_obj struct keeps the flags (which is what the code above checks and updates). Coincidentally these two int values are aligned similiar in the two structs – resulting in the number of bytes in the response being used as the flags value. This value is then modified (under certain conditions) or results in a free using other offsets into the struct. The call to efree() will then use some random values (or, more specific, the values that lines up with the location in task_obj) when attempting to do the free, resulting in a corruption. Suhosin caught it, while it would probably have generated a few weird bugs (where the last byte would go missing) under an unprotected PHP installation. +1 for Suhosin!

The patch for php_gearman.c is available, and should be applied towards 0.6.0. Although I’ve had a few looks around, it might introduce a memory leak. People who know the code way better than I do will probably commit a better patch, and the issue will be fixed in 0.7.0 of the extension.

Google Releases Their Protocol Buffers

Fresh from the Google Open Source Blog comes news that Google has released their Protocol Buffers specification and accompanying libraries. The code and specification has been release at Protocol Buffers on Google Code.

Protocol Buffers is a data format for fast exchange and parsing of data and messages between computers. It is similar to simple uses of XML in this manner, but the messages size on the wire and their parsing time is very much optimized for busy sites. There is no need to spend loads of time doing XML parsing when you instead could do something useful. It’s very easy to interact with the messages through the generated classes (for C++, Java and Python), and future versions of the same schema are compatible with old versions (as new fields are just ignored by older parsers).

Still no PHP implementation available, so guess it’s time to get going and lay down some code during the summer. Anyone up for the job?