November 30th, 2009
It’s been a couple of weeks since I last had to go hex hunting, but fear not – we’re back in action! We discovered this issue after adding support for uploading ZIP files in the platform powering Gamer.no. All our test cases worked perfectly, but as soon as we released the feature our users discovered zip files from one particular site never worked. WinRAR and 7zip were both able to extract the files, so we just left the issue at that for the time being.
This evening I finally got around to trying to find out exactly what was going on. I started by updating my SVN trunk checkout of PHP6, adding the zip module to my configure and running make clean and make. The issue was still present in HEAD, so that meant getting my hands dirty and trying to find out exactly what we were up against.
The first thing on the table was to find out what the zip module was getting all antsy about when trying to open the file. A few well placed printfs (all hail the magic of printf debugging) told me that the line that this particular ZIP file did not pass was:
-
if ((comlen < cd->comment_len) || (cd->nentry != i)) {
The cd struct is the “end of central directory” struct, containing meta data about the archive and an optional comment about the ZIP file. The comlen and the cd->comment_len were both zero (as the file didn’t have a comment), so the trouble maker was the cd->nentry != i statement. The library that the PHP module uses reads i by itself and then reads the struct. nentry is the number of files in “this disk” of the archive (a ZIP archive can be divided across several physical disks (think back to 8″, 5.25″ and 3.5″ disks)), while i is the “total number of files in the archive”. The PHP library only supports single archives (not archives spanning several disks), so these two values should be identical. For some reason, they weren’t – meaning that the zip files generate by this particular site actually are invalid ZIP-files. WinRAR and 7zip just makes the best of the situation and does it very nicely.
Here’s the hex dump of the end of central directory section from one of the non-working files:

The first four bytes are the signature of the section (50 4B 05 06, or PK\5\6), then we have 2 bytes which is the “number of this disk” (00 00 here) and 2 bytes with “number of the disk with the start of the central directory” (00 00 here again) (the library in PHP doesn’t support archives spanning multiple disks, so it just compares this section to 00 00 00 00). Then we have our magic numbers, both two bytes: “total number of entries in the central directory on this disk” and “total number of entries in the central directory”. These should logically be the same for single file archives, but they’re 6 and 3 (06 00 and 03 00) here (the correct number of files in the archive is 3).
The solution for us is to use the correct number (“total number of entries in the central directory”) for both the values. To do this we simply patch the two bytes of the binary zip file (.. we could do this with fopen, fseek, fwrite in place, but we’re lazy. This is not time sensitive code.) and rewrite it in place:
-
/**
-
* Attempt to repair the zip file by correcting wrong total number of files in archive
-
*/
-
static public function repairZIPFile($file)
-
{
-
// lets read in the file
-
$data = file_get_contents($file);
-
-
// lets try to find the end of the central directory record (should be
-
// the last thing in the file, so we search from the end)
-
$startpos = strrpos($data, "PK\5\6");
-
-
// if we found a header..
-
if ($startpos)
-
{
-
// attempt to repair the file by copying the "total files" to the "files on this disk" field.
-
// PHP's ZIP module doesn't handle multidisks anyway..
-
$data[$startpos+8] = $data[$startpos+10];
-
$data[$startpos+9] = $data[$startpos+11];
-
-
file_put_contents($file, $data);
-
}
-
}
And Voilá – we’re a happy bunch again!
Tags: hacking, patching, PHP, php_zip, zip, ziparchive
Posted in Hacks, PHP, Programming | No Comments »
October 26th, 2009
If you’re doing several encodes of a single input file (to encode several different sizes / bitrate combinations) in parallel with x264, you’re going to have a problem. The first pass will create three files with information about the file for the second pass, and you’re unable to change this file name into something better. This seems to be a problem for quite a lot of people according to a Google-search for the issue, and none seems to have any proper solution.
I have one. Well, probably not a proper solution, but at least it works! The trick is to realize that ffmpeg/x264 creates these files in the current working directory. To run several encodings in parallel, you’ll simply have to give each encoding process it’s own directory, and then use absolute paths to the source and destination file (and any other paths). Let it create the files there and clean up and delete the directories afterwards.
I’ve included some example code from PHP in regards to how you could solve something like this. I simply use the output file name as the directory name here, and create the directory in the system temp directory.
-
$tempDir = sys_get_temp_dir() . '/' . $outputFilename);
-
mkdir($tempDir, 0700, true);
-
chdir($tempDir);
After doing the encode, we’ll have to clean up. The three files that ffmpeg/x264 creates are ffmpeg2pass-0.log, x264_2pass.log and x264_2pass.log.mbtree.
-
unlink($tempDir . '/ffmpeg2pass-0.log');
-
unlink($tempDir . '/x264_2pass.log');
-
unlink($tempDir . '/x264_2pass.log.mbtree');
-
rmdir($tempDir);
And that should hopefully solve it!
Tags: 2pass, encoding, ffmpeg, PHP, x264
Posted in PHP, Scalability | No Comments »
October 24th, 2009

Update: it seems that this behaviour in libgearman changed from 0.8 to 0.10, and according to Eric Day (eday), the behaviour will change back to the old one with 0.11.
After upgrading to the most recent version of the Gearman-extension from PHP and libgearman, Suhosin started complaining about a heap overwrite problem. The error only popped up on certain response sizes, which made me guess that it could be a buffer overrun or something strange going on in the code handling the response.
Seeing this as an excellent opportunity to get more familiar with the Gearman code, I dug into the whole shebang yesterday and continued my quest for cleansing today. After quite a few hours of getting to know the code and attempting to understand the general flow, I was finally able to find – and fix – the problem.
The first symptom of the issue was that the Gearman extension at certain times failed to return the complete response from the gearman server. I created a small application that returned responses of different sizes, showing that the problem was all over the place. While n worked, n+1 returned only n bytes, and n+2 resulted in a heap overflow.
The issue was caused by an invalid efree, where the code in question was:
-
void _php_task_free(gearman_task_st *task, void *context) {
-
gearman_task_obj *obj= (gearman_task_obj *)context;
-
TSRMLS_FETCH();
-
-
if (obj->flags & GEARMAN_TASK_OBJ_DEAD) {
-
GEARMAN_ZVAL_DONE(obj->zdata)
-
GEARMAN_ZVAL_DONE(obj->zworkload)
-
efree(obj);
-
}
-
else
-
obj->flags&= ~GEARMAN_TASK_OBJ_CREATED;
-
}
This seems innocent enough, and I really had trouble seeing how this could lead to the observed behaviour. This meant going for a wild goose chase around the Gearman code, trying to piece together how things worked. And after a few proper debug rounds, I finally discovered the issue: the context variable was not a gearman_task_obj struct under certain criteria. The gearman_task_obj struct is allocated by php_gearman and then assigned to the task in question. This makes it possible for the extension to tag an internal structure together with the task in libgearman. Under certain conditions this struct is not created, and by default, libgearman assigns the client struct to the context instead (this is also available as task->client). So instead of the gearman_task_obj that was assumed to be present, we actually got a gearman_client struct.
That provides a reason why things went sour, but why exactly did I see the behaviour I saw? Well, to answer that, we’ll have to take a look at the actual contents of the struct. The client struct contains a value keeping the number of bytes in the response, while the task_obj struct keeps the flags (which is what the code above checks and updates). Coincidentally these two int values are aligned similiar in the two structs – resulting in the number of bytes in the response being used as the flags value. This value is then modified (under certain conditions) or results in a free using other offsets into the struct. The call to efree() will then use some random values (or, more specific, the values that lines up with the location in task_obj) when attempting to do the free, resulting in a corruption. Suhosin caught it, while it would probably have generated a few weird bugs (where the last byte would go missing) under an unprotected PHP installation. +1 for Suhosin!
The patch for php_gearman.c is available, and should be applied towards 0.6.0. Although I’ve had a few looks around, it might introduce a memory leak. People who know the code way better than I do will probably commit a better patch, and the issue will be fixed in 0.7.0 of the extension.
Tags: c++, Gearman, gearman-dev, libgearman, PHP, php-dev
Posted in Hacks, PHP, Programming | No Comments »
September 15th, 2009
This is one of the common error messages that seems to appear after installing PHP – in particular under Ubuntu or Debian (where I experienced it). The reason for this is that the PDO version you’ve just installed is too old for the constant to exist, usually because you where naive enough to install the extension from PECL instead of using the default supplied by ubuntu. If you did the same as me:
apt-get install
pecl install pdo
pecl install pdo_mysql
/etc/init.d/apache2 restart
And everything seems to work, except for that missing constant. What the fsck?!
The reason is that the PECL version of PDO is no longer maintained (I’d suggest to automagically push the newest version to PECL too, just so pecl install and pecl upgrade works as expected). The fact is that when you did pecl install pdo, you destroyed the more recent version provided by the default php5-common package (under Ubuntu, at least).
To solve the problem (and if you also did pecl install pdo_mysql, you’ll have to replace that one too…)
apt-get install --reinstall php5-common php5-mysql php5-mysqli
Restart. Relive. Re.. eh .. yeah. Rerere.
If you’re building from source, you’ll need to add:
--enable-pdo --with-pdo-mysql --with-pdo-pgsql (and any other driver you need)
Tags: debian, MySQL, pdo, pdo_mysql, PHP, Ubuntu
Posted in MySQL, PHP | No Comments »
August 26th, 2009
First, a small introduction to the problem: We’re running stuff in UTF-8 all the way. A few sites we’re reading feeds from are using ISO-8859-1 as their charset, but they either supply the feed with the correct encoding specific or the feeds arrive as UTF-8. Everything works nicely, except for the mentioned-in-the-headline en-dashes. Firefox only shows 00 96 (0×00 0×96), but everything looks correct when you view the headlines and similiar stuff on the original site.
Strange.
The digging, oh all the digging.
After the already mentioned digging (yes, the digging) in data at the large search engines (ok, maybe I did a search or two), I discovered that the windows cp1252 encoding uses 0×96 to store endashes. This seems similiar! We’re seeing 0×96 as one of the byte values above, so apparently cp1252 is sneaking into the mix somewhere along the lines. Most of the clients using the CMS-es are windows, so they might apparently be to blame.
ISO-8859-1 enters the scene
As the sites (and feeds) provide ISO-8859-1 as their encoding, I thought it would be interesting to see what ISO-8859-1 defines as the representation for the byte value 0×96. Lo’ and behold: 0×96 is not defined in ISO-8859-1. Which actually provides us with the solution.
I welcome thee, Mr. Solution
When the ISO-8859-1 encoded string is converted into UTF-8, the bytes with the value 0×96 (which is the endash in cp1252) is simply inserted as a valid code sequence in UTF-8 which represents a character that’s not defined.
We’re saying that the string is ISO-8859-1, although in reality it is either cp1252 or a mangled version of iso-8859-1 and cp1252 (for the endashes, at least).
If you’re on the parsing end of this mumbo jumbo, one solution is to replace the generated UTF-8 sequence (0xc2 0×96) (converted from 0×96 i ISO-8859-1) with the proper one (0xe2 0×80 0×93):
-
$data = str_replace("\xc2\x96", "\xE2\x80\x93", $data);
And voilá, everything works.
Tags: converting, cp1252, encoding, iso-8859-1, parsing, PHP, Programming, rss, utf-8, xml
Posted in PHP, Programming | No Comments »
August 23rd, 2009
Every once in a while a few new interesting tools surface themselves and become a natural part of how a developer works. I’ve taken a look at which tools I’ve introduced in my regular workflow during the last six months.
NetBeans
NetBeans got the first version of what has become awesome PHP support in version 6.5, and after version 6.7 got released just before the summer, things have become very stable. NetBeans is absolutely worth looking into for PHP development (and Java), and you sure can’t beat the price (free!). In the good old days NetBeans were slow as hell, but I’ve not noticed any serious issues in 6.7 (.. although we didn’t really have quad cores and 4GB of memory back then either). Go try it out today!
Balsamiq Mockups
Balsamiq is an awesome tool for making quick mockups for UI designs. Previous I’d play around in Adobe Photoshop, dragging layers around and being concerned with all the wrong things. Mockups abstracts away all the UI elements (and comes with a vast library of standard elements), which makes it very easy to experiment and focus on the usability instead of the design and its implementation. For someone who’s more interested in the experience and the programming than the actual design (.. I’ll know what I want when I see it!) this makes it easy to convey my suggestions and create small, visual notes of my own usabilityideas.
You can try it out for free at their website, and they even give away licenses to people who are active in Open Source Development (disclaimer: I got a free license, but the experiences are all my own. This is not paid (or unpaid) advertising or product placement.)
GitHub
I’ve been playing around with git a bit, but after writing a patch for the PEAR-module for Gearman (.. which still doesn’t seem to have made it anywhere significant), I signed up for github to be able to fork the project and submit my patch there. A very good technical solution partnered with an easy way of notifying the original developers of your patch (which you simply provide in your own branch) by submitting a “pull request” makes it very easy to both have patches supplied to you and to submit patches to projects hosted at GitHub.
Thumbs up!
Tags: balsamiq, Gearman, github, netbeans, PHP
Posted in PHP, Programming, applications | No Comments »
August 13th, 2009
There’s one thing SimpleXML for PHP is horrible to use for: parsing XML containing namespaces. Namespaces requires special handling, and the only way I’ve found that allows you to refer to an element in another namespace, is to use the ->children() method with the namespace. I’m sure there’s an easier way than this, and if you know of any, please leave a comment!
Let’s start with the following XML snippet (using SOAP as an example):
-
<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
-
<soap:Body>
-
<queryInstantStreamResponse>
-
asdasd
-
</queryInstantStreamResponse>
-
</soap:Body>
-
</soap:Envelope>
The easiest way to do this is to “ignore” the namespaces, and simply do $root->{soap:Envelope} that to access the property. This will not work, as SimpleXML is quite peculiar about it’s namespaces (.. while everything else is simple and easy to use).
One solution is to provide the namespace you’re interested in to the $element->children() method, which returns all the children of the element in a particular namespace (or without arguments, outside any namespace):
-
$sxml = new SimpleXMLElement(file_get_contents('soap.xml'));
-
-
foreach($sxml->children('http://www.w3.org/2001/12/soap-envelope') as $el)
-
{
-
if ($el->getName() == 'Body')
-
{
-
/* … */
-
}
-
}
Yes. That’s quite horrible.
But luckily the xpath method can help us:
-
$elements = $sxml->xpath('//soap:Envelope/soap:Body/queryInstantStreamResponse'));
This will actually fetch all the elements titled “queryInstantStreamResponse” which are childs of soap:Envelope and soap:Body. And this works as you expect it to, without having to use children, provide the actual namespace URI, etc.
The xpath method returns an array containing all the matching elements, so in this case you’ll receive an array with a single element, containing the text inside the queryInstantStreamResponse element.
There should be an easier way than this.
Tags: namespaces, PHP, simplexml, xpath
Posted in PHP, Programming | No Comments »
August 9th, 2009
After upgrading to PHP 5.3.0 on my Windows XP Workstation, Junctions have suddenly stopped working in any PHP related code. I use junctions to hardlink directories from their version specific paths (NTFS symlinks where first introduced with Vista, so I’m still using Junctions), but after upgrading none of the libraries which live in directories that are linked through junctions work.
This seems to be a known bug, Files on NTFS Mounted Volumes (Junctions) inaccessible, although I’m also seeing the issue with completly local files (and not mounted from remote file systems). Seems like the thing to do is to wait for 5.3.1 to resolve the issue .. if it gets fixed to that. For the time being I’ll manually copy the directories.
Update: I’ve added a log of a test session showing the problem.
Tags: bugs, junctions, ntfs, PHP, php5.3
Posted in PHP | No Comments »
May 29th, 2009
After a while you realize that the best way to serve almost-never-changing content is to give the content an expire date way ahead in the future. The allows your server and your network pipes to do more sensible stuff than delivering the same old versions of files again and again and again and again.
A problem does however surface when you want to update the files and make the visiting user request the new version instead of the old. The trick here is to change the URL for the resource, so that the browser requests the new file. You can do this by appending a version number to the file and either rewriting it behind the scenes to the original file, or by appending a timestamp (or some other item) to the URL as a GET value. The web server ignores this for regular files, but as it identifies a new unique resource, the web browser has to request it again and use the new and improved ™ file.
Using the timestamp of the file is a bit cumbersome and requires you to hit the disk one additional time each time you’re going to show an URL to one of the almost-static resources, but luckily we already have an identifier describing which version the file is in: the SVN revision number (.. if you use subversion, that is). You could use the SVN revision for each file by itself, but we usually decide that the global version number for SVN is good enough. This means that each time you update the live code base through svn up or something like that (remember to block .svn directories and their files if you run your production directory from a SVN branch. This can be discussed over and over, but I’m growing more and more fond of actually doing just that..). To avoid having to call svnversion each time, it’s useful to be able to insert the current revision number into the configuration file for the application (or a header file / bootstrap file).
Here’s an example of how you can insert the current SVN revision into a config file for a PHP application.
- Create a backup of the current configuration file.
- Update the current revision through svn up.
- Retrieve the current revision number from svnversion.
- Insert the revision number using sed into a temporary copy of the configuration file.
- Move the new configuration file into place as the current configuration file.
- Party like it’s 1999!
This assumes that you use an array named $config in your configuration file. I suggest that you name it something else, but for simplicity I’m going with that here. First, create a $config['svn'] entry in your config file. If you have some other naming scheme, you’re going to have to change the relevant parts below.
-
#!/bin/bash
-
cp ./config/config.php ./config/config.backup.php
-
svn up
-
VERSION=`svnversion .`
-
echo $VERSION
-
sed "s/config\['svn'\] = '[0-9M]*';/config\['svn'\] = '$VERSION';/" < ./config/config.php > ./config/config.fixed.php
-
mv ./config/config.fixed.php ./config/config.php
Save this into a file named upgrade.sh, make it executable by doing chmod u+x upgrade.sh and run it by typing ./upgrade.sh.
And this is where you put your hands above your head and wave them about. When you’re done with that, you can refer to your current SVN revision using $config['svn'] in your PHP application (preferrably in your template or where you build the URLs to your static resources). Simply append ?v=$config['svn'] to your current filenames. When you have a new version available, run ./upgrade.sh (or whatever name you gave the script) again and let your users enjoy the new experience.
Tags: configuration, PHP, revision, sed, subversion, svn, version number
Posted in Hacks, PHP, Programming, Scalability, javascript | 1 Comment »
May 2nd, 2009
While importing a rather large XML-document (45MB+) into a database today, I ran into a weird problem on one specific server. The server runs SUSE Enterprise and presented an error that neither other test server gave. After a bit of digging around on the web I were able to collect enough information from several different threads about what could be the source of the problem.
It seems that the limit was introduced in libxml2 about half a year ago to avoid some other memory problems, but this apparently borked quite a few legitimate uses. As I have very little experience with administrating SUSE Enterprise based servers, I quickly shrugged off trying to update the packages and possibly recompiling PHP. Luckily one of the comments in a thread about the problem saved the day.
If you find yourself running into this message; swap your named entities in the XML file (such as < and >) to numeric entities (such as < and >). This way libxml2 just replaces everything with the byte value while parsing instead of trying to be smart and keep an updated cache.
Tags: libxml, libxml2, parsing, PHP, Programming, redhat, rhn, xml
Posted in PHP | 2 Comments »