Adding SVN Revision to a Configuration File

After a while you realize that the best way to serve almost-never-changing content is to give the content an expire date way ahead in the future. The allows your server and your network pipes to do more sensible stuff than delivering the same old versions of files again and again and again and again.

A problem does however surface when you want to update the files and make the visiting user request the new version instead of the old. The trick here is to change the URL for the resource, so that the browser requests the new file. You can do this by appending a version number to the file and either rewriting it behind the scenes to the original file, or by appending a timestamp (or some other item) to the URL as a GET value. The web server ignores this for regular files, but as it identifies a new unique resource, the web browser has to request it again and use the new and improved ™ file.

Using the timestamp of the file is a bit cumbersome and requires you to hit the disk one additional time each time you’re going to show an URL to one of the almost-static resources, but luckily we already have an identifier describing which version the file is in: the SVN revision number (.. if you use subversion, that is). You could use the SVN revision for each file by itself, but we usually decide that the global version number for SVN is good enough. This means that each time you update the live code base through svn up or something like that (remember to block .svn directories and their files if you run your production directory from a SVN branch. This can be discussed over and over, but I’m growing more and more fond of actually doing just that..). To avoid having to call svnversion each time, it’s useful to be able to insert the current revision number into the configuration file for the application (or a header file / bootstrap file).

Here’s an example of how you can insert the current SVN revision into a config file for a PHP application.

  1. Create a backup of the current configuration file.
  2. Update the current revision through svn up.
  3. Retrieve the current revision number from svnversion.
  4. Insert the revision number using sed into a temporary copy of the configuration file.
  5. Move the new configuration file into place as the current configuration file.
  6. Party like it’s 1999!

This assumes that you use an array named $config in your configuration file. I suggest that you name it something else, but for simplicity I’m going with that here. First, create a $config[‘svn’] entry in your config file. If you have some other naming scheme, you’re going to have to change the relevant parts below.

#!/bin/bash
cp ./config/config.php ./config/config.backup.php
svn up
VERSION=`svnversion .`
echo $VERSION
sed "s/config\['svn'\] = '[0-9M]*';/config\['svn'\] = '$VERSION';/" < ./config/config.php > ./config/config.fixed.php
mv ./config/config.fixed.php ./config/config.php

Save this into a file named upgrade.sh, make it executable by doing chmod u+x upgrade.sh and run it by typing ./upgrade.sh.

And this is where you put your hands above your head and wave them about. When you’re done with that, you can refer to your current SVN revision using $config[‘svn’] in your PHP application (preferrably in your template or where you build the URLs to your static resources). Simply append ?v=$config[‘svn’] to your current filenames. When you have a new version available, run ./upgrade.sh (or whatever name you gave the script) again and let your users enjoy the new experience.

parser error : Detected an entity reference loop

While importing a rather large XML-document (45MB+) into a database today, I ran into a weird problem on one specific server. The server runs SUSE Enterprise and presented an error that neither other test server gave. After a bit of digging around on the web I were able to collect enough information from several different threads about what could be the source of the problem.

It seems that the limit was introduced in libxml2 about half a year ago to avoid some other memory problems, but this apparently borked quite a few legitimate uses. As I have very little experience with administrating SUSE Enterprise based servers, I quickly shrugged off trying to update the packages and possibly recompiling PHP. Luckily one of the comments in a thread about the problem saved the day.

If you find yourself running into this message; swap your named entities in the XML file (such as &lt; and &gt;) to numeric entities (such as &#60; and &#62;). This way libxml2 just replaces everything with the byte value while parsing instead of trying to be smart and keep an updated cache.

Borked Behaviour for the Back-button in Firefox

I investigated a strange problem yesterday, where the back button in Firefox returned the user to the top of the previous page, instead of to the location where he already had scrolled. The problem seemed to have brought its fair share of problems for developers all over, and a thread detailing the problem in Drupal provided the information needed to solve it. The problem is actually so wide-spread that there is a dedicated Firefox extension to solve the issue (Restore Scroll Position).

Anyways, the issue stems from the Cache-Control headers that PHP among others include by default:


Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0

The problem is that the “no-store” directive tells the browser do NOT store a version of the page anywhere, not temporarily, not .. ever. Internet Explorer and Opera still remembers the position, but Firefox decided to take everything a step further and does not keep any information available. The extension mentioned above saves the scroll position in another location and then restores the scroll position after navigating back to the page.

The problem is solved by changing the Cache-Control header:


Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0

A very helpful tip here is that you probably need to restart Firefox to make it respect the new header, as it will keep its old behavior until you restart the browser (at least for Firefox 3.0.6).

PDO and PDO::PARAM_INT

Hi there Mr. PDO!

We’ve come to know each other, and yes, while you have your troubles (.. which I don’t, of course), I’ve accepted your short comings. Today you threw another one of your fits, but I’ll be sure to document it for the world to see.

$statement = $pdo->prepare("
    ...
    LIMIT
        :offset, :hits
");

Yep. This will of course fail if you’re binding strings. ’10’, ’10’ is not very helpful now, is it. Good point. So let’s tell PDO that we’re really binding ints:

$statement->bindValue(':offset', $offset, PDO::PARAM_INT);
$statement->bindValue(':hits', $hits, PDO::PARAM_INT);

But wait. You’re still complaining?! I told you they were ints?! What’s the problem now?!?!

Well. Mr. PDO requires you to also convert the values for him. So first you have to convert the values of a loosely typed language to a strong type, then you have to tell the library that yes, this is in fact another type than what the library obviously assumes that it is. This works:

$statement->bindValue(':offset', (int) $offset, PDO::PARAM_INT);
$statement->bindValue(':hits', (int) $hits, PDO::PARAM_INT);

Which means the following:

If the type of your variable internally is a string, it’ll be escaped as a string, even if you tell PDO that it should be handled as an INT in your database layer.

If the type of your variable is an int, it’ll be handled as a string, unless you tell PDO it is an int.

Something is backwards here.

A Plugin for Paginating in Smarty

First I’d like to apologize for the lack of updates here in the last weeks, but the days have been very busy. I’ve bought a new car (more details about that as soon as the snow disappears), written a complete publishing platform from scratch in a weekend to help out when Gamer.no got in trouble and in general done a load of stuff. Anyways, this post isn’t about all that, but rather something else I wrote some time ago.

A use case you’ll encounter very often is the act of paginating items, i.e. including a simple “jump to page x, jump to the next page, jump to the previous page” footer. If you’ve ever tried to implement the logic around this in your view, you know that it can get quite extensive. You have several other solutions, such as the PEAR_Pager, which actually looks like a good solution now (with 2.x). Anyways, this is a plugin for Smarty to make generating pagination links easier.

Download the plugin, drop it into your plugins/ folder in your Smarty library directory, and voilá, you have access to the new {paginator} element.

The module is quite configurable, but as I’ve only extended the parts I’ve had use for it our projects, it may still lack a few simple keys.

{paginator hits=$hits offset=$offset total_hits=$articlesFound class_inactive=paginatorInactive class_active=paginatorActive}

The template variables used here are hits, the number of hits shown on this page, offset, the offset from 0 and total_hits, the total number of available hits in the current list. By default the plugin appends ?hits=<hits>&offset=<offset>+<hits> to the current URL, you can give another URL through the url attribute. The class_ attributes provide the CSS classes to use for the elements that enclose the page numbers or links. See the source code (!) for more information about attributes which work.

As I mentioned previously, the plugin is written for my own personal use, so it’s not as streamlined as it could be. Feel free to update it, dissect it, break it, claim it’s yours .. or anything. I’d be happy if you submit any patches to me so I can update the link here, or simply leave a comment.

Hack aways! You can see the plugin in action at the bottom of lovethatfun.com.

Undefined symbol: php_pdo_declare_long_constant

After installing a new PDO module (PDO_PgSQL) into our compiled-from-the-ground-up version of PHP 5.2.8 (.. since RHEL4 doesn’t really stay updated, but we do), i ran head first into the following issue:


/usr/sbin/httpd: symbol lookup error: /usr/local/lib/php/extensions/no-debug-non-zts-20060613/pdo_pgsql.so: undefined symbol: php_pdo_declare_long_constant

Panic. Then tried updating pdo_mysql which actually still worked, which just led it to have the exact same problem. Luckily a bit of searching at Google pointed me to PDO_MYSQL causing Apache segfault over at the PECL bug tracker. The last comment provided the solution to the problem: a quick rebuild of PHP with –disable-pdo and then enabling pdo from PECL instead (so that PDO and the PDO plugins API actually match, instead of trying to load the wrong version into the process) solved the issue.

Be sure to build PDO from the SAME VERSION as your client libraries. Disable it in the PHP build itself if you need to build it from PECL.

The Side Effect of Using Return Type To Handle Errors

I came across a curious little side effect of the issue I just posted earlier:

public function action()
{
   $id = filter_input(INPUT_POST, 'id', FILTER_SANITIZE_STRING);
	
   if(is_bool($id))
   {
      die('invalid id');
   }

   if(isset($_POST['accept']))
   {
      $this->acceptId($id);
   }
   else if(isset($_POST['reject']))
   {
      $this->rejectId($id);
   }
}

This is obviously going to do something when a POST has occured, but the method was invoked each and every time. The reason why it works? If there is no variable named ‘id’ POSTed, filter_input returns null. And null is not boolean, so the test passes. But as it’s not a POST request, neither of the two other if-tests are true, so the code silently passes through (id can contain characters here, so the filter isn’t used to just get integers).

If you’re not going to throw an error when the parameter is missing, the test is actually completely useless. This bug had hidden itself within the usage of the type to test for fault instead of actually checking the value, and did not creep out before I rewrote the if-test.

Communicating The Right Thing Through Code

While trying to fix a larger bug in a module I never had touched before, I came across the following code. While technically correct (it does what it’s supposed to do, and there is no way to currently get it to do something wrong (.. an update on just that), does have a serious flaw:

$result = get_entry($id);
if(is_bool($result))
{
    die('bailing out');
}

Hopefully you can see the error in what the code communicates; namely that the return type from the function is used to what should be considered an error.

While this works as the only way the function can return a boolean value is if it returns false, the person reading the code at a later date will wonder what the code is supposed to do – he or she might not have any knowledge about how the method works. Maybe the method just sets up some resource, a global variable (.. no, don’t do that. DON’T.), etc, but the code does not communicate what we really expect.

As PHP is dynamically typed, checking for type before comparing is perfectly OK, as long as you’re not counting “no returned elements” as an error. The following code more clearly communicates it intent (=== in PHP is comparison based on both type and content, which means that both the type of the variable and the content of it have to match. 0 === “0” will be considered false.):

$result = get_entry($id);
if($result === false)
{
    die('bailing out');
}

Or if you’re interested in getting to know if the element returned is actually considered false (such as an empty array, an empty string, etc), just drop one of the equal signs:

$result = get_entry($id);
if($result == false)
{
    die('bailing out');
}

I’m also not fond of using die() as a method for stopping a faulty request, as that should be properly logged and dealt with in the best manner possible, but I’ll leave that for a later post.

A few Keepitclose Updates

I’ve added a few new features to the Keepitclose code base. The cached version of the resource now also keeps — and reproduces — the headers from the original resource. That will allow text/xml to work as it should, in addition to keeping encoding information and other vital parts of the information contained in the HTTP headers. I’ve also added urlencode() support to the path identifier of the request, so that we’re able to handle requests which feature UTF-8 and other encodings in parts of the URL.

I have a few other features in the pipeline too, but I’m not sure if I’ll find the time to add them until after the weekend. Be sure to do a svn update if you’ve checked out the code!

Releasing Keepitclose Alpha

I’ve just created a small project site over at Google Code for Keepitclose. Keepitclose is a HTTP local caching web service written in PHP meant to cache resources retrieved from external and internal web services. It currently supports both an APC backend (using shared memory in the web server itself) and using Memcache (which allows you to have several frontends using one or several backend memcache servers).

Missing features are currently:

  • Keeping HTTP headers
  • Adding cache information to HTTP headers
  • A file based backend
  • Everything else you can think of

The server do however support:

  • Time to live (TTL) for a cached resource when stored
  • TTL Window, i.e.:

    A new version should be fetched at regular intervals, such as every sixth hour. New data are available at 00:00, 06:00, 12:00 and 18:00. Use ttlWindowSize=21600 to get six hours window size. Use ttlWindowOffset to add a offset; ttlWindowOffset=900 adds fifteen minutes to the value and retrives new content at 00:15, 06:15, 12:15 and 18:15.

If you want to implement your own backend, implement the Keepitclose_Storage_Interface and add an entry into the config.php file for your module, i.e. ‘file’ => array(‘path’ => ‘/tmp’).

The current version also supports simple access control using the allowedHosts entry in the config file. This entry contains one row for each allowed host as a regular expression:

    'allowedHosts' => array(
        '^127\\.0\\.0\\.1$',
    ),

.. will only allow requests from localhost. To allow a subnet, you could write ‘^127\\.0\\.0\\.[0-9]+$’.

You can also add several memcache servers, a request will then poll a random server to retrieve the resource. If not found, the content will be retrieved from the web and stored to all memcache servers. This is useful for environments with a very heavy read load.

'storage' => array(
    'memcache' => array(
        array(
            'host' => 'tcp://127.0.0.1',
        ),
    ),
),

To add more memcache servers, simply add another array() of memcache entries:

'memcache' => array(
    array(
        'host' => 'tcp://127.0.0.1',
    ),
    array(
        'host' => 'tcp://127.0.0.2',
    ),
),

You can also set ‘port’ and ‘persistent’ for the memcache connections.

Please leave a comment here or create an issue ticket on the google code page if you need any help or have any suggestions. All patches are welcome.