What Happened To My Beautiful En-dashes?!

First, a small introduction to the problem: We’re running stuff in UTF-8 all the way. A few sites we’re reading feeds from are using ISO-8859-1 as their charset, but they either supply the feed with the correct encoding specified or the feeds arrive as UTF-8. Everything works nicely, except for the mentioned-in-the-headline en-dashes. Firefox only shows 00 96 (0x00 0x96), but everything looks correct when you view the headlines and similar stuff on the original site.

Strange.

The digging, oh all the digging.

After the already mentioned digging (yes, the digging) in data at the large search engines (ok, maybe I did a search or two), I discovered that the Windows cp1252 encoding uses 0x96 to store en-dashes. That looks familiar! We’re seeing 0x96 as one of the byte values above, so apparently cp1252 is sneaking into the mix somewhere along the line. Most of the clients using the CMS-es run Windows, so they are the likely culprits.

ISO-8859-1 enters the scene

As the sites (and feeds) provide ISO-8859-1 as their encoding, I thought it would be interesting to see what ISO-8859-1 defines as the representation for the byte value 0x96. Lo and behold: 0x96 is not defined in ISO-8859-1. And that actually points us to the solution.

I welcome thee, Mr. Solution

When the ISO-8859-1 encoded string is converted into UTF-8, each byte with the value 0x96 (the en-dash in cp1252) is simply mapped to the code point U+0096 and inserted as the valid UTF-8 sequence 0xc2 0x96. That sequence represents a C1 control character, not an en-dash, so there is nothing sensible for the browser to display.

We’re saying that the string is ISO-8859-1, although in reality it is either cp1252 or a mangled mix of ISO-8859-1 and cp1252 (for the en-dashes, at least).

If you’re on the parsing end of this mumbo jumbo, one solution is to replace the generated UTF-8 sequence (0xc2 0x96), converted from 0x96 in ISO-8859-1, with the proper one (0xe2 0x80 0x93):

$data = str_replace("\xc2\x96", "\xE2\x80\x93", $data);

And voilà, everything works.
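If the feeds contain other cp1252 characters as well (curly quotes, em-dashes, ellipses), the same str_replace trick can be extended into a lookup table. Here is a minimal sketch. It assumes the text has already been converted from "ISO-8859-1" to UTF-8 as described above. The function name fix_cp1252_c1() is made up for this example, and the table only covers a handful of common characters:

```php
<?php
// Sketch: after an "ISO-8859-1" feed that was really Windows-1252 (cp1252)
// has been converted to UTF-8, each cp1252 byte in 0x80-0x9F shows up as
// 0xC2 <byte>, i.e. a C1 control character. Map the common ones back.
// fix_cp1252_c1() is a hypothetical name; extend the table as needed.
function fix_cp1252_c1(string $data): string
{
    // strtr() replaces all keys in a single pass over the string.
    return strtr($data, [
        "\xC2\x85" => "\xE2\x80\xA6", // U+2026 horizontal ellipsis
        "\xC2\x91" => "\xE2\x80\x98", // U+2018 left single quotation mark
        "\xC2\x92" => "\xE2\x80\x99", // U+2019 right single quotation mark
        "\xC2\x93" => "\xE2\x80\x9C", // U+201C left double quotation mark
        "\xC2\x94" => "\xE2\x80\x9D", // U+201D right double quotation mark
        "\xC2\x96" => "\xE2\x80\x93", // U+2013 en-dash
        "\xC2\x97" => "\xE2\x80\x94", // U+2014 em-dash
    ]);
}
```

You may control the conversion step yourself. In that case, converting from Windows-1252 instead of ISO-8859-1 in the first place is probably cleaner, for example with mb_convert_encoding($data, 'UTF-8', 'Windows-1252') if the mbstring extension is available. cp1252 agrees with ISO-8859-1 outside the 0x80-0x9F range.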

The Thumbs Up! of Awesome Approval

Every once in a while a few interesting new tools surface and become a natural part of how a developer works. I’ve taken a look at which tools I’ve introduced into my regular workflow during the last six months.

NetBeans

NetBeans got the first version of what has become awesome PHP support in version 6.5, and after version 6.7 was released just before the summer, things have become very stable. NetBeans is absolutely worth looking into for PHP development (and Java), and you sure can’t beat the price (free!). In the good old days NetBeans was slow as hell, but I’ve not noticed any serious issues in 6.7 (.. although we didn’t really have quad cores and 4GB of memory back then either). Go try it out today!

Balsamiq Mockups

Balsamiq is an awesome tool for making quick mockups for UI designs. Previously I’d play around in Adobe Photoshop, dragging layers around and being concerned with all the wrong things. Mockups abstracts away all the UI elements (and comes with a vast library of standard elements), which makes it very easy to experiment and focus on the usability instead of the design and its implementation. For someone who’s more interested in the experience and the programming than the actual design (.. I’ll know what I want when I see it!), this makes it easy to convey my suggestions and create small, visual notes of my own usability ideas.

You can try it out for free at their website, and they even give away licenses to people who are active in Open Source Development (disclaimer: I got a free license, but the experiences are all my own. This is not paid (or unpaid) advertising or product placement.)

GitHub

I’ve been playing around with git a bit, but after writing a patch for the PEAR module for Gearman (.. which still doesn’t seem to have made it anywhere significant), I signed up for GitHub to be able to fork the project and submit my patch there. GitHub pairs a very good technical solution with an easy way of notifying the original developers of your patch, which you simply provide in your own branch: you submit a “pull request”. That makes it very easy both to have patches supplied to you and to submit patches to projects hosted at GitHub.

Thumbs up!

Parsing XML With Namespaces with SimpleXML

There’s one thing SimpleXML for PHP is horrible to use for: parsing XML containing namespaces. Namespaces require special handling, and the only way I’ve found that allows you to refer to an element in another namespace is to use the ->children() method with the namespace. I’m sure there’s an easier way than this, and if you know of any, please leave a comment!

Let’s start with the following XML snippet (using SOAP as an example):

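<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
    <soap:Body>
        <queryInstantStreamResponse>
            asdasd
        </queryInstantStreamResponse>
    </soap:Body>
</soap:Envelope>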
The most obvious approach would be to “ignore” the namespaces and simply use $root->{'soap:Envelope'} to access the property. This will not work, as SimpleXML is quite peculiar about its namespaces (.. while everything else is simple and easy to use).

One solution is to provide the namespace you’re interested in to the $element->children() method, which returns all the children of the element in a particular namespace (or without arguments, outside any namespace):

$sxml = new SimpleXMLElement(file_get_contents('soap.xml'));

foreach($sxml->children('http://www.w3.org/2001/12/soap-envelope') as $el)
{
    if ($el->getName() == 'Body')
    {
        /* ... */
    }
}

Yes. That’s quite horrible.

But luckily the xpath method can help us:

$elements = $sxml->xpath('//soap:Envelope/soap:Body/queryInstantStreamResponse');

This fetches all the elements named “queryInstantStreamResponse” that are children of soap:Body inside soap:Envelope. It works as you’d expect, without having to use children(), provide the actual namespace URI, etc.

The xpath method returns an array of SimpleXMLElement objects, one for each matching element. In this case you’ll receive an array with a single element; cast it to a string to get the text inside the queryInstantStreamResponse element.

There should be an easier way than this.
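For what it’s worth, here are two shorter variations. Both assume the document looks like the SOAP snippet above, inlined as a string so the example is self-contained. The first chains children() with the namespace URI, which allows plain property access instead of a loop. The second uses registerXPathNamespace() to bind a prefix of your own choosing to the namespace URI, so the XPath query keeps working even if the sender uses a different prefix:

```php
<?php
// Sketch: two shorter ways to reach a namespaced element with SimpleXML.
// The XML mirrors the SOAP example; it is inlined so the snippet runs as-is.
$xml = <<<XML
<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
    <soap:Body>
        <queryInstantStreamResponse>asdasd</queryInstantStreamResponse>
    </soap:Body>
</soap:Envelope>
XML;

$ns   = 'http://www.w3.org/2001/12/soap-envelope';
$sxml = new SimpleXMLElement($xml);

// 1) children($ns) switches to the soap namespace, so ->Body works directly;
//    children() without arguments switches back to un-namespaced elements.
$response = $sxml->children($ns)->Body->children()->queryInstantStreamResponse;
echo (string) $response, "\n"; // asdasd

// 2) Register our own prefix ("s") for the namespace URI, so the query does
//    not depend on which prefix the document happens to use.
$sxml->registerXPathNamespace('s', $ns);
$matches = $sxml->xpath('/s:Envelope/s:Body/queryInstantStreamResponse');
echo (string) $matches[0], "\n"; // asdasd
```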

NTFS Junctions and PHP 5.3.0

After upgrading to PHP 5.3.0 on my Windows XP workstation, junctions have suddenly stopped working in all PHP-related code. I use junctions to link directories from their version-specific paths (NTFS symlinks were first introduced with Vista, so I’m still using junctions), but after the upgrade none of the libraries that live in directories linked through junctions work.

This seems to be a known bug, “Files on NTFS Mounted Volumes (Junctions) inaccessible”, although I’m also seeing the issue with completely local files (not mounted from remote file systems). It seems the thing to do is to wait for 5.3.1 to resolve the issue .. if it gets fixed by then. For the time being I’ll copy the directories manually.

Update: I’ve added a log of a test session showing the problem.
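Until then, the manual copy can at least be scripted. Here is a minimal sketch of a recursive directory copy built on the SPL directory iterators. copy_tree() is a hypothetical helper name. It deliberately sticks to features that exist in PHP 5.3, so it also runs on the version discussed here:

```php
<?php
// Sketch of the "copy instead of junction" workaround: recursively copy
// a directory tree. copy_tree() is a hypothetical helper name.
function copy_tree($source, $target)
{
    if (!is_dir($target) && !mkdir($target, 0777, true)) {
        throw new RuntimeException("Cannot create $target");
    }
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($source, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // directories before their contents
    );
    foreach ($items as $item) {
        // Path of the current entry relative to $source, e.g. lib/sub/a.php
        $relative = $items->getInnerIterator()->getSubPathname();
        $dest = $target . DIRECTORY_SEPARATOR . $relative;
        if ($item->isDir()) {
            if (!is_dir($dest)) {
                mkdir($dest);
            }
        } elseif (!copy($item->getPathname(), $dest)) {
            throw new RuntimeException("Cannot copy " . $item->getPathname());
        }
    }
}
```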

Adding SVN Revision to a Configuration File

After a while you realize that the best way to serve almost-never-changing content is to give the content an expiry date way ahead in the future. This allows your server and your network pipes to do more sensible stuff than delivering the same old versions of files again and again and again and again.

A problem does however surface when you want to update the files and make the visiting user request the new version instead of the old one. The trick here is to change the URL for the resource, so that the browser requests the new file. You can do this either by adding a version number to the file name and rewriting it to the original file behind the scenes, or by appending a timestamp (or some other identifier) to the URL as a GET parameter. The web server ignores the parameter for regular files, but as the URL identifies a new, unique resource, the web browser has to request it again and use the new and improved ™ file.

Using the timestamp of the file is a bit cumbersome and requires you to hit the disk one additional time each time you’re going to show a URL to one of the almost-static resources. Luckily, we already have an identifier describing which version the file is in: the SVN revision number (.. if you use Subversion, that is). You could use the SVN revision for each file by itself, but we usually decide that the global revision number is good enough. This means that each time you update the live code base through svn up or something like that, the revision number changes and every static resource gets a new URL. (Remember to block .svn directories and their files if you run your production directory from an SVN working copy. This can be discussed over and over, but I’m growing more and more fond of actually doing just that..) To avoid having to call svnversion on each request, it’s useful to insert the current revision number into the configuration file for the application (or a header file / bootstrap file).

Here’s an example of how you can insert the current SVN revision into a config file for a PHP application.

  1. Create a backup of the current configuration file.
  2. Update the current revision through svn up.
  3. Retrieve the current revision number from svnversion.
  4. Insert the revision number using sed into a temporary copy of the configuration file.
  5. Move the new configuration file into place as the current configuration file.
  6. Party like it’s 1999!

This assumes that you use an array named $config in your configuration file. I suggest that you name it something else, but for simplicity I’m going with that here. First, create a $config['svn'] entry in your config file, for example $config['svn'] = '0'; (the sed pattern below expects exactly this format). If you have some other naming scheme, you’re going to have to change the relevant parts below.

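#!/bin/bash
# Abort on the first failing command
set -e
cp ./config/config.php ./config/config.backup.php
svn up
# svnversion may print e.g. 1234, 1234M or 1230:1234 (mixed revisions)
VERSION=$(svnversion .)
echo "$VERSION"
# Replace the value in $config['svn'] = '...'; with the current revision
sed "s/config\['svn'\] = '[0-9:MSP]*';/config['svn'] = '$VERSION';/" < ./config/config.php > ./config/config.fixed.php
mv ./config/config.fixed.php ./config/config.php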

Save this into a file named upgrade.sh, make it executable by doing chmod u+x upgrade.sh and run it by typing ./upgrade.sh.

And this is where you put your hands above your head and wave them about. When you’re done with that, you can refer to your current SVN revision using $config['svn'] in your PHP application (preferably in your template or wherever you build the URLs to your static resources). Simply append ?v=$config['svn'] to the URLs of your static files. When you have a new version available, run ./upgrade.sh (or whatever name you gave the script) again and let your users enjoy the new experience.
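Here is a minimal sketch of the URL-building side. It assumes $config['svn'] is filled in by the upgrade script above. static_url() and the example paths are hypothetical:

```php
<?php
// Hypothetical helper: append the deployed SVN revision to the URL of a
// static resource, so a new revision gives the file a new URL.
// Assumes $config['svn'] has been filled in by the upgrade script.
function static_url(string $path, array $config): string
{
    // Use & instead of ? if the URL already has a query string.
    $separator = strpos($path, '?') === false ? '?' : '&';
    // rawurlencode() keeps values such as 1230:1234M safe in a URL.
    return $path . $separator . 'v=' . rawurlencode($config['svn']);
}

$config = ['svn' => '1234'];
echo static_url('/css/site.css', $config), "\n"; // /css/site.css?v=1234
```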

parser error : Detected an entity reference loop

While importing a rather large XML document (45MB+) into a database today, I ran into a weird problem on one specific server. The server runs SUSE Enterprise and presented an error that neither of the other test servers gave. After a bit of digging around on the web I was able to collect enough information from several different threads to figure out what could be the source of the problem.

It seems that the check behind this error was introduced in libxml2 about half a year ago to avoid some other memory problems, but it apparently borked quite a few legitimate uses. As I have very little experience with administrating SUSE Enterprise based servers, I quickly gave up on updating the packages and possibly recompiling PHP. Luckily one of the comments in a thread about the problem saved the day.

If you run into this message, swap the named entities in your XML file (such as &lt; and &gt;) for numeric character references (such as &#60; and &#62;). This way libxml2 just replaces each reference with the character it stands for while parsing, instead of trying to be smart and keep an updated cache.
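If you can’t fix the file at its source, the swap can be done just before parsing. Here is a minimal sketch, assuming the document only uses the five predefined XML entities; numeric_entities() is a hypothetical helper name. Two caveats. It also rewrites text inside CDATA sections and comments, where entities are not expanded, so only use it on documents without those. It also works on the whole document as one string, so keep an eye on memory_limit with files the size of the one above:

```php
<?php
// Sketch of the workaround: rewrite the five predefined XML entities as
// numeric character references before handing the document to libxml2.
// numeric_entities() is a hypothetical helper name.
function numeric_entities(string $xml): string
{
    // strtr() makes a single pass and never re-scans replaced text, so
    // "&amp;lt;" correctly becomes "&#38;lt;" (the literal text "&lt;").
    return strtr($xml, [
        '&lt;'   => '&#60;',
        '&gt;'   => '&#62;',
        '&amp;'  => '&#38;',
        '&quot;' => '&#34;',
        '&apos;' => '&#39;',
    ]);
}

$doc = simplexml_load_string(numeric_entities('<a>1 &lt; 2 &amp;&amp; 3 &gt; 2</a>'));
echo (string) $doc, "\n"; // 1 < 2 && 3 > 2
```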

Borked Behaviour for the Back-button in Firefox

I investigated a strange problem yesterday, where the back button in Firefox returned the user to the top of the previous page instead of to the position he had already scrolled to. The problem seems to have caused its fair share of trouble for developers all over, and a thread detailing the problem in Drupal provided the information needed to solve it. The problem is actually so widespread that there is a dedicated Firefox extension to solve the issue (Restore Scroll Position).

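Anyways, the issue stems from the Cache-Control header that PHP, among others, sends by default. In PHP’s case this happens whenever a session is started, as session.cache_limiter defaults to nocache: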


Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0

The problem is that the “no-store” directive tells the browser NOT to store a version of the page anywhere, not temporarily, not .. ever. Internet Explorer and Opera still remember the position, but Firefox decided to take everything a step further and does not keep any information available. The extension mentioned above saves the scroll position in another location and then restores the scroll position after navigating back to the page.

The problem is solved by changing the Cache-Control header:


Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0

A very helpful tip here is that you probably need to restart Firefox to make it respect the new header, as it will keep its old behavior until you restart the browser (at least for Firefox 3.0.6).
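If the header comes from PHP’s session handling, one way to send the modified header while still using sessions is sketched below. session_cache_limiter('') turns off the cache headers the session extension would otherwise send, and header() then sends the replacement. Both calls have to happen before any output, and the limiter has to be set before session_start():

```php
<?php
// Sketch: stop the session extension from sending its default "nocache"
// headers, then send a Cache-Control header without no-store ourselves.
session_cache_limiter('');
header('Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0');
session_start();
```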

PDO and PDO::PARAM_INT

Hi there Mr. PDO!

We’ve come to know each other, and yes, while you have your troubles (.. which I don’t, of course), I’ve accepted your shortcomings. Today you threw another one of your fits, but I’ll be sure to document it for the world to see.

$statement = $pdo->prepare("
    ...
    LIMIT
        :offset, :hits
");

Yep. This will of course fail if you’re binding strings: LIMIT '10', '10' is not very helpful now, is it? Good point. So let’s tell PDO that we’re really binding ints:

$statement->bindValue(':offset', $offset, PDO::PARAM_INT);
$statement->bindValue(':hits', $hits, PDO::PARAM_INT);

But wait. You’re still complaining?! I told you they were ints?! What’s the problem now?!?!

Well. Mr. PDO requires you to also convert the values for him. So first you have to convert the values of a loosely typed language to a strong type, then you have to tell the library that yes, this is in fact another type than what the library obviously assumes that it is. This works:

$statement->bindValue(':offset', (int) $offset, PDO::PARAM_INT);
$statement->bindValue(':hits', (int) $hits, PDO::PARAM_INT);

Which means the following:

If the type of your variable internally is a string, it’ll be escaped as a string, even if you tell PDO that it should be handled as an INT in your database layer.

If the type of your variable is an int, it’ll be handled as a string, unless you tell PDO it is an int.

Something is backwards here.
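Here is a small end-to-end sketch of the version that works. It uses an in-memory SQLite database, so it needs the pdo_sqlite driver, and a made-up articles table, purely so that it runs anywhere. The post itself is about MySQL, where the quoting behavior described above is what breaks the LIMIT clause. fetch_page() is a hypothetical helper:

```php
<?php
// Sketch: bind LIMIT values both as PHP ints (the cast) and as PDO::PARAM_INT.
// SQLite in memory and the "articles" table are assumptions for the demo.
function fetch_page(PDO $pdo, $offset, $hits): array
{
    $statement = $pdo->prepare('SELECT id FROM articles ORDER BY id LIMIT :offset, :hits');
    // Cast first: values from $_GET and friends are strings, and a string
    // bound with PDO::PARAM_INT may still end up quoted.
    $statement->bindValue(':offset', (int) $offset, PDO::PARAM_INT);
    $statement->bindValue(':hits', (int) $hits, PDO::PARAM_INT);
    $statement->execute();
    return $statement->fetchAll(PDO::FETCH_COLUMN);
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY)');
for ($i = 1; $i <= 25; $i++) {
    $pdo->exec("INSERT INTO articles (id) VALUES ($i)");
}

// Offset and hits arrive as strings, as they would from a query string.
echo implode(', ', fetch_page($pdo, '10', '5')), "\n"; // 11, 12, 13, 14, 15
```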

A Plugin for Paginating in Smarty

First I’d like to apologize for the lack of updates here in the last weeks, but the days have been very busy. I’ve bought a new car (more details about that as soon as the snow disappears), written a complete publishing platform from scratch in a weekend to help out when Gamer.no got in trouble and in general done a load of stuff. Anyways, this post isn’t about all that, but rather something else I wrote some time ago.

A use case you’ll encounter very often is paginating items, i.e. including a simple “jump to page x, jump to the next page, jump to the previous page” footer. If you’ve ever tried to implement the logic for this in your view, you know that it can get quite extensive. There are several other solutions, such as PEAR_Pager, which actually looks like a good solution now (with 2.x). Anyways, this is a plugin for Smarty to make generating pagination links easier.

Download the plugin, drop it into the plugins/ folder in your Smarty library directory, and voilà, you have access to the new {paginator} element.

The module is quite configurable, but as I’ve only extended the parts I’ve had use for in our projects, it may still lack a few simple options.

{paginator hits=$hits offset=$offset total_hits=$articlesFound class_inactive=paginatorInactive class_active=paginatorActive}

The template variables used here are hits (the number of hits shown on this page), offset (the offset from 0) and total_hits (the total number of available hits in the current list). By default the plugin appends ?hits=<hits>&offset=<offset>+<hits> to the current URL; you can give another URL through the url attribute. The class_ attributes provide the CSS classes to use for the elements that enclose the page numbers or links. See the source code (!) for more information about which attributes work.
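The arithmetic the plugin hides is roughly the following. This is a hypothetical sketch, not the plugin’s source. paginate() takes the same three values as the tag (hits per page, offset and total_hits) and returns the current page number plus query strings in the ?hits=...&offset=... format described above:

```php
<?php
// Sketch of the arithmetic behind pagination links, using the same three
// inputs as the {paginator} tag. paginate() is a hypothetical helper.
function paginate(int $hits, int $offset, int $totalHits): array
{
    if ($hits < 1) {
        throw new InvalidArgumentException('hits must be at least 1');
    }
    // Query string for a given offset, mirroring ?hits=<hits>&offset=<offset>.
    $link = function (int $newOffset) use ($hits): string {
        return '?hits=' . $hits . '&offset=' . $newOffset;
    };

    $pages = [];
    $pageCount = (int) ceil($totalHits / $hits);
    for ($page = 1; $page <= $pageCount; $page++) {
        $pages[$page] = $link(($page - 1) * $hits);
    }

    return [
        'current'  => intdiv($offset, $hits) + 1, // 1-based page number
        'previous' => $offset > 0 ? $link(max(0, $offset - $hits)) : null,
        'next'     => $offset + $hits < $totalHits ? $link($offset + $hits) : null,
        'pages'    => $pages,                     // page number => link
    ];
}
```

Run the query strings through htmlspecialchars() before writing them into href attributes, so the & separator comes out as &amp;.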

As I mentioned previously, the plugin is written for my own personal use, so it’s not as streamlined as it could be. Feel free to update it, dissect it, break it, claim it’s yours .. or anything. I’d be happy if you submit any patches to me so I can update the link here, or simply leave a comment.

Hack away! You can see the plugin in action at the bottom of lovethatfun.com.

Undefined symbol: php_pdo_declare_long_constant

After installing a new PDO module (PDO_PgSQL) into our compiled-from-the-ground-up version of PHP 5.2.8 (.. since RHEL4 doesn’t really stay updated, but we do), I ran headfirst into the following issue:


/usr/sbin/httpd: symbol lookup error: /usr/local/lib/php/extensions/no-debug-non-zts-20060613/pdo_pgsql.so: undefined symbol: php_pdo_declare_long_constant

Panic. Then I tried updating pdo_mysql, which until then had actually still worked, and that just left it with the exact same problem. Luckily a bit of searching on Google pointed me to “PDO_MYSQL causing Apache segfault” over at the PECL bug tracker. The last comment provided the solution to the problem. A quick rebuild of PHP with --disable-pdo, followed by enabling PDO from PECL instead, solved the issue. That way PDO and the API the PDO drivers expect actually match, instead of the wrong version being loaded into the process.

Be sure to build PDO from the SAME VERSION as the PDO driver extensions you load. If you need to build it from PECL, disable it in the PHP build itself.
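After rebuilding, a quick sanity check is to compare the version of the PDO core extension with the versions of the loaded drivers. Here is a minimal sketch. Run it with the same PHP build and php.ini that Apache uses. It relies on the driver extensions being named pdo_<driver>, which holds for drivers such as pdo_mysql and pdo_pgsql. If a mismatched driver makes PHP fail on load, as in the error above, the script will not get this far, so it is mostly useful for confirming the rebuild:

```php
<?php
// Sketch: print the version of the PDO core extension next to each loaded
// PDO driver, to spot a core/driver mismatch after a rebuild.
echo 'PDO core: ', phpversion('pdo') ?: 'not loaded', "\n";

foreach (PDO::getAvailableDrivers() as $driver) {
    // Driver extensions are assumed to be named pdo_<driver>.
    $version = phpversion('pdo_' . $driver);
    echo 'pdo_', $driver, ': ', $version ?: 'unknown', "\n";
}
```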