October 2008 – Mats Lindh

PHP Namespaces and the What The Fuck problem

As many people has discovered during the last days, some of the developers behind PHP has finally reached a decision regarding how to do introduce namespaces into PHP. This is an issue which has been discussed on and off for the last three years (or even longer, I can’t really keep track), with probably hundreds of threads and thousand of mail list posts (read through the php.internals mail list archive if you’re up for it). The current decision is to go with \ as the namespace separator. An acceptable decision by itself, and while I personally favored a [namespace] notation, I have no reason to believe that this is not a good solution.

There does however seem to be quite a stir on the internet in regards to decision, which I now will call the “What The Fuck Problem”. Most (if not all) public reactions on blogs, reddit, slashdot and other technology oriented sites can be summed up as “What The Fuck?”. While I’m not going to dwell into the psychological reasons for such a reaction, my guess is that it’s unusual, lacks familiarity for developers in other languages already supporting namespaces and that it might be hard to understand the reasoning behind such a decision.

The problem: to retrofit namespaces into a language which were not designed to support such a construct, without breaking backwards compability. The part about not breaking backwards compability is very important here, as it leaves out everything which could result in a breakage in existing code by simply using a new PHP version.

The discussions have been long, the attempts has been several (thanks to Greg Beaver’s repeated persistence in writing different implementations along the way) and each and every single time an issue has crept in which either breaks existing functionality or results in an ambigiuity when it comes to resolving the namespace accessed. Most issues and the explaination why they are issues, are documented at Namespace Issues in the PHP Wiki. This provides some insight into why the decision to use a separate identifier was chosen.

This seemed to get through in the ~~flamewars~~ discussions after a while, and people instead started to point out the “gigantic flaw”:

If you were to load a class or a namespace dynamically by referencing it in a string, you’d have to take care to escape your backslash:

spl_autoload_register(array("Foo\tBar", "loader"));

.. would mean Foo<tab>Bar. Yep. It actually would. And if that’s the _biggest_ problem with the implementation of using \ as a namespace separator, then I feared its introduction earlier for no apparent reason.

The other examples has plagued us with ambiguity issues, autoloading issues, no-apparent-way-of-putting-functions-and-constants-in-the-namespace-issues and other problems. This way we are left with the task that we, as usual, have to escape \ in a string — when we want to reference a namespace in a dynamic name — and that’s the biggest problem?

I just hope that people keep it sane and don’t implement any special behaviour in regards to how strings are parsed in regards to new, $className::. This _will_ cause problems and magic issues down the road if it ever gets into the engine.

PHP is free, it’s open and it’s yours for the taking. Fork it if you want to, or provide a patch which solves the problem in a better way. The issue has been discussed to death, and so far there is no apparent better solutions on the table. If you have one, you’ve already had three years to suggest it. You better hurry if you’re going to make people realize it now.

Additional observation: most people who has an issue with this, seems to be the same people who would rather be caught dead than writing something in PHP. Yes. Python and Java does it prettier. That’s not a solution to the problem discussed in regards to PHP.

Java printf and Booleans

The available stdout mapping in Java, System.out (PrintStream), supports printf (to format a string when outputting it) in the same manner as other languages such as C. By using System.out.printf instead of System.out.print, you can also provide a formatting string which the method can use to write the output in a custom format.

As Java supports several other native datatypes that weren’t available as regular C-types, you also have a few other options. One of these are the %b / %B identifier, which represents a boolean value.

The format string “%b” expects one argument, a boolean. This will be written as either ‘true’ or ‘false. You would (as of writing) get the same result by using the %s identifier as this would cast the boolean of a string, which in turn would give true or false, but it’s always better to be explicit about what you are doing than assuming that people understand that you understood something (which even could change..).

Upgrading to SWFUpload 2.2.0-beta

SWFUpload 2.2.0-beta has been released to fix the issue that has occured after the release of Flash 10, where regular SWFUpload-based applications ceases to work. This is because of a security update to Flash Player, where the plugin refuses to show the upload files dialog unless the plugin has focus — and the request to show the dialog is in response to an user event. This means that the user now has to click somewhere in the flash file before we’re able to show the upload dialog.

The way most file upload components solve this is by overlaying a transparent flash file over the regular user interface element, so that the user clicks the flash file instead of the visible HTML element. SWFUpload 2.2.0 supports this, in addition to creating a styled flash based button instead of the regular HTML elements.

To upgrade from our previous 2.1.0 based installation, I did the following:

Added in the settings object for SWFUpload:

        button_placeholder_id : "selectFilesButtonPlaceholder",
        button_disable : false,
    	minimum_flash_version : "9.0.28",
    	swfupload_pre_load_handler : swfUploadPreLoad,
    	swfupload_load_failed_handler : swfUploadLoadFailed

Swapped out the previous references to graceful degradation and SWFUpload 2.1.0:

Remembered to update flash_url in the settings object:

flash_url : "/flash/swfupload_f10.swf",

Added three new callbacks functions instead of using the references to the previous degradation elements (swfupload_element_id, degraded_element_id):

function swfUploadLoaded()
{
    // set the dimensions of the button to match the outer container element (jQuery!)
    swfu.setButtonDimensions($("#buttonContainer").width(), $("#buttonContainer").height());
}

function swfUploadPreLoad()
{
    // hide HTML interface, show Flash interface..
    $("#regularUploader").hide();
    $("#flashUploader").show();
}

function swfUploadLoadFailed()
{
}

Added a container element in the HTML that the Flash app will simply take control over (it will NOT lay itself transparent over this element!):

.. and remove the old reference to swfu.selectFiles();.

That should be all.

The Golden Rule of Frameworks

All other frameworks sucks.

Informa and getCategories Truncates Title at “/”

I stumbled across a weird issue in Informa and the ItemIF.getCategories method today. The categories we retrieve are separated with / to indicate their full hierarchy, but Informa only gave me the first part of the category (just “Properties” of “Properties / Houses”). The solution to this is to explicitly access the category element of the object itself:

String categoryDomain = item.getAttributeValue("category", "domain");
String categoryTitle = item.getElementValue("category");

This should be extended to support several category-elements, but as we only get one in our feeds, this solved the problem for us.

Transparent Remapping of / For Struts Actions

I’ve been trying to find a solution to this issue for a couple of hours: We have several struts actions in our Java-based webapp, all neatly mapped through different .do action handlers. I wanted to switch our handling of the root / index URL (http://www.example.com/) from being a static redirect to the actual action to instead present the content right there, without the useless redirect. This apparently proved to be harder than I thought; after searching a lot through Google, reading through mailing lists and the official documentation, it seems that there is no way to specify an action to handle these requests by default in struts. There may be one, but I could not for the life of ${deity} find it.

As simply doing it in Struts were out of the question, I turned to an old friend of mine, the ever so helpful mod_rewrite. mod_rewrite is capable of rewriting URLs internally in Apache before they get handled at other levels. The problem was that mod_jk seemed to grab the request before the replacements were made, but a few resources pointed me in the correct direction:

After a bit of debugging with the rewritelog, everything came together. This is how it ended up:

       	RewriteEngine On
       	RewriteLog /tmp/rewrite.log
       	RewriteLogLevel 3
       	RewriteRule ^/?$ /destinationfile [PT,NE]

The RewriteLog say that we should log the rewrite progress to /tmp/rewrite.log, and RewriteLogLevel say that we should log at the most detailed format. Use 0, 1, 2 for less debugging information. Remember to comment out these lines when things work. DO NOT use the RewriteLog when not actually debugging the rewrites.

Keeping Your Code in Check: Part 1 – DRY

DRY – Don’t Repeat Yourself – is a central principle for everyone who wants to keep their code maintainable and in a clean and pristine state. It will save you from tedious tasks in the future, although it requires you to do a bit more thinking up front. The clue is to recognize when you’re about to commit the first programming sin: repeating parts of code.

Lets first be completely clear: not repeating code does not mean your code should be as general as possible. Doing that will only end up in a horrible mess that your only hope of selling is to label it “Enterprise” and push all the configuration options into XML. If your way to success is to craft a new buzzword and get the consulting business to sell your product, that might be how you do it. If you’re interesting in writing maintainable and solid software that other people can play around with and get the grip of quite quickly; don’t.

Keep it specialized to do what you need it do easily, but keep it general enough to allow you to reuse it for similar tasks.

The reason why I’m posting this now is unsurprisingly enough that I ran into this issue while coding a library at work. This issue is so common that any programmer will run into it several times a day, but the issue at hand today illustrates the problem perfectly. In a class that I was writing to return a set of data to the user, I decided to add three methods for sorting the data in different manners before returning it. The problem is that the data is kept in different array keys, and it’s the arrays themselves I want to sort. Example:

['displayValue' => 'Aloha Beach', 'count' => 13, 'value' => 'aloha']
['displayValue' => 'Norway', count => 3, 'value' => 'norway']
['displayValue' => 'Sweden Swapparoo', count => 7, 'value' => 'sweden']

To sort this in PHP, you’d use the usort function with a callback function / method that sorts the arrays by the value of the right key. My first version was something like:

private function sortByValue($a, $b)
{
    if (!isset($a['value'], $b['value']))
    {
        if (isset($a['value']))
        {
            return -1;
        }

        if (isset($b['value']))
        {
            return 1;
        }
    
        return 0;
    }

    return strcmp($a['value'], $b['value']);
}

private function sortByCount($a, $b)
{
    if (!isset($a['count'], $b['count']))
    {
        if (isset($a['count']))
        {
            return -1;
        }

        if (isset($b['count']))
        {
            return 1;
        }
    
        return 0;
    }

    return ($b['count'] - $a['count']);
}

As I typed the second if (isset()) in the second method, I finally realized that I were sitting at work and writing the exact same function twice, with just a little twist between them in regards to which field to sort by – and how to determine the sort. One field was numeric, the other was a string. Our goal is to re-use as much as the common code from both methods without typing it up — or much more important, maintaining — it twice.

As you can see, almost the whole function is identicial, except for the key name (‘value’ vs ‘count’) and how the comparison is done (numeric vs string). We need to handle these two issues to be able to use the same function for both purposes, so we rework the two functions into one function for general use for sorting on the value of a key:

private function sortByArrayField($a, $b, $field)
{
    if (!isset($a[$field], $b[$field]))
    {
        if (isset($a[$field]))
        {
            return -1;
        }

        if (isset($b[$field]))
        {
            return 1;
        }
    
        return 0;
    }

    if (is_numeric($a[$field]) && is_numeric($b[$field]))
    {
        return ($b[$field] - $a[$field]);
    }

    return strcmp($a[$field], $b[$field]);
}

This way we handle both the comparison method (if both values are numeric, we do the comparison as a numeric value by simply subtracting the first from the second) and the field to sort (as a third parameter).

Sadly usort does not provide that third parameter for us automagically, but by creating a few simple helper methods for using our refactored function, we get all “configuration” set in those three methods. The code base only contains one implementation of the method itself, but several ways to use it. Example of these three helper methods:

private function sortByValue($a, $b)
{
    return $this->sortByArrayField($a, $b, 'value');
}

private function sortByCount($a, $b)
{
    return $this->sortByArrayField($a, $b, 'count');
}

private function sortByDisplayValue($a, $b)
{
    return $this->sortByArrayField($a, $b, 'displayValue');
}

If someone now comes along and decides to add another column to our array, such as ‘price’, we can simply add another sorting callback to accomodate this:

private function sortByPrice($a, $b)
{
    return $this->sortByArrayField($a, $b, 'price');
}

The sortByArrayField method is already well proven and tried from our previous usage, and by simply changing the field we’re sorting by, we still get the power of the callback, get a completely new sort criteria and the maintainability of just one method.

The First Rule of Debugging

Seems like there are quite a lot of ideas about what the first rule of debugging are, and a quick Google Search gives you insightful suggestions such as:

These are all valuable insights that provide clear value about what and how you could attack the problem of the Bug That Wasn’t Supposed To Be There (there are surely bugs that actually were supposed to be there, because of invalid domain specifications, etc.). I did however find one small column from DDJ that led into the same thing I’m going to write now: “Before you go to fix it, be sure you’re fixing the right thing.”

I will assume that you already have uncovered that you actually have a bug — you’re not getting the results you expected, and something is to blame. You start out by trying to isolate the problem, and try to recreate the broken situation with different inputs. You try it on another system to see if there’s something about the configuration, about the dataset, about the versions of your library or something, which we’ll disover later, is completely irrelevant.

I obviously forgot this rule at work today, and that’s why I’m writing this post now. Remember kids. Always make sure you’re fixing the right thing! I’ve started porting one of the front end services we use to federate searches from a borked Java library to a PHP-based implementation instead. This involves talking to an internal search server that I hadn’t written code to interface with before, other than the existing search service. The implementation went from zero to usable in almost no time, and I now have code that provides a nice foundation for the remaining work. The problem is that I obviously decided to fix another issue with the search server that I had looked into earlier, and it seems that I thought that since I now had new and improved experience with the service, I’d be better off this time.

So I set out trying to get a new sorting scheme working for the search results. At first I failed, but then I stumbled across some documentation that threw me in the right direction. I did a few minor changes, and the sorting changed. Woohooo! Well, it almost worked. The order was not quite as expected, but the results seemed to be close enough for the proximity search to be working. It just had to be something in the documentation or implementation that I had missed! Further digging and experimentation for the greater part of an hour or so left me blank, but I managed to get another search ordering which made me think that there was something peculiar about my input parameters that created the issue.

After almost two and a half hour of this, I happened to read four lines that were neatly tucked away in a “process” part of the resulting XML document from the search service. In the diffuse light glooming from my Dell monitor, I could read the now obvious words: “<subsystem> No license.”.

Yep. It never had anything to do with the actual result. It was never even active. The change of the sorting must have happened for some other reason (such as removing the default values or anything like that). I weren’t supposed to debug why the sorting was incorrect, I was supposed to find out why the subsystem didn’t load.

Grrr.

Mats’ (borrowed) first rule of debugging: Make sure you debug the right problem. Do not make assumptions.

(and a rule of documentation: if the feature may not be available in the installation, write how you can check this before actually giving examples and detailing how to use the feature)

First Impression of the WordPress 2.7 Wireframes

I’ve done a few posts about WordPress Usability Issues before (and the original wordpress usability post..), documenting my experiences with starting to use WordPress from scratch. I still manage to publish loads of posts without tags and categories, so I’ve been eager to find out if there is anything in the new version of WordPress that will solve this issue for me.

Then a couple of days ago an article appeared on wordpress.org containing a collection of wireframes for how the layout in WordPress 2.7 is currently planned. Sadly enough, it seems to place the tags and categories part of the layout to a location which is even harder to notice: Below the items in the sidebar at the right. I just discovered that I have a “Shortcuts” and “Related” menu items there, so I’m quite sure that this will be a place that’ll leave me missing the tags and categories even more often than today.

If the placement is set in stone, I’d strongly suggest adding some kind of visual hint to the Save/Publish buttons that indicate that tags and categories are missing; at least if you’ve enabled an option in the settings part of wordpress. It seems as the boxes are draggable in the wireframes, but for fresh users, this will still be confusing.

I guess most people at least want to add a category for their posts, but the wordpress guys probably have a lot of usage statistics on this from wordpress.com. It’d be very interesting to see any change in the usage of tags and categories between the different layouts, and the number of edits that simply add tags and categories within the first 24 hours after a post has been published.

(and yey! I remembered tags/categories on this one.)

rinetd saves the day (.. or night, more properly)

During thursday we moved our mail server and domain controller from our Oslo offices to the Fredrikstad office. As most of our staff now resides in Fredrikstad, this was a logical thing to do to keep the resources close to the people. The move went as planned, and all the servers and services started up as they should. There was one slight problem, as we’re accepting new mail through the link to our Oslo office (as that’s were the mailserver used to live), and our firewall routes the SMTP connections to our internal mailserver. We have an external mail relay that does mail filtering, before delivering the mail to our server. This means that we only accept SMTP connections from that particular host, as we’re not the general recipent (as we’re not the MX-record) for our own domain. This is handled in the Exchange configuration, where the allowed IP-s are set up.

Anyways, the connection between the two offices are handled internally by using a IPsec link, which routes encrypted traffic between the two networks. This has worked out just like it should, and we’ve been very happy with the setup. After moving the mail server, our plan was to simply route the connections that are coming in to our SMTP server over the VPN link to the other office, and then deliver the mail to the new address of the mail server. This proved to be more troublesome than expected. As the external filter service is run by an external company, we were unable to contact them to get them to change the address on short notice. After a couple of hours trying to get the packets to flow the right way, we decided to use an old friend of ours to help out instead; rinetd.

rinetd is a very, very simple daemon, which simply accepts a connection on a port and redirects that port on the socket layer to another host (or port). We did however need one other feature of rinetd; access control. As I mentioned earlier, we’re only interested in allowing certain hosts to talk to our SMTP server. Since the connections now are routed through rinetd instead, the only host Exchange ever will se delivering E-mail, is the rinetd server. This means that we had to implement the access control in rinetd instead, filtering on the external IPs connecting to our server.

rinetd allows these settings to be provided globally, so that you are able to allow or deny certain hosts for all services using rinetd. This would have worked out nice, if it hadn’t been for the fact that we also run our web mail through exchange on the same host. So we also need to forward the SSL connection for the webmail, and we have to allow all hosts to connect to the webmail. Luckily, rinetd also makes it possible to solve this, as you’re able to provide access rules for each of the forwarded ports separately.

There is however one caveat; the global rules will always be evaluated first. If you have certain global rules that deny or allow traffic (if you have one allow rule, all other traffic will be denied, unless explicitly allowed), the connection might never reach the evaluation of the “local” rules for each forward. This means that if one of your connections should allow people to connect from anywhere (such as our ssl webmail), your global settings have to allow all clients to connect.

We therefor moved our limitations for the SMTP server to that section instead, and ended up with this access control configuration:

# bindadress    bindport  connectaddress  connectport
# SMTP
<ip1> 25 <ip2> 25
allow 192.168.0.*
allow 192.168.1.*
allow <external ip>
allow <external ip>
allow <external ip>

# SSL
<ip1> 443 <ip2> 443
allow *

And after that, everything worked perfectly until we got our mail filter provider to switch to the new address of the SMTP server. rinetd saves us that night, and not a single one of our staff noticed the difference. If they don’t know we’re there, we’re doing something right.