The Side Effect of Using Return Type To Handle Errors

I came across a curious little side effect of the issue I just posted earlier:

public function action()
{
   $id = filter_input(INPUT_POST, 'id', FILTER_SANITIZE_STRING);
	
   if(is_bool($id))
   {
      die('invalid id');
   }

   if(isset($_POST['accept']))
   {
      $this->acceptId($id);
   }
   else if(isset($_POST['reject']))
   {
      $this->rejectId($id);
   }
}

This is obviously going to do something when a POST has occured, but the method was invoked each and every time. The reason why it works? If there is no variable named ‘id’ POSTed, filter_input returns null. And null is not boolean, so the test passes. But as it’s not a POST request, neither of the two other if-tests are true, so the code silently passes through (id can contain characters here, so the filter isn’t used to just get integers).

If you’re not going to throw an error when the parameter is missing, the test is actually completely useless. This bug had hidden itself within the usage of the type to test for fault instead of actually checking the value, and did not creep out before I rewrote the if-test.

Communicating The Right Thing Through Code

While trying to fix a larger bug in a module I never had touched before, I came across the following code. While technically correct (it does what it’s supposed to do, and there is no way to currently get it to do something wrong (.. an update on just that), does have a serious flaw:

$result = get_entry($id);
if(is_bool($result))
{
    die('bailing out');
}

Hopefully you can see the error in what the code communicates; namely that the return type from the function is used to what should be considered an error.

While this works as the only way the function can return a boolean value is if it returns false, the person reading the code at a later date will wonder what the code is supposed to do – he or she might not have any knowledge about how the method works. Maybe the method just sets up some resource, a global variable (.. no, don’t do that. DON’T.), etc, but the code does not communicate what we really expect.

As PHP is dynamically typed, checking for type before comparing is perfectly OK, as long as you’re not counting “no returned elements” as an error. The following code more clearly communicates it intent (=== in PHP is comparison based on both type and content, which means that both the type of the variable and the content of it have to match. 0 === “0” will be considered false.):

$result = get_entry($id);
if($result === false)
{
    die('bailing out');
}

Or if you’re interested in getting to know if the element returned is actually considered false (such as an empty array, an empty string, etc), just drop one of the equal signs:

$result = get_entry($id);
if($result == false)
{
    die('bailing out');
}

I’m also not fond of using die() as a method for stopping a faulty request, as that should be properly logged and dealt with in the best manner possible, but I’ll leave that for a later post.

A few Keepitclose Updates

I’ve added a few new features to the Keepitclose code base. The cached version of the resource now also keeps — and reproduces — the headers from the original resource. That will allow text/xml to work as it should, in addition to keeping encoding information and other vital parts of the information contained in the HTTP headers. I’ve also added urlencode() support to the path identifier of the request, so that we’re able to handle requests which feature UTF-8 and other encodings in parts of the URL.

I have a few other features in the pipeline too, but I’m not sure if I’ll find the time to add them until after the weekend. Be sure to do a svn update if you’ve checked out the code!

Releasing Keepitclose Alpha

I’ve just created a small project site over at Google Code for Keepitclose. Keepitclose is a HTTP local caching web service written in PHP meant to cache resources retrieved from external and internal web services. It currently supports both an APC backend (using shared memory in the web server itself) and using Memcache (which allows you to have several frontends using one or several backend memcache servers).

Missing features are currently:

  • Keeping HTTP headers
  • Adding cache information to HTTP headers
  • A file based backend
  • Everything else you can think of

The server do however support:

  • Time to live (TTL) for a cached resource when stored
  • TTL Window, i.e.:

    A new version should be fetched at regular intervals, such as every sixth hour. New data are available at 00:00, 06:00, 12:00 and 18:00. Use ttlWindowSize=21600 to get six hours window size. Use ttlWindowOffset to add a offset; ttlWindowOffset=900 adds fifteen minutes to the value and retrives new content at 00:15, 06:15, 12:15 and 18:15.

If you want to implement your own backend, implement the Keepitclose_Storage_Interface and add an entry into the config.php file for your module, i.e. ‘file’ => array(‘path’ => ‘/tmp’).

The current version also supports simple access control using the allowedHosts entry in the config file. This entry contains one row for each allowed host as a regular expression:

    'allowedHosts' => array(
        '^127\\.0\\.0\\.1$',
    ),

.. will only allow requests from localhost. To allow a subnet, you could write ‘^127\\.0\\.0\\.[0-9]+$’.

You can also add several memcache servers, a request will then poll a random server to retrieve the resource. If not found, the content will be retrieved from the web and stored to all memcache servers. This is useful for environments with a very heavy read load.

'storage' => array(
    'memcache' => array(
        array(
            'host' => 'tcp://127.0.0.1',
        ),
    ),
),

To add more memcache servers, simply add another array() of memcache entries:

'memcache' => array(
    array(
        'host' => 'tcp://127.0.0.1',
    ),
    array(
        'host' => 'tcp://127.0.0.2',
    ),
),

You can also set ‘port’ and ‘persistent’ for the memcache connections.

Please leave a comment here or create an issue ticket on the google code page if you need any help or have any suggestions. All patches are welcome.

Patching Zend_OpenId To Maintain Error Messages

To help anyone else in their quest to implement Zend_OpenId in their code, I’ve created a small patch against trunk which adds error internal error messages in most places where there’s a return false;. This should help when debugging your code, both if you’re implementing an authentication service and use Zend_OpenId as a consumer or if you’re implementing just the consumer part of the equation.

I’ve uploaded the diff: Consumer.php.patch (Consumer.php.patch.txt).

The only thing that seemed a bit weird was that one of the error messages that were present in 1.6.0 had been removed in trunk:

if ($status != 200) {
    this->_setError("'Dumb' signature verification HTTP request failed");
    return false;
}

I’ve readded this message in my patch, but I feel that the error message should be something like:

$this->_setError("The server responded with HTTP status code: " . $status);

instead.

Oh well. Hopefully someone will find this useful.

Adventures in OpenID and Zend Framework

I’ve been toying around with OpenID and the Zend Framework for a night or two now, and I’ve made a few experiences I thought I should share with the intarwebs (now, that’s probably the point where you should make the decision to stop reading for most blog posts). Quite some time has passed since I last had anything to do with OpenID, so just getting up and running was a challenge.

An OpenID identifier is usually represented by an URL (such as https://me.yahoo.com/<login>), which the OpenID consumer then contacts to get information about how to communicate with the OpenID identity provider (Yahoo! in this case). The consumer contacts the provider, gets an URL to redirect the client to, and receives notice after the client has authenticated with the provider.

First I’d like to say that OpenID seems to be too hard to use for any other than those who have a particular interest in it. I have a Yahoo! account and a Google Account, which both can be used for OpenID authentication. I have no idea how I use my Google Account for this, without having to provide endpoints manually. Ugly.

I did at least get the Yahoo! authentication working, but I’m still undecided on wether I’m going to implement OpenID support in any current project. Possibly. We’ll see.

Anyways, my implementation in Zend Framework is mostly a copy of the tutorial in the ZF manual, but there is one important point that they do not mention: In the standard installation, you have to use Zend_Session to handle your sessions. That means calling Zend_Session::start() instead of session_start(), as Zend_Session cannot be used after a session has been started. This dependency kind of killed my enthusiasm, as we just pull the parts of Zend Framework that we need into our project as thing progresses. Changing how we use sessions is a bit too much to ask. Luckily you can still use $_SESSION as usual after starting Zend_Session, but sitll. Not too fond of that. I hope that it will be decoupled some time in the future..

Testing code:

require_once 'Zend/OpenId/Consumer.php';

$consumer = new Zend_OpenId_Consumer();

if (!empty($_POST['openid_identity']))
{
    if(!$consumer->login($_POST['openid_identity']))
    {
        die("OpenID login failed.");
    }
    else
    {
        print('We logged in!');
    }
}

if (isset($_GET['openid_mode']))
{
    switch($_GET['openid_mode'])
    {
        case 'id_res':
            $consumer = new Zend_OpenId_Consumer();
            $id = false;
            
            if ($consumer->verify($_GET, $id))
            {
                $status = "VALID " . htmlspecialchars($id);
            }
            else
            {
                $status = "INVALID " . htmlspecialchars($id);
            }

            print($status);
            break;

        case 'cancel':
            print("someone pressed cancel!");
            break;
    }
}

Switch out $_POST[‘openid_identity’] with your OpenID identifier (the whole URL), and you should be all set.

If you keep getting failed logins without a redirect, check that you have https support in PHP through openssl (the module is named php_openssl). Zend Framework provides no hint that this can be a problem, but after stepping through the source (I’m test driving NetBeans 6.5) the solution became apparent.

PHP and Annotations

After hacking together some code to solve an issue that came up on an IRC channel I’m up today about how to provide a URL mapping for individual methods — and keeping the responsibility in the method itself, I stumbled across addendum. Addendum implements annotations parsing for PHP and works by using the reflection API in PHP 5.1+. This allows you to add annotations which indicate to your framework which methods should be exposed to the web and which should be kept private. There are loads of other ways to do this (both dynamically and statically), but this is one way that may appeal to someone.

PHP Namespaces and the What The Fuck problem

As many people has discovered during the last days, some of the developers behind PHP has finally reached a decision regarding how to do introduce namespaces into PHP. This is an issue which has been discussed on and off for the last three years (or even longer, I can’t really keep track), with probably hundreds of threads and thousand of mail list posts (read through the php.internals mail list archive if you’re up for it). The current decision is to go with \ as the namespace separator. An acceptable decision by itself, and while I personally favored a [namespace] notation, I have no reason to believe that this is not a good solution.

There does however seem to be quite a stir on the internet in regards to decision, which I now will call the “What The Fuck Problem”. Most (if not all) public reactions on blogs, reddit, slashdot and other technology oriented sites can be summed up as “What The Fuck?”. While I’m not going to dwell into the psychological reasons for such a reaction, my guess is that it’s unusual, lacks familiarity for developers in other languages already supporting namespaces and that it might be hard to understand the reasoning behind such a decision.

The problem: to retrofit namespaces into a language which were not designed to support such a construct, without breaking backwards compability. The part about not breaking backwards compability is very important here, as it leaves out everything which could result in a breakage in existing code by simply using a new PHP version.

The discussions have been long, the attempts has been several (thanks to Greg Beaver’s repeated persistence in writing different implementations along the way) and each and every single time an issue has crept in which either breaks existing functionality or results in an ambigiuity when it comes to resolving the namespace accessed. Most issues and the explaination why they are issues, are documented at Namespace Issues in the PHP Wiki. This provides some insight into why the decision to use a separate identifier was chosen.

This seemed to get through in the flamewars discussions after a while, and people instead started to point out the “gigantic flaw”:

If you were to load a class or a namespace dynamically by referencing it in a string, you’d have to take care to escape your backslash:

spl_autoload_register(array("Foo\tBar", "loader"));

.. would mean Foo<tab>Bar. Yep. It actually would. And if that’s the _biggest_ problem with the implementation of using \ as a namespace separator, then I feared its introduction earlier for no apparent reason.

The other examples has plagued us with ambiguity issues, autoloading issues, no-apparent-way-of-putting-functions-and-constants-in-the-namespace-issues and other problems. This way we are left with the task that we, as usual, have to escape \ in a string — when we want to reference a namespace in a dynamic name — and that’s the biggest problem?

I just hope that people keep it sane and don’t implement any special behaviour in regards to how strings are parsed in regards to new, $className::. This _will_ cause problems and magic issues down the road if it ever gets into the engine.

PHP is free, it’s open and it’s yours for the taking. Fork it if you want to, or provide a patch which solves the problem in a better way. The issue has been discussed to death, and so far there is no apparent better solutions on the table. If you have one, you’ve already had three years to suggest it. You better hurry if you’re going to make people realize it now.

Additional observation: most people who has an issue with this, seems to be the same people who would rather be caught dead than writing something in PHP. Yes. Python and Java does it prettier. That’s not a solution to the problem discussed in regards to PHP.

Keeping Your Code in Check: Part 1 – DRY

DRY – Don’t Repeat Yourself – is a central principle for everyone who wants to keep their code maintainable and in a clean and pristine state. It will save you from tedious tasks in the future, although it requires you to do a bit more thinking up front. The clue is to recognize when you’re about to commit the first programming sin: repeating parts of code.

Lets first be completely clear: not repeating code does not mean your code should be as general as possible. Doing that will only end up in a horrible mess that your only hope of selling is to label it “Enterprise” and push all the configuration options into XML. If your way to success is to craft a new buzzword and get the consulting business to sell your product, that might be how you do it. If you’re interesting in writing maintainable and solid software that other people can play around with and get the grip of quite quickly; don’t.

Keep it specialized to do what you need it do easily, but keep it general enough to allow you to reuse it for similar tasks.

The reason why I’m posting this now is unsurprisingly enough that I ran into this issue while coding a library at work. This issue is so common that any programmer will run into it several times a day, but the issue at hand today illustrates the problem perfectly. In a class that I was writing to return a set of data to the user, I decided to add three methods for sorting the data in different manners before returning it. The problem is that the data is kept in different array keys, and it’s the arrays themselves I want to sort. Example:

['displayValue' => 'Aloha Beach', 'count' => 13, 'value' => 'aloha']
['displayValue' => 'Norway', count => 3, 'value' => 'norway']
['displayValue' => 'Sweden Swapparoo', count => 7, 'value' => 'sweden']

To sort this in PHP, you’d use the usort function with a callback function / method that sorts the arrays by the value of the right key. My first version was something like:

private function sortByValue($a, $b)
{
    if (!isset($a['value'], $b['value']))
    {
        if (isset($a['value']))
        {
            return -1;
        }

        if (isset($b['value']))
        {
            return 1;
        }
    
        return 0;
    }

    return strcmp($a['value'], $b['value']);
}

private function sortByCount($a, $b)
{
    if (!isset($a['count'], $b['count']))
    {
        if (isset($a['count']))
        {
            return -1;
        }

        if (isset($b['count']))
        {
            return 1;
        }
    
        return 0;
    }

    return ($b['count'] - $a['count']);
}

As I typed the second if (isset()) in the second method, I finally realized that I were sitting at work and writing the exact same function twice, with just a little twist between them in regards to which field to sort by – and how to determine the sort. One field was numeric, the other was a string. Our goal is to re-use as much as the common code from both methods without typing it up — or much more important, maintaining — it twice.

As you can see, almost the whole function is identicial, except for the key name (‘value’ vs ‘count’) and how the comparison is done (numeric vs string). We need to handle these two issues to be able to use the same function for both purposes, so we rework the two functions into one function for general use for sorting on the value of a key:

private function sortByArrayField($a, $b, $field)
{
    if (!isset($a[$field], $b[$field]))
    {
        if (isset($a[$field]))
        {
            return -1;
        }

        if (isset($b[$field]))
        {
            return 1;
        }
    
        return 0;
    }

    if (is_numeric($a[$field]) && is_numeric($b[$field]))
    {
        return ($b[$field] - $a[$field]);
    }

    return strcmp($a[$field], $b[$field]);
}

This way we handle both the comparison method (if both values are numeric, we do the comparison as a numeric value by simply subtracting the first from the second) and the field to sort (as a third parameter).

Sadly usort does not provide that third parameter for us automagically, but by creating a few simple helper methods for using our refactored function, we get all “configuration” set in those three methods. The code base only contains one implementation of the method itself, but several ways to use it. Example of these three helper methods:

private function sortByValue($a, $b)
{
    return $this->sortByArrayField($a, $b, 'value');
}

private function sortByCount($a, $b)
{
    return $this->sortByArrayField($a, $b, 'count');
}

private function sortByDisplayValue($a, $b)
{
    return $this->sortByArrayField($a, $b, 'displayValue');
}

If someone now comes along and decides to add another column to our array, such as ‘price’, we can simply add another sorting callback to accomodate this:

private function sortByPrice($a, $b)
{
    return $this->sortByArrayField($a, $b, 'price');
}

The sortByArrayField method is already well proven and tried from our previous usage, and by simply changing the field we’re sorting by, we still get the power of the callback, get a completely new sort criteria and the maintainability of just one method.