PHP – Page 6 – Mats Lindh

Fatal error: Exception thrown without a stack frame in Unknown on line 0

While Christer were updating his profile on one of our current projects, he suddenly got this rather cryptic message. We tossed a few ideas around, before just leaving it for the day. We came back to the issue earlier today, and I suddenly had the realization that it had to have something to do with the session handling. Changing our session handler from memcache to the regular file based cache did nothing, so then it had to be something in the code itself.

The only thing we have in our session is an object representing the currently logged in user, so it had to be something in regards to that. After a bit of debugging I found the culprit; a reference to an object which contained a reference to a PDO object. PDO objects cannot be serialized, and this exception was being thrown after the script had finished running. The session is committed to the session storage handler when the script terminates, and therefor is run out of the regular script context (and this is why you get “in Unknown on line 0”. It would be very helpful if the PHP engine had been able to provide at least the message from the exception, but that’s how it currently is.

Hopefully someone else will get an Eureka!-moment when reading this!

The solution we ended up with was to remove the references to the objects containing the unserializable structures. This was done by implementing the magic __sleep method, which returns a list of all the fields that should be kept when serializing the object (we miss the option of instead just unsetting the fields that needs to be unset and then let the object be serialized as it would have if the __sleep method wasn’t implemented). We solved this by implemeting the __sleep method and removing our references, before returning all the fields contained in the object:

public function __sleep()
{
    $this->manager = null;
 
    return array_keys(get_object_vars($this));
}

And there you have it, one solved problem!

Getting a Look Behind the Scenes of PDO

Ulf Wendel is presenting several good articles about the process about implementing the mysqlnd library for PDO as PDO_MYSQLND. I wrote about PDO_MYSQLND when it first was announced, but Ulf has posted two good articles about the implementation of PDO_MYSQLND since then. These articles provide a unique insight into how PDO is built and what challenges lies ahead for PDO2 (.. in addition to the license and NDA debate..)

Remember that you can always follow the latest developments over at Planet PHP.

Using Apache httpd as Your Caching Solution

In this article I’m going to describe a novel solution for making cached versions of dynamic content available, while attempting to strike a balance between flexibility, performance and the origin of dynamic content. This solution may not be suited for very dynamic content (where the updates are better triggered by rewriting the cached version when the content changes), but in those situations where the dynamic content may be built from a very large dataset on request from the users. I have two use cases detailing applications I’ve been involved in building where I have applied this strategy. This could also be implemented with a caching service in front of the main service, but will require the installation of a custom service and hardware etc. for that service.

The WMS Cache

WMS (Web Map Service) is an OGC (Open Geospatial Consortium) specification which details a common set of parameters for how to query a web service which returns a raster map image (a regular png/jpg/bmp file) for an area. The parameters include the bounding box (left,bottom,right,upper) and the layers (roads,rivers,etc) and the size of the resulting image. The usual approach is to add a caching layer in the WMS itself, so any generated image is simply stored to disk, and then checked if the disk exists before retrieve the data and rendering the image (and if it exists, just return the image data from disk instead). This will increase the rate of requests the WMS can answer and will take load off the server for the most common requests. We are still left with the overhead of parsing the request, checking for the cached file and most notably, loading our dynamic language of choice and responding to the request. An example of such a small and naive PHP application is included:

The next request which arrives with the identical set of GET-parameters, will be served with the overhead of loading PHP, parsing the PHP-script (which is less if you have APC or a similar cache installed), sorting the GET-parameters (so that bbox=..&x=.. is the same as x=..&bbox=..), serializing the response, checking that the file exists on disk (you could simplify this to just doing a read and checking if the read succeeded), copying the data from disk to memory and then outputting the data to the client (you could also use fpassthru() and friends which may be more optimized for simple reading and output of data, but that's not the main point here).

To relate this to our use case of the WMS, we need to take a closer look at how map services are used today. Before Google showed the world what a good map solution could look like with modern web technology, a map application presented an image to the user, allowed the user to click or drag the image to zoom or move, and then reloaded the entire page to generate the new image. If it took 0.5s to generate the image, that were not really a problem, as the data set is not updated very often and it is very easy to do these operations in parallel across a cluster. When Google introduced Google Maps, they loaded 9 visible images (tiles) in the first image, and then started loading other tiles in the background (so that when you scroll the map, it looks like the images are already in place). If you run an interface similar to Google Maps against a regular WMS, most WMS servers would explode and take the whole 42U rack with them. Not a very desirable situation. The easy solution if you have an unlimited set of resources, disk space and money is to simply generate all the available tiles up front, in the same way as Google has done it. This will require disk space for all the tiles, and will not allow your users to choose which layers then want included in the map (this will change as map services are starting to build each layer as a separate tile and then superimposing them in the user interface).

The problem is that most of us (actually, n - 1) are not Google, but most of us do not build map services either. For those of us who do, we needed a way of living somewhere in between of having to render our complete dataset to image tiles up front or running everything through the WMS. While working with Gunnar Misund at Østfold University College, I designed a simple scheme to allow compatible clients to fetch cached tiles automagically, while those tiles which did not exist yet, were generated on the fly from the background WMS. The idea was to let Apache httpd handle the delivery of already generated and cached content, while our WMS could serve those areas which were viewed for the very first time (or where the layer selection were new). It would not be as fast as Google Maps for non-cached content, but it wouldn't require us to run through our complete service to generate images either.

The solution was to let the javascript client request images through a custom URL:

http://example.com/300/400/10/59.205278/10.95/rivers,roads/image.jpg

(This is just an example, and does only contain the center point of the image). This is decomposed into:

http://example.com/x_width/y_height/zoomlevel/centerlat/centerlon/layers/image.fileformat

This is all good as long as image.jpg exists in the local path provided, so that Apache can just serve the image as it is from the location. Apache httpd (or lighttpd and other "serve files fast!"-httpds) are able to serve these static files in large numbers (it's what they were written for, you know..) with a minimum overhead. The problem is what to do when the file actually does not exist, which will happen each time a resource is requested for the first time, and we do not have a cache yet. The solution lies in assigning a PHP-file as the handler for any 404 error (file not found). This is a well known trick used all over the field (such as handling www.php.net/functionname direct lookup). In PHP you can use $_SERVER['REQUEST_URI'] to get the complete path of the request that ended in the 404.

The .htaccess file of the application is as simple as cake:

ErrorDocument 404 /wms/handler.php

I've enclosed a simple specification which were written as a description of the implementation when the project was done in 2005.

Thumbnail generation

Generating thumbnails can also be transformed into the same problem set. In the case where you need several different sizes of thumbnails (and different rescales are needed for different applications), you can apply the same strategy. Instead of handing all the information to a resize script with the file name etc. as the argument, simply have the xsize and the ysize as part of the URL. If the file exists in the path, it's served directly with no overhead, otherwise the 404 handler is invoked as in the previous example. The thumbnail can then generated, saved in the proper location and the world can continue to rotate at it's regular pace.

This application can then be extended by adding new parameters in the url, such as the resize method, if the image should be stretched, zoomed and other options.

Conclusions

This is a very simple scheme that does not require any custom hardware or server software installed, and places itself neatly in between having a caching front end server between the client and the application and the hassle of generating the same file each and every time. It allows you to remove the overhead of invoking the script (PHP in this case) for each request, which means that you can serve files at a much greater rate and let your hardware do other, more interesting things instead.

PHP Vikinger Notes

Just a few notes from PHP Vikinger which were arranged by Derick Rethans in Norway today. Things went mostly smoothly and people in general seemed to have a very good time. These are just some of the random notes I made during the sessions.

Derick: You already have notes, foo, foo2 and foo3 files. After PHP Vikinger you also have foo4 and foo5. Remember to start at foo6 next time.
Selenium, PHP Unit
eZ Components, Workflow
Couch DB
talks.php.net
New Garbage Collector for PHP 5.3!
phpUnderControl for CruiseControl
Phing, phar
phar-support in APC, “a complete website in a phar archive”-support in 5.3 with phar extension
Zend_Search_Lucene is slow! Anyone want to try to find out why?
xdebug
Database Abstraction in eZ Components
re2c
They don’t make pizza in Skien until after 13:00 on saturdays.
Derick never got to talk about namespaces. Weird…

All in all it was a good unconference, with a friendly and laid back tone and hopefully people got what they came for. Next time I’ll try to prepare a simple presentation on some interesting and hopefully not too familiar topic and actually contribute something too. We drove from Halden and Fredrikstad to Skien in the morning and back in the evening, which worked out quite OK, except for .. well, the lack of sleep in the morning. But everyone survived and managed to stay awake, so I conclude that the trip was a great success.

To sum it all up: a banana is a fruit and a tomato is a berry. You probably had to be there for that one.

Thanks for the unconference, and hopefully I’ll be able to attend more events in the future too.

UPDATE: Derick also has a writeup online from PHP Vikinger.

A Redirect Does Not Stop Execution

This is just a public service announcement for all the inexperienced developers who are writing redirects in PHP by issuing a call to header(“Location: <new url>”) to do their redirect. I see the same mistake time over and over again, and just to try to make sure that people actually remember this:

A Call to Header (even with Location:) Does NOT Stop The Execution of the Current Application!

A simple example that illustrates this:

 /* DO NOT COPY THIS EXAMPLE. */

if (empty($_SESSION['authed']))
{
    header('Location: http://example.com/');
}

if (!empty($_POST['text']))
{
    /* insert into database */
}

/* Do other admin stuff */

The problem here is that the developer does not stop script execution after issuing the redirect. While the result when testing this code will be as expected (a redirect happens when the user is not logged in and tries to access the link). There is however a gaping security hole here, hidden not in what’s in the file, but what’s missing. Since the developer does not terminate the execution of the script after doing the redirect, the script will continue to run and do whatever the user asks of it. If the user submits a POST request with data (by sending the request manually), the information will be inserted into the database, regardless of wether the user is logged in or not. The end result will still be a redirect, but the program will execute all regular execution paths based on the request.

Support for Solr in eZ Components’ Search

The new release of eZ Components (2008.1) has added a new Search module, and the first implementation included is an interface for sending search requests and new documents to a Solr installation. An introduction can be found over at the eZ Components Search Tutorial. The new release of eZ Components requires at least PHP 5.2.1 (.. and if you’re not already running at least 5.2.5, it’s time to get moving. The world is moving. Fast.).

Followup on The Missing Statistics in OpenX

After my previous post about the missing OpenX statistics because of crashed MySQL-tables, I got a very nice and helpful comment from one of the OpenX developers. To put it one single word: awesome. If you’re ever going to run a company and have to look after your customers (even if you release your project as open source), simply do that. People will feel that someone are looking out for them.

Anyways, as promised, this were supposed to be a follow up. We didn’t manage to get the impressions statistics back, but the missing clicks returned after repairing the tables. The tip from Arlen didn’t help either, but I have a few suggestions for how to make the script easier to use.

I were kind of perplexed about how I could give the dates for the time interval it was going to rebuild the statistics. The trick was to change two define()-s in the top of the code. Not very user friendly, so I made a small change to use $argc and $argv instead. That way I could do:

    php regenerateAdServerStatistics.php "2008-06-01 10:00:00" "2008-06-01 10:59:59"

instead of having to edit the file and changing the defines every time. After doing this simple change, I could also write a small helper script that ran the regenerateAdServerStatistics.php file for all the operation intervals within the larger interval (an operation interval is an hour, while my interval were several days).

So, here it is, regenerateForPeriod.php:

 ");
    }

    $start = $argv[1];
    $end = $argv[2];

    $start_ts = strtotime($start);
    $end_ts = strtotime($end);

    if (!$start_ts || !$end_ts || ($start_ts >= $end_ts))
    {
        exit("Invalid dates.");
    }

    $current_ts = mktime(date('H', $start_ts), 0, 0, date('m', $start_ts), date('d', $start_ts), date('y', $start_ts));

    while($current_ts < $end_ts)
    {
        system('php regenerateAdServerStatistics.php "' . date('Y-m-d H', $current_ts) . ':00:00" "' . date('Y-m-d H', $current_ts) . ':59:59"');
        $current_ts += 3600;
    }
?>

This runs the renegerateAdServerStatistics.php script for each operation interval. If your ad server uses a larger interval than 3600 seconds, change the setting to a more appropriate value. Before doing this, you’ll want to remove the sleep(10) and the warning in regenerateAdServerStatistics.php, so that you don’t have to wait 10 seconds for each invocation of the script. I removed the warning and sleep altogheter, but hopefully someone will commit a command line parameter to regenerateAdServerStatistics.php that removes the delay. I didn’t have time to clean up the code and submit an official patch today, but if there is interest, leave a comment and I’ll consider it.

Misunderstanding How in_array Works

Brian Moon has a post about how in_array() really, really sucks. This is not a problem with in_array per se, but failing to recognize the proper way to solve a problem like this. Some of the comments has already touched on this matter, but I’ll attempt to further expand and describe the problem.

You have two arrays; a1 and b2. You’re interested in removing all the values from a1 that also are in b2. If you’re doing the naive approach (which Brian Moon describes), you’ll simply do:

foreach($a1 as $key => $value)
{
    foreach($b2 as $key2 => $value2)
    {
        if ($value == $value2)
        {
            unset($a1[$key]);
        }
    }
}

(ignore any potential side effects of manipulating $a1 while looping through it for now)

This will work for small sizes of a1 and b2, but as soon as the number of entries starts to increase (let’s call them m and n), you’ll see that the growth of your function will approach O(m*n), which can be written as O(n²) as both values approach infinity. This is not good and is the same complexity that you’ll find in naive sorting algorithms. This means that for each element you add to the array, your processing time increases quadratically (since you have two loops here). in_array is simply a shortcut for the inner loop (the inner foreach loop) in this example. It loops through each element of the array and checks if it matches the needle we’re searching for.

Compare this to using array_flip on the array to search first, so that the values becomes the keys:

foreach($a1 as $key => $value)
{
    if (isset($b2[$key]))
    {
        unset($a1[$key]);
    }
}

But why is isset($arr[$key]) any faster than using in_array? Doesn’t the application just have to loop through a different set of values instead (this time, the keys instead of the values)? No. This is where hashing comes into the picture. As $key can be any string value in PHP, the string is hashed and resolved to an internal array id. This means that internally, the following is happening:

$arr[$id] => find location by converting $id to an internal array location (on the C-level) => simply index the array by using this value

Instead of going through all the valid keys, the $id is converted once, and then checked to see if there is any value stored at that location. This is a simplification, but explains the concept. The complexity of this conversion may depend on the length of the key (depending on the choice of hashing function), but we’ll ignore this here, and simply give it a complexity of O(1). Looking up the index in the array is also an O(1) operation (it takes the same time, regardless if we’re looking at item #3 or #4818, it’s simply reading from different locations in memory).

This means that while our other loop is still looping through n elements, we’re now just doing a constant lookup in the innerloop. The amount of work done in the inner loop does not depend on the number of elements in b2, and this means that our algorithm instead grows in a linear fashion (O(n)).

Further reading can be found at Wikipedia: Hash function, Big O Notation. I’ll also suggest reading an introductionary book into the field of algorithms and datastructures. The type of book depends on your skillset, but if anyone wants any suggestions, just leave a comment and I’ll add a few as I get home to my bookshelf tonight.

Implementing a Duck Operator with Reflection

Following up on this post regarding a “duck operator” in PHP, I went ahead and wrote a very, very, very simple implementation using reflection api in php to get the same functionality.

getMethods() as $method)
        {
            $ret[$method->getName()] = true;
        }
        
        return $ret;
    }
    
    function it_quacks($object, $interface)
    {
        $reflectionClass = new ReflectionClass($interface);
        $reflectionObject = new ReflectionObject($object);
        
        $reflectionClassMethods = getMethodProperties($reflectionClass);
        $reflectionObjectMethods = getMethodProperties($reflectionObject);
        
        foreach($reflectionClassMethods as $methodName => $methodData)
        {
            if (empty($reflectionObjectMethods[$methodName]))
            {
                return false;
            }
        }
        
        return true;
    }

    if (it_quacks(new MooingGrassEater(), 'Cow'))
    {
        print("A MooingGrassEater can be seen as a Cow\n");
    }
    else
    {
        print("A MooingGrassEater has no hope of being recognized as a Cow\n");
    }

    if (it_quacks(new MooingGrassEater(), 'Sheep'))
    {
        print("A MooingGrassEater can be seen as a Sheep\n");
    }
    else
    {
        print("A MooingGrassEater has no hope of being recognized as a Sheep\n");
    }
?>

Missing obvious points are of course to compare the number of arguments to the methods and wether they’re optional, so that you further ensure call safety. But hey, it’s just an example implementation. Read the original linked page for more information about the concept.

Spectator PHP Debugger and an Update on WebGrind

Stumbled upon Spectator – an PHP debugger written in XUL. Seems like a promising project and I’m always in favor of people who actually do what they say other people should do :-)

Also worth noting is that the first releases of WebGrind are out, and seems to be a neat tool for those who need to make sense of a few kcachegrind files (for example if you’re using xdebug and it’s profiling functionality).