Support for Solr in eZ Components’ Search

The new release of eZ Components (2008.1) adds a new Search module, and the first implementation included is an interface for sending search requests and new documents to a Solr installation. An introduction can be found over at the eZ Components Search Tutorial. The new release of eZ Components requires at least PHP 5.2.1 (... and if you’re not already running at least 5.2.5, it’s time to get moving. The world is moving. Fast.).


Misunderstanding How in_array Works

Brian Moon has a post about how in_array() really, really sucks. This is not a problem with in_array per se, but a failure to recognize the proper way to solve a problem like this. Some of the comments have already touched on this matter, but I’ll attempt to expand on it and describe the problem.

You have two arrays; a1 and b2. You’re interested in removing all the values from a1 that also are in b2. If you’re doing the naive approach (which Brian Moon describes), you’ll simply do:

foreach($a1 as $key => $value)
{
    foreach($b2 as $key2 => $value2)
    {
        if ($value == $value2)
        {
            unset($a1[$key]);
        }
    }
}

(ignore any potential side effects of manipulating $a1 while looping through it for now)

This will work for small sizes of a1 and b2, but as the number of entries (let’s call them m and n) starts to increase, the running time of your function grows as O(m*n), which is O(n²) when the two arrays are of similar size. This is not good, and is the same complexity that you’ll find in naive sorting algorithms. It means that the processing time grows quadratically with the size of the input (since you have two nested loops here): double the size of both arrays and you do roughly four times the work. in_array is simply a shortcut for the inner foreach loop in this example. It loops through each element of the array and checks whether it matches the needle we’re searching for.
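To make that concrete, here is a minimal sketch of the same removal written with in_array(). The function name remove_with_in_array and the sample data are made up for illustration; the point is only that the nested loop is still there, hidden inside in_array().

```php
<?php
// Hypothetical helper (not taken from Brian Moon's post): removes every
// value from $a1 that also occurs in $b2, using in_array() for the search.
function remove_with_in_array(array $a1, array $b2)
{
    foreach ($a1 as $key => $value)
    {
        // in_array() walks $b2 element by element until it finds a match,
        // so each call costs up to count($b2) comparisons.
        if (in_array($value, $b2))
        {
            unset($a1[$key]);
        }
    }

    return $a1;
}

$result = remove_with_in_array(array(1, 2, 3, 4, 5), array(2, 4, 6));
print_r($result); // the original keys are preserved: 0 => 1, 2 => 3, 4 => 5
```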

Compare this to using array_flip on the array to search first, so that the values become the keys:

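// array_flip() requires the values of $b2 to be integers or strings
$b2 = array_flip($b2);

foreach($a1 as $key => $value)
{
    // the values of $b2 are now its keys, so look up by $value
    if (isset($b2[$value]))
    {
        unset($a1[$key]);
    }
}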

But why is isset($arr[$key]) any faster than using in_array? Doesn’t the application just have to loop through a different set of values instead (this time, the keys instead of the values)? No. This is where hashing comes into the picture. As $key can be any string value in PHP, the string is hashed and resolved to an internal array id. This means that internally, the following is happening:

$arr[$id] => find location by converting $id to an internal array location (on the C-level) => simply index the array by using this value

Instead of going through all the valid keys, the $id is converted once, and then checked to see if there is any value stored at that location. This is a simplification, but it explains the concept. The complexity of this conversion may depend on the length of the key (depending on the choice of hashing function), but we’ll ignore this here and simply give it a complexity of O(1). Looking up the index in the array is also an O(1) operation: it takes the same time regardless of whether we’re looking at item #3 or #4818, as it’s simply reading from different locations in memory.
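The idea can be sketched with a toy hash table. This is an illustration only: the class name ToyHashTable, the use of crc32() as the hash function and the bucket count are all assumptions made for the example, and PHP’s real arrays are implemented in C with a different hash function and collision handling.

```php
<?php
// Toy hash table, for illustration only. It uses integer-indexed PHP arrays
// as its "memory"; the real implementation lives in C inside the engine.
class ToyHashTable
{
    private $size;
    private $buckets;

    public function __construct($size = 16)
    {
        $this->size = $size;
        $this->buckets = array_fill(0, $size, array());
    }

    // Convert a string key to a bucket number. The cost depends on the
    // length of the key, not on how many items are stored in the table.
    private function slot($key)
    {
        return (crc32($key) & 0x7fffffff) % $this->size;
    }

    public function set($key, $value)
    {
        $slot = $this->slot($key);

        // Keys that hash to the same slot share a bucket; replace if present.
        foreach ($this->buckets[$slot] as $i => $pair)
        {
            if ($pair[0] === $key)
            {
                $this->buckets[$slot][$i][1] = $value;
                return;
            }
        }

        $this->buckets[$slot][] = array($key, $value);
    }

    public function has($key)
    {
        // Only the one bucket the key hashes to is inspected.
        foreach ($this->buckets[$this->slot($key)] as $pair)
        {
            if ($pair[0] === $key)
            {
                return true;
            }
        }

        return false;
    }
}

$table = new ToyHashTable();
$table->set('apple', 1);
$table->set('pear', 2);

var_dump($table->has('apple'));  // bool(true)
var_dump($table->has('banana')); // bool(false)
```

As long as the buckets stay short, a lookup touches only a handful of entries no matter how many keys have been stored.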

This means that while our outer loop is still looping through all the elements of a1, we’re now just doing a constant-time lookup inside it. The amount of work done per element no longer depends on the number of elements in b2, and this means that our algorithm instead grows in a linear fashion: O(m + n), counting the single pass that array_flip makes over b2.
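If you want to see the difference for yourself, a rough timing comparison could look like the sketch below. The function names diff_in_array and diff_flipped and the array sizes are arbitrary choices for the example, and the absolute numbers will vary from machine to machine; only the trend as you increase the sizes is interesting.

```php
<?php
// Both helpers are hypothetical names used only for this comparison.
function diff_in_array(array $a1, array $b2)
{
    foreach ($a1 as $key => $value)
    {
        if (in_array($value, $b2)) // linear scan of $b2 for every element
        {
            unset($a1[$key]);
        }
    }

    return $a1;
}

function diff_flipped(array $a1, array $b2)
{
    $lookup = array_flip($b2); // one pass over $b2; values become keys

    foreach ($a1 as $key => $value)
    {
        if (isset($lookup[$value])) // constant-time hash lookup
        {
            unset($a1[$key]);
        }
    }

    return $a1;
}

$a1 = range(0, 1999);
$b2 = range(1000, 2999);

$start = microtime(true);
$slow = diff_in_array($a1, $b2);
$slowTime = microtime(true) - $start;

$start = microtime(true);
$fast = diff_flipped($a1, $b2);
$fastTime = microtime(true) - $start;

printf("in_array: %.5f s, array_flip + isset: %.5f s\n", $slowTime, $fastTime);
var_dump($slow === $fast); // both approaches remove the same elements
```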

Further reading can be found at Wikipedia: Hash function, Big O Notation. I’d also suggest reading an introductory book on algorithms and data structures. The type of book depends on your skill set, but if anyone wants any suggestions, just leave a comment and I’ll add a few when I get home to my bookshelf tonight.

Implementing a Duck Operator with Reflection

Following up on this post regarding a “duck operator” in PHP, I went ahead and wrote a very, very, very simple implementation using the Reflection API in PHP to get the same functionality.

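<?php
    // Cow, Sheep and MooingGrassEater are assumed to be defined before this point.

    function getMethodProperties($reflection)
    {
        $ret = array();

        foreach($reflection->getMethods() as $method)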
        {
            $ret[$method->getName()] = true;
        }
        
        return $ret;
    }
    
    function it_quacks($object, $interface)
    {
        $reflectionClass = new ReflectionClass($interface);
        $reflectionObject = new ReflectionObject($object);
        
        $reflectionClassMethods = getMethodProperties($reflectionClass);
        $reflectionObjectMethods = getMethodProperties($reflectionObject);
        
        foreach($reflectionClassMethods as $methodName => $methodData)
        {
            if (empty($reflectionObjectMethods[$methodName]))
            {
                return false;
            }
        }
        
        return true;
    }

    if (it_quacks(new MooingGrassEater(), 'Cow'))
    {
        print("A MooingGrassEater can be seen as a Cow\n");
    }
    else
    {
        print("A MooingGrassEater has no hope of being recognized as a Cow\n");
    }

    if (it_quacks(new MooingGrassEater(), 'Sheep'))
    {
        print("A MooingGrassEater can be seen as a Sheep\n");
    }
    else
    {
        print("A MooingGrassEater has no hope of being recognized as a Sheep\n");
    }
?>

Obvious missing pieces are of course comparing the number of arguments to the methods and whether they’re optional, so that you further ensure call safety. But hey, it’s just an example implementation. Read the original linked page for more information about the concept.
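As a rough sketch of what such an argument check could look like, the variant below also compares parameter counts. The interfaces and classes here are hypothetical stand-ins written for this example (they are not the definitions from the original post), and it_quacks_strictly is a made-up name. It still ignores visibility, type hints and static methods.

```php
<?php
// Hypothetical types, for illustration only.
interface Cow
{
    public function moo($times);
    public function eatGrass();
}

interface Sheep
{
    public function baa();
    public function eatGrass();
}

class MooingGrassEater
{
    public function moo($times, $loud = false) { return str_repeat('Moo! ', $times); }
    public function eatGrass() { return 'munch'; }
}

class QuietGrassEater
{
    public function moo() { return ''; } // accepts no arguments
    public function eatGrass() { return 'munch'; }
}

// Returns true if $object has every method of $interface, and each method can
// safely be called with any argument count the interface's signature allows.
function it_quacks_strictly($object, $interface)
{
    $wanted = new ReflectionClass($interface);
    $actual = new ReflectionObject($object);

    foreach ($wanted->getMethods() as $method)
    {
        if (!$actual->hasMethod($method->getName()))
        {
            return false;
        }

        $candidate = $actual->getMethod($method->getName());

        // The object must not demand more arguments than the interface requires ...
        if ($candidate->getNumberOfRequiredParameters() > $method->getNumberOfRequiredParameters())
        {
            return false;
        }

        // ... and must accept at least as many as the interface declares.
        if ($candidate->getNumberOfParameters() < $method->getNumberOfParameters())
        {
            return false;
        }
    }

    return true;
}

var_dump(it_quacks_strictly(new MooingGrassEater(), 'Cow'));   // bool(true)
var_dump(it_quacks_strictly(new MooingGrassEater(), 'Sheep')); // bool(false)
var_dump(it_quacks_strictly(new QuietGrassEater(), 'Cow'));    // bool(false)
```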

Spectator PHP Debugger and an Update on WebGrind

Stumbled upon Spectator, a PHP debugger written in XUL. Seems like a promising project, and I’m always in favor of people who actually do what they say other people should do :-)

Also worth noting is that the first releases of WebGrind are out, and it seems to be a neat tool for those who need to make sense of a few cachegrind files of the kind you would otherwise open in KCachegrind (for example if you’re using xdebug and its profiling functionality).

The Norwegian PHP TestFest is Done!

Just got back from Oslo after The Norwegian PHP TestFest. We were a total of seven people, and Hannes Magnusson (hm, familiar title for that blog: We don’t either.) held an introduction to how to write .phpt tests. I had read a bit about the stuff previously and written a test or two, but I finally got some time to play around with the run-tests.php script and other small tidbits.

I think everyone had a good time, and according to Flyspray for the PHP TestFest, we managed to write a total of 13 tests. The two tests for similar_text() and the two tests for using sqlite as a session save handler were the product of yours truly. I’ll try to contribute more in the future; I’ll just have to get gcov and a few other tidbits up and running here first.

With regard to writing phpt tests for the session module: does anyone have a good idea how to force the garbage collector of a session save handler to run? I’ve tried setting both session.gc_divisor to 1 and session.gc_maxlifetime to 1 (and then calling sleep(2)), but didn’t get things working. Any suggestions would be appreciated.

Short Array Syntax for PHP

En route from the aggregated stream of Planet PHP comes a small post from Brian Moon about Stan’s suggestion for introducing the [] syntax for creating lists in PHP.

I like Python. This is like Python. By association, I like this. This would mean that you now could do:

$a = [1, 2, 3];

instead of $a = array(1,2,3); and:

$a = ['one' => 1, 'two' => 2];

instead of $a = array('one' => 1, 'two' => 2);

Syntactic sugar, but still, sweet sugar.

If we just could get [:]-syntax for slicing and dicing too now..
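Until then, slicing has to go through array_slice(). A small sketch of the rough equivalents (the sample array and variable names are arbitrary); note that array_slice() takes an offset and a length, whereas Python’s slice takes a start and an end index:

```php
<?php
$a = array(10, 20, 30, 40, 50);

// Python: a[1:4]
$middle = array_slice($a, 1, 3); // offset 1, length 3

// Python: a[-2:]
$tail = array_slice($a, -2);     // the last two elements

// Python: a[:2]
$head = array_slice($a, 0, 2);   // the first two elements

print_r($middle); // 20, 30, 40
print_r($tail);   // 40, 50
print_r($head);   // 10, 20
```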

The First Presentations From php|tek Online

php|tek is taking place halfway around half of the world for me, but the first presentations from the conference are beginning to appear online now. The first three presentations are from Brian DeShong and Maggie Nelson.

While Brian’s two presentations were mostly familiar stuff for me, Maggie’s presentation touched on something that has troubled me time and again, and that Christer and I have been looking for a good solution to. We’re currently experiencing the problems described, and I’ve searched several times for a good tool to generate sqldiffs (and not for the _values_ of most of the tables). I’m waiting eagerly for the first release, and as soon as things are up and running, I’ll look into whether there’s anything I can contribute.

Memcache Stats

Harun Yayli has a post up about a PHP application for retrieving and presenting memcache stats from several servers. The interface is built on the familiar apc.php which is included with APC (Alternative PHP Cache, a PHP bytecode / shared memory storage cache), and should be easy to navigate for anyone who has used apc.php. The stats shown in the application include total space available, memcache usage, hits, misses, uptime and other interesting information.

Norwegian PHP TestFest

The Norwegian PHP User Group, PHPNorge, is hosting a Norwegian PHP TestFest in two weeks’ time, on the 29th of May. Haven’t mentioned it to Christer yet, as he’s leaving for New York tomorrow morning (and then connecting to Chicago for php|tek). I’m going to the TestFest on the 29th unless something special shows up, and I ask anyone available in the area to join in and make a contribution. See you there!

PHP Norge Meeting Tomorrow (15. May 2008)

There’s a member meeting in PHP Norge (the Norwegian PHP user group) coming up tomorrow, but it seems like I’m not going to make it this time either. Suddenly got a meeting in Fredrikstad that might run late, and Christer is leaving for New York on Friday, so he’s not attending either. Too bad, but hopefully all the other guys will have a great time. We’re still on for PHP Vikinger, so I guess we’ll meet there.

All other PHP users in the area are of course encouraged to visit!