PHP Namespaces and the What The Fuck problem

October 26th, 2008

As many people has discovered during the last days, some of the developers behind PHP has finally reached a decision regarding how to do introduce namespaces into PHP. This is an issue which has been discussed on and off for the last three years (or even longer, I can’t really keep track), with probably hundreds of threads and thousand of mail list posts (read through the php.internals mail list archive if you’re up for it). The current decision is to go with \ as the namespace separator. An acceptable decision by itself, and while I personally favored a [namespace] notation, I have no reason to believe that this is not a good solution.

There does however seem to be quite a stir on the internet in regards to decision, which I now will call the “What The Fuck Problem”. Most (if not all) public reactions on blogs, reddit, slashdot and other technology oriented sites can be summed up as “What The Fuck?”. While I’m not going to dwell into the psychological reasons for such a reaction, my guess is that it’s unusual, lacks familiarity for developers in other languages already supporting namespaces and that it might be hard to understand the reasoning behind such a decision.

The problem: to retrofit namespaces into a language which were not designed to support such a construct, without breaking backwards compability. The part about not breaking backwards compability is very important here, as it leaves out everything which could result in a breakage in existing code by simply using a new PHP version.

The discussions have been long, the attempts has been several (thanks to Greg Beaver’s repeated persistence in writing different implementations along the way) and each and every single time an issue has crept in which either breaks existing functionality or results in an ambigiuity when it comes to resolving the namespace accessed. Most issues and the explaination why they are issues, are documented at Namespace Issues in the PHP Wiki. This provides some insight into why the decision to use a separate identifier was chosen.

This seemed to get through in the flamewars discussions after a while, and people instead started to point out the “gigantic flaw”:

If you were to load a class or a namespace dynamically by referencing it in a string, you’d have to take care to escape your backslash:

spl_autoload_register(array("Foo\tBar", "loader"));

.. would mean Foo<tab>Bar. Yep. It actually would. And if that’s the _biggest_ problem with the implementation of using \ as a namespace separator, then I feared its introduction earlier for no apparent reason.

The other examples has plagued us with ambiguity issues, autoloading issues, no-apparent-way-of-putting-functions-and-constants-in-the-namespace-issues and other problems. This way we are left with the task that we, as usual, have to escape \ in a string — when we want to reference a namespace in a dynamic name — and that’s the biggest problem?

I just hope that people keep it sane and don’t implement any special behaviour in regards to how strings are parsed in regards to new, $className::. This _will_ cause problems and magic issues down the road if it ever gets into the engine.

PHP is free, it’s open and it’s yours for the taking. Fork it if you want to, or provide a patch which solves the problem in a better way. The issue has been discussed to death, and so far there is no apparent better solutions on the table. If you have one, you’ve already had three years to suggest it. You better hurry if you’re going to make people realize it now.

Additional observation: most people who has an issue with this, seems to be the same people who would rather be caught dead than writing something in PHP. Yes. Python and Java does it prettier. That’s not a solution to the problem discussed in regards to PHP.

Java printf and Booleans

October 25th, 2008

The available stdout mapping in Java, System.out (PrintStream), supports printf (to format a string when outputting it) in the same manner as other languages such as C. By using System.out.printf instead of System.out.print, you can also provide a formatting string which the method can use to write the output in a custom format.

As Java supports several other native datatypes that weren’t available as regular C-types, you also have a few other options. One of these are the %b / %B identifier, which represents a boolean value.

The format string “%b” expects one argument, a boolean. This will be written as either ‘true’ or ‘false. You would (as of writing) get the same result by using the %s identifier as this would cast the boolean of a string, which in turn would give true or false, but it’s always better to be explicit about what you are doing than assuming that people understand that you understood something (which even could change..).

Keeping Your Code in Check: Part 1 - DRY

October 10th, 2008

DRY - Don’t Repeat Yourself - is a central principle for everyone who wants to keep their code maintainable and in a clean and pristine state. It will save you from tedious tasks in the future, although it requires you to do a bit more thinking up front. The clue is to recognize when you’re about to commit the first programming sin: repeating parts of code.

Lets first be completely clear: not repeating code does not mean your code should be as general as possible. Doing that will only end up in a horrible mess that your only hope of selling is to label it “Enterprise” and push all the configuration options into XML. If your way to success is to craft a new buzzword and get the consulting business to sell your product, that might be how you do it. If you’re interesting in writing maintainable and solid software that other people can play around with and get the grip of quite quickly; don’t.

Keep it specialized to do what you need it do easily, but keep it general enough to allow you to reuse it for similar tasks.

The reason why I’m posting this now is unsurprisingly enough that I ran into this issue while coding a library at work. This issue is so common that any programmer will run into it several times a day, but the issue at hand today illustrates the problem perfectly. In a class that I was writing to return a set of data to the user, I decided to add three methods for sorting the data in different manners before returning it. The problem is that the data is kept in different array keys, and it’s the arrays themselves I want to sort. Example:

['displayValue' => 'Aloha Beach', 'count' => 13, 'value' => 'aloha']
['displayValue' => 'Norway', count => 3, 'value' => 'norway']
['displayValue' => 'Sweden Swapparoo', count => 7, 'value' => 'sweden']

To sort this in PHP, you’d use the usort function with a callback function / method that sorts the arrays by the value of the right key. My first version was something like:

  1. private function sortByValue($a, $b)
  2. {
  3.     if (!isset($a['value'], $b['value']))
  4.     {
  5.         if (isset($a['value']))
  6.         {
  7.             return -1;
  8.         }
  9.  
  10.         if (isset($b['value']))
  11.         {
  12.             return 1;
  13.         }
  14.    
  15.         return 0;
  16.     }
  17.  
  18.     return strcmp($a['value'], $b['value']);
  19. }
  20.  
  21. private function sortByCount($a, $b)
  22. {
  23.     if (!isset($a['count'], $b['count']))
  24.     {
  25.         if (isset($a['count']))
  26.         {
  27.             return -1;
  28.         }
  29.  
  30.         if (isset($b['count']))
  31.         {
  32.             return 1;
  33.         }
  34.    
  35.         return 0;
  36.     }
  37.  
  38.     return ($b['count'] - $a['count']);
  39. }

As I typed the second if (isset()) in the second method, I finally realized that I were sitting at work and writing the exact same function twice, with just a little twist between them in regards to which field to sort by - and how to determine the sort. One field was numeric, the other was a string. Our goal is to re-use as much as the common code from both methods without typing it up — or much more important, maintaining — it twice.

As you can see, almost the whole function is identicial, except for the key name (’value’ vs ‘count’) and how the comparison is done (numeric vs string). We need to handle these two issues to be able to use the same function for both purposes, so we rework the two functions into one function for general use for sorting on the value of a key:

  1. private function sortByArrayField($a, $b, $field)
  2. {
  3.     if (!isset($a[$field], $b[$field]))
  4.     {
  5.         if (isset($a[$field]))
  6.         {
  7.             return -1;
  8.         }
  9.  
  10.         if (isset($b[$field]))
  11.         {
  12.             return 1;
  13.         }
  14.    
  15.         return 0;
  16.     }
  17.  
  18.     if (is_numeric($a[$field]) && is_numeric($b[$field]))
  19.     {
  20.         return ($b[$field] - $a[$field]);
  21.     }
  22.  
  23.     return strcmp($a[$field], $b[$field]);
  24. }

This way we handle both the comparison method (if both values are numeric, we do the comparison as a numeric value by simply subtracting the first from the second) and the field to sort (as a third parameter).

Sadly usort does not provide that third parameter for us automagically, but by creating a few simple helper methods for using our refactored function, we get all “configuration” set in those three methods. The code base only contains one implementation of the method itself, but several ways to use it. Example of these three helper methods:

  1. private function sortByValue($a, $b)
  2. {
  3.     return $this->sortByArrayField($a, $b, 'value');
  4. }
  5.  
  6. private function sortByCount($a, $b)
  7. {
  8.     return $this->sortByArrayField($a, $b, 'count');
  9. }
  10.  
  11. private function sortByDisplayValue($a, $b)
  12. {
  13.     return $this->sortByArrayField($a, $b, 'displayValue');
  14. }

If someone now comes along and decides to add another column to our array, such as ‘price’, we can simply add another sorting callback to accomodate this:

  1. private function sortByPrice($a, $b)
  2. {
  3.     return $this->sortByArrayField($a, $b, 'price');
  4. }

The sortByArrayField method is already well proven and tried from our previous usage, and by simply changing the field we’re sorting by, we still get the power of the callback, get a completely new sort criteria and the maintainability of just one method.

The First Rule of Debugging

October 8th, 2008

Seems like there are quite a lot of ideas about what the first rule of debugging are, and a quick Google Search gives you insightful suggestions such as:

These are all valuable insights that provide clear value about what and how you could attack the problem of the Bug That Wasn’t Supposed To Be There (there are surely bugs that actually were supposed to be there, because of invalid domain specifications, etc.). I did however find one small column from DDJ that led into the same thing I’m going to write now: “Before you go to fix it, be sure you’re fixing the right thing.

I will assume that you already have uncovered that you actually have a bug — you’re not getting the results you expected, and something is to blame. You start out by trying to isolate the problem, and try to recreate the broken situation with different inputs. You try it on another system to see if there’s something about the configuration, about the dataset, about the versions of your library or something, which we’ll disover later, is completely irrelevant.

I obviously forgot this rule at work today, and that’s why I’m writing this post now. Remember kids. Always make sure you’re fixing the right thing! I’ve started porting one of the front end services we use to federate searches from a borked Java library to a PHP-based implementation instead. This involves talking to an internal search server that I hadn’t written code to interface with before, other than the existing search service. The implementation went from zero to usable in almost no time, and I now have code that provides a nice foundation for the remaining work. The problem is that I obviously decided to fix another issue with the search server that I had looked into earlier, and it seems that I thought that since I now had new and improved experience with the service, I’d be better off this time.

So I set out trying to get a new sorting scheme working for the search results. At first I failed, but then I stumbled across some documentation that threw me in the right direction. I did a few minor changes, and the sorting changed. Woohooo! Well, it almost worked. The order was not quite as expected, but the results seemed to be close enough for the proximity search to be working. It just had to be something in the documentation or implementation that I had missed! Further digging and experimentation for the greater part of an hour or so left me blank, but I managed to get another search ordering which made me think that there was something peculiar about my input parameters that created the issue.

After almost two and a half hour of this, I happened to read four lines that were neatly tucked away in a “process” part of the resulting XML document from the search service. In the diffuse light glooming from my Dell monitor, I could read the now obvious words: “<subsystem> No license.”.

Yep. It never had anything to do with the actual result. It was never even active. The change of the sorting must have happened for some other reason (such as removing the default values or anything like that). I weren’t supposed to debug why the sorting was incorrect, I was supposed to find out why the subsystem didn’t load.

Grrr.

Mats’ (borrowed) first rule of debugging: Make sure you debug the right problem. Do not make assumptions.

(and a rule of documentation: if the feature may not be available in the installation, write how you can check this before actually giving examples and detailing how to use the feature)

Informa and Custom XML Namespaces in RSS

September 25th, 2008

While integrating a custom search application into a Java-based web application, I came across the need to access properties in custom namespaces through the Informa RSS library. Or to put it in another way; i needed to access to properties, Informa had been used for RSS parsing in the previous versions of the web application. The people who developed the original version of the application had decided to extend the Informa library into their own version, and had added several methods for .get<NameOfCustomProperty> etc. After thinking about this for approximately 2 seconds, I decided that having to support and modify a custom version of Informa was not the right track for us.

My initial thought was that their decision to customize Informa to support these methods had to come from the idea that Informa did not support custom namespaces out of the box. I did a few searchas over at Google, and found nothing useful. Reading through the documentation for Informa didn’t do me any good either, so I tried to find an alternative library instead. Did a bit of searching here too, and stumbled across a hit for one of the util classes for Informa (.. again). This did support custom namespaces, so the backend support was there at least. Then it struck me while reading the documentation for Informa and ChannelIF again; Informa did support it, as it inherited the methods from further up in the hierarchy. The getElementValue and getElementValues methods of the ChannelIF and ItemIF classes allows you to fetch the contents of elements with custom namespaces in a very easy to like manner.

  1. System.out.println(item.getElementValue("exampleNS:field"));

This simply returns the string contained between <exampleNS:field> and </exampleNS:field>

Hoooray! We now have support for these additional fields, and we do not have to keep Informa manually in sync with the version in our application. Why the original developers decided to fork the Informa library to add their own properties I may never know, but I’ll update this post if they decide to step forward!

Unit-Testing Code Which Uses a Database

August 20th, 2008

How to unit-test code that interacts with a database appeared on the blog of Baron Schwartz, and to be really boring, I agree with what he’s writing. Unit-testing database connectivity and storage is not hard. If it is, it might be a good time to redo that architecture you’ve been talking about.

An important point that Baron mentions is that you _NEVER_ _EVER_ run your tests on your production servers. That will of course be disastrous, as your tests needs a predefined state of the database to be valid for testing. The solution I’ve been using to handle this, is to always set up my environment to use another database when doing the tests. This way, you’ll never end up with running the tests on a live database by accident. I handle this in my AllTests.php file, where the test suites and shared fixtures are set up. We dump the contents of the developer database (databasename), create a new database (databasename_test) and insert all the current table structures and indexes. This way we get an accurate copy of the table definitions currently defined by the developer (so that we don’t run the tests against an old set of tables), and we test that the code works as it should with the active definitions.

The simplest way to do this, is to use mysqldump and mysql through a call to exec. If you’re not in a trusted environment, please, please, please add the appropriate shell argument escape commands. It can however be argued that if you’re allowing random people to change your database login information, you probably have bigger problems than doing unit testing..

  1. exec('mysqldump -u ' . $username . ' -p' . $password . ' ' . $dbname . ' | mysql -u ' . $username . ' -p' . $password . ' ' . $database . '_test');

It would be very interesting to get more information about which measures Baron advocates for detecting a production system. We have configuration settings for our applications which also defines if this is a development or production system, in addition to the fact that our testing code only touches databases which end in _test.

The Results of our Weekend Challenge

August 13th, 2008

The weekend PHP size competiton I mentioned on friday has come to an end, with the results being as follows:

      CNU (253 bytes)
      Helge (261 bytes)
      Ymgve (267 bytes)
      Me (365 bytes)
      dibon&zep (which never delivered, but had working solutions)

For those who are interested in the strategies and tactics employed by the contestants, I’ve included a small write-up and analysis of the various contributions. We decided early on that using eval() and gzinflate on the content were something that everyone could apply, so the size would be counted for the decompressed code. If any contestant implemented their own decompressor in the user space code themselves, we would accept that. It would not be any advantage anyways, as the decompressor code would eat up more space than the best contributions used in total.

I’ve included the writers own comments if they had any. The biggest change in size happened when people started using one dimensional strings instead of arrays.

CNU’s contribution

  1. $b='';foreach(file('maps.txt',2)as$l){if(!$l){$q[1]=strpos($b,'.');for($b=str_split($b);$p=&$q[++$l];)foreach(array($p-$w,$p+$w,$p-1,$p+1)as$x){$c=$b[$p]+1;$d=&$b[$x];if($d=='X')echo$c.'
  2. ';if($d==' ')$q[]=$x;$d=$c;}$q=$b='';}$b.=$l;$w=strlen($l);}

Helge’s contribution

  1. $m="";foreach(file("maps.txt")as$l){if($l!="
  2. "&&($m.=$l)+$w=strlen($l))continue;for($y=$d[$v[]=strpos($m,'.')]=0;strpos($m,'X')-$u=$v[$y++];)for($x=-$w;$x<=$w;$x+=$x+1?$w-1:2)in_array($n=$u+$x,$v)|$m[$n]=="#"||$d[$v[]=$n]=$d[$u]+1;echo"$d[$u]
  3. ";$m=$v="";}

Helge also included an annotated version with comments in norwegian:

  1. /**
  2.  * $l = én linje (av maps.txt)
  3.  * $m = ett map
  4.  * $w = vidden til et kart
  5.  * $d = distansen til et vilkårlig punkt
  6.  * $v = open/closed-array (putter både squares
  7.         of interest og visited squares her)
  8.  * $u = "current square", utgangspunktet til å
  9.         generere naboer
  10.  * $n = nabo
  11.  * $x = hjelpevariabel for å generere $n
  12.  * $y = hjelpevariabel for å generere $u
  13.  */
  14.  
  15.  
  16. /* Klarte aldri å finne en måte å skippe initialisering av
  17.    $m her.. (den resettes på siste linja og.. føles waste)
  18.    * En idé var å bruke to for(;;)'s istedenfor en foreach
  19.    og continue, men jeg tror jeg hadde økt i kode da.. */
  20. $m="";
  21.  
  22. /* Foreach var nyttig fant jeg ut.. den stopper når filen er
  23.    ferdig lest av seg selv, så slipper kode på sjekking av det,
  24.    $f[$y++], osv. */
  25. foreach(file("maps.txt") as $l) {
  26.     if(
  27.         /* Når $l=="\n" (som betyr slutten på et kart) skal
  28.            vi slutte å appende til $m og heller breake if'en
  29.            så vi kan starte pathfinding */
  30.         $l != "\n" &&
  31.  
  32.         /* Bruker en 1-dimensjonal string istedenfor array
  33.            (= win!) */
  34.         ($m .= $l) +
  35.  
  36.         /* Trenger vidden når jeg skal regne ut naboene til et
  37.            gitt felt. */
  38.         $w = strlen($l)
  39.     )
  40.         /* Gjør så $l blir feedet en ny linje av kartet vårt til
  41.            kartet er ferdig */
  42.         continue;
  43.  
  44.     /* Pathfinding start! */
  45.     for(
  46.         /* Setter $y og $d[starten] lik 0, og putter startposisjon
  47.            inn i $v i samma slengen. */
  48.         $y = $d[ $v[] = strpos($m, '.') ] = 0;
  49.  
  50.         /**
  51.          * Gjør så for(;;) breaker når sluttposisjon-$u blir 0
  52.          * (synonymt med $u == strpos($m, 'X');
  53.          * Tar også neste $u fra $v i samma slengen */
  54.         strpos($m, 'X') - $u = $v[ $y++ ];
  55.        
  56.     )
  57.         /* For hver nabo */
  58.         for(
  59.             /* Genererer naboer.
  60.              * Den starter på -width (som blir opp), går så til -1
  61.                (som blir venstre), 1 (som blir høyre) og tilslutt
  62.                width (som blir ned) */
  63.             $x = -$w;
  64.             $x <= $w;
  65.             $x += $x+1 ? $w-1 : 2
  66.         )
  67.             /* Om naboen (summen av $u og $x) er i $v: continue */
  68.             in_array($n = $u+$x, $v) | /* fancy bitwise or som kan
  69.                                           brukes fordi det er snakk
  70.                                           om 2 booleans */
  71.  
  72.             /* Om naboen er en '#' på kartet: continue */
  73.             $m[$n] == "#" ||
  74.  
  75.             /* Hvis ikke, sett distansen til naboen lik distansen
  76.                til $u + 1 og putt naboen inn i $v */
  77.             $d[ $v[] = $n ] = $d[$u]+1;
  78.  
  79.     /* Skriv ut distansen til $u (skal være sluttpunktet nå) */
  80.     echo "$d[$u]\n";
  81.  
  82.     /* Disse må resettes.. */
  83.     $m=$v="";
  84. }

Ymgve’s contribution

  1. foreach(file("maps.txt")as$l){for($i=0;-75<$e=ord($l[$i++])-88;$m[]=$e+56?$e:1e6,$z=array(-1,1,-$i,$i))$e+42||$q[0]=count($m);if($i<3){for($d=$k=$c=0;;$c++-$k||$i++&$k=$d)for($j=4;$j;$g>$i&&($g=$i)&$q[++$d]=$p)if(!$g=&$m[$p=$q[$c]+$z[$j]])break 2;echo"$i
  2. ";}}

My own contribution

Breadth-first Search, Delivered Version, 365 bytes:

  1. <?php $y=0;foreach(file('maps.txt')as$l)if("
  2. "==$l){for(;$q&&list($y,$x,$d)=array_shift($q);)if(!empty($m[$y][$x])){$k=&$m[$y][$x];echo$k=='X'?"$d
  3. ":'';$k=0;$q[]=array($y,$x-1,++$d);$q[]=array($y-1,$x,$d);$q[]=array($y,$x+1,$d);$q[]=array($y+1,$x,$d);}$m=$y=null;}else{for($x=0;$l[$x++]!="
  4. ";$m[$y][$x]=$l[$x]=="#"?0:$l[$x])if($l[$x]=='.')$q[]=array($y,$x,0);$y++;}

Prettyprinted:

  1. <?php
  2. $y=0;
  3. foreach(file('maps.txt')as$l)
  4. if("
  5. "==$l)
  6. {
  7.     for(;$q&&list($y,$x,$d)=array_shift($q);)
  8.         if(!empty($m[$y][$x]))
  9.         {
  10.             $k=&$m[$y][$x];
  11.             echo$k=='X'?"$d
  12. ":'';
  13.             $k=0;
  14.             $q[]=array($y,$x-1,++$d);
  15.             $q[]=array($y-1,$x,$d);
  16.             $q[]=array($y,$x+1,$d);
  17.             $q[]=array($y+1,$x,$d);
  18.         }
  19.        
  20.         $m=$y=null;
  21. }
  22. else
  23. {
  24.     for($x=0;$l[$x++]!="
  25. ";$m[$y][$x]=$l[$x]=="#"?0:$l[$x])
  26.         if($l[$x]=='.')$q[]=array($y,$x,0);
  27.     $y++;
  28. }

This is mostly your usual run-of-the-mill BFS, where the search itself is implemented as a simple queue.

We loop through all the lines in the file (observe that you can remove the space character on both sides of as, saving you two bytes!):

  1. foreach(file('maps.txt')as$l)

The part within the else condition builds the map to search for the exit point:

  1.     // for each line in the file, we do this
  2.     for($x=0;$l[$x++]!="
  3. ";$m[$y][$x]=$l[$x]=="#"?0:$l[$x])
  4.         if($l[$x]=='.')$q[]=array($y,$x,0);

The for() loops through each character on the line, until it hits the newline marker at the end. Instead of entering “\n”, you simply use the actual newline. This way you save another byte. The map ($m) is then populated using the current y,x coordinate (y being the line, x being the character on that line).

$m[$y][$x]=$l[$x]==”#”?0:$l[$x] can be expanded to:

  1. if ($l[$x] == "#")
  2. {
  3.     $m[$y][$x] = 0;
  4. }
  5. else
  6. {
  7.     $m[$y][$x] = $l[$x];
  8. }

Meaning that if the character is a wall, we store the value zero; if any other value, we store the character itself. We’ll use this attribute when searching to determine when we’ve reached our goal. All of this is kept inside the for()-construct itself. The only thing contained in the for()-loop, is the statement if($l[$x]==’.')$q[]=array($y,$x,0);. This checks if the current character is the starting point, and if that’s the case, adds it to our queue of points to check (making it the starting point of our BFS).

When we hit an empty line in the file (through the if(”
“==$l) check), we execute our BFS.

  1.     /* while $q is valid (this handles empty labyrinths), we fetch the
  2.         current y and x coordinates, in addition to d, the distance to
  3.         the location we're fetching from the queue. No {} here, as the
  4.         for only contains the if() statement below. */
  5.     for(;$q&&list($y,$x,$d)=array_shift($q);)
  6.  
  7.         /* if this spot has a value (remember that we assigned 0 to walls,
  8.             while we kept the character value for other fields when we
  9.             parsed the file. We use empty() as the coordinates may not be
  10.             valid indexes into the array, as we don't do any checks when
  11.             we add them. one call to empty() is better than four ifs and
  12.             duplicate array references). */
  13.         if(!empty($m[$y][$x]))
  14.         {
  15.             // create a reference (saves one byte instead of referencing $m[$y][$x] twice.
  16.             $k=&$m[$y][$x];
  17.  
  18.             /* if we've found the X, echo the distance to the X. Under any
  19.                 under circumstance, you'd break; out of the for/while here,
  20.                 but as bytes count more than speed, we save six bytes by
  21.                 searching the whole labyrinth instead of ending early. Funny
  22.                 detail: as echo is a language construct, you can skip the space.
  23.                 If the character is not 'X', we simply echo an empty string
  24.                 instead of using a conditional echo. Saves a couple of bytes. */
  25.             echo$k=='X'?"$d
  26. ":'';
  27.  
  28.             // unset the value, so that we don't visit this node again.
  29.             $k=0;
  30.  
  31.             /* add all neighbours to our queue. We pre-increment $d, instead
  32.                 of adding +1 in all the adds. */
  33.             $q[]=array($y,$x-1,++$d);
  34.             $q[]=array($y-1,$x,$d);
  35.             $q[]=array($y,$x+1,$d);
  36.             $q[]=array($y+1,$x,$d);
  37.         }
  38.        
  39.         /* Reset the values. PHP generates a warning if a value has never
  40.             been set before when used, but will happily accept null values
  41.             as empty arrays or 0. The double assignment is also a trick worth
  42.             remembering to save bytes. */
  43.         $m=$y=null;

Some of tricks used in the competition

  • Multi-variable assignment: $a=$b=0;
  • Actual newline when detecting a newline: if ($a==”
    “){}
  • Using null to be able to reset both integers and arrays in one statement.
  • The more code you can fit into a for() statement, the better. You get three “free” line endings by doing this. for(;;) is the same length as while(), but you can get several statements evaluated.
  • You can skip the spaces in foreach().
  • Create references to avoid using long array indices several times.
  • The ternary operator (?:) is your friend!
  • Remember that {}’s can be dropped when you have a single statement that contains all subsequent code (as long as you don’t get any conflicts with else etc. on different levels):
    for(;;) <– no { here.
    if ()
    {

    }

  • Simulating arrays with strings

The scripts were run with a set of pre-made labyrinths and a collection of 20 random labyrinths (more like maps, but they still test the code the same). The script for creating the labyrinths is available here. All delivered contributions passed all tests.

Misunderstanding How in_array Works

June 5th, 2008

Brian Moon has a post about how in_array() really, really sucks. This is not a problem with in_array per se, but failing to recognize the proper way to solve a problem like this. Some of the comments has already touched on this matter, but I’ll attempt to further expand and describe the problem.

You have two arrays; a1 and b2. You’re interested in removing all the values from a1 that also are in b2. If you’re doing the naive approach (which Brian Moon describes), you’ll simply do:

  1. foreach($a1 as $key => $value)
  2. {
  3.     foreach($b2 as $key2 => $value2)
  4.     {
  5.         if ($value == $value2)
  6.         {
  7.             unset($a1[$key]);
  8.         }
  9.     }
  10. }

(ignore any potential side effects of manipulating $a1 while looping through it for now)

This will work for small sizes of a1 and b2, but as soon as the number of entries starts to increase (let’s call them m and n), you’ll see that the growth of your function will approach O(m*n), which can be written as O(n²) as both values approach infinity. This is not good and is the same complexity that you’ll find in naive sorting algorithms. This means that for each element you add to the array, your processing time increases quadratically (since you have two loops here). in_array is simply a shortcut for the inner loop (the inner foreach loop) in this example. It loops through each element of the array and checks if it matches the needle we’re searching for.

Compare this to using array_flip on the array to search first, so that the values becomes the keys:

  1. foreach($a1 as $key => $value)
  2. {
  3.     if (isset($b2[$key]))
  4.     {
  5.         unset($a1[$key]);
  6.     }
  7. }

But why is isset($arr[$key]) any faster than using in_array? Doesn’t the application just have to loop through a different set of values instead (this time, the keys instead of the values)? No. This is where hashing comes into the picture. As $key can be any string value in PHP, the string is hashed and resolved to an internal array id. This means that internally, the following is happening:

$arr[$id] => find location by converting $id to an internal array location (on the C-level) => simply index the array by using this value

Instead of going through all the valid keys, the $id is converted once, and then checked to see if there is any value stored at that location. This is a simplification, but explains the concept. The complexity of this conversion may depend on the length of the key (depending on the choice of hashing function), but we’ll ignore this here, and simply give it a complexity of O(1). Looking up the index in the array is also an O(1) operation (it takes the same time, regardless if we’re looking at item #3 or #4818, it’s simply reading from different locations in memory).

This means that while our other loop is still looping through n elements, we’re now just doing a constant lookup in the innerloop. The amount of work done in the inner loop does not depend on the number of elements in b2, and this means that our algorithm instead grows in a linear fashion (O(n)).

Further reading can be found at Wikipedia: Hash function, Big O Notation. I’ll also suggest reading an introductionary book into the field of algorithms and datastructures. The type of book depends on your skillset, but if anyone wants any suggestions, just leave a comment and I’ll add a few as I get home to my bookshelf tonight.

Implementing a Duck Operator with Reflection

June 5th, 2008

Following up on this post regarding a “duck operator” in PHP, I went ahead and wrote a very, very, very simple implementation using reflection api in php to get the same functionality.

  1. <?php
  2.     interface Cow
  3.     {
  4.         function moo();
  5.         function eatGrass();
  6.     }
  7.  
  8.     interface Sheep
  9.     {
  10.         function baeaeae();
  11.         function eatGrass();
  12.     }
  13.    
  14.    
  15.     class MooingGrassEater {
  16.         function moo() { /* i go moo */ }
  17.         function eatGrass() { /* mums mums mums */ }
  18.     }
  19.    
  20.     function getMethodProperties($obj)
  21.     {
  22.         $ret = array();
  23.        
  24.         foreach($obj->getMethods() as $method)
  25.         {
  26.             $ret[$method->getName()] = true;
  27.         }
  28.        
  29.