Programming – Page 3

The Results of our Weekend Challenge

The weekend PHP size competiton I mentioned on friday has come to an end, with the results being as follows:

CNU (253 bytes)

Helge (261 bytes)

Ymgve (267 bytes)

Me (365 bytes)

dibon&zep (which never delivered, but had working solutions)

For those who are interested in the strategies and tactics employed by the contestants, I’ve included a small write-up and analysis of the various contributions. We decided early on that using eval() and gzinflate on the content were something that everyone could apply, so the size would be counted for the decompressed code. If any contestant implemented their own decompressor in the user space code themselves, we would accept that. It would not be any advantage anyways, as the decompressor code would eat up more space than the best contributions used in total.

I’ve included the writers own comments if they had any. The biggest change in size happened when people started using one dimensional strings instead of arrays.

CNU’s contribution

$b='';foreach(file('maps.txt',2)as$l){if(!$l){$q[1]=strpos($b,'.');for($b=str_split($b);$p=&$q[++$l];)foreach(array($p-$w,$p+$w,$p-1,$p+1)as$x){$c=$b[$p]+1;$d=&$b[$x];if($d=='X')echo$c.'
';if($d==' ')$q[]=$x;$d=$c;}$q=$b='';}$b.=$l;$w=strlen($l);}

Helge’s contribution

$m="";foreach(file("maps.txt")as$l){if($l!="
"&&($m.=$l)+$w=strlen($l))continue;for($y=$d[$v[]=strpos($m,'.')]=0;strpos($m,'X')-$u=$v[$y++];)for($x=-$w;$x<=$w;$x+=$x+1?$w-1:2)in_array($n=$u+$x,$v)|$m[$n]=="#"||$d[$v[]=$n]=$d[$u]+1;echo"$d[$u]
";$m=$v="";}

Helge also included an annotated version with comments in norwegian:

/** 
 * $l = én linje (av maps.txt)
 * $m = ett map
 * $w = vidden til et kart
 * $d = distansen til et vilkårlig punkt
 * $v = open/closed-array (putter både squares 
        of interest og visited squares her)
 * $u = "current square", utgangspunktet til å 
        generere naboer
 * $n = nabo
 * $x = hjelpevariabel for å generere $n
 * $y = hjelpevariabel for å generere $u
 */


/* Klarte aldri å finne en måte å skippe initialisering av 
   $m her.. (den resettes på siste linja og.. føles waste)
   * En idé var å bruke to for(;;)'s istedenfor en foreach 
   og continue, men jeg tror jeg hadde økt i kode da.. */
$m="";

/* Foreach var nyttig fant jeg ut.. den stopper når filen er 
   ferdig lest av seg selv, så slipper kode på sjekking av det, 
   $f[$y++], osv. */
foreach(file("maps.txt") as $l) {
    if(
        /* Når $l=="\n" (som betyr slutten på et kart) skal 
           vi slutte å appende til $m og heller breake if'en 
           så vi kan starte pathfinding */
        $l != "\n" &&

        /* Bruker en 1-dimensjonal string istedenfor array 
           (= win!) */
        ($m .= $l) +

        /* Trenger vidden når jeg skal regne ut naboene til et 
           gitt felt. */
        $w = strlen($l)
    )
        /* Gjør så $l blir feedet en ny linje av kartet vårt til 
           kartet er ferdig */
        continue;

    /* Pathfinding start! */
    for(
        /* Setter $y og $d[starten] lik 0, og putter startposisjon 
           inn i $v i samma slengen. */
        $y = $d[ $v[] = strpos($m, '.') ] = 0;

        /**
         * Gjør så for(;;) breaker når sluttposisjon-$u blir 0 
         * (synonymt med $u == strpos($m, 'X');
         * Tar også neste $u fra $v i samma slengen */
        strpos($m, 'X') - $u = $v[ $y++ ];
        
    )
        /* For hver nabo */
        for(
            /* Genererer naboer.
             * Den starter på -width (som blir opp), går så til -1 
               (som blir venstre), 1 (som blir høyre) og tilslutt 
               width (som blir ned) */
            $x = -$w;
            $x <= $w;
            $x += $x+1 ? $w-1 : 2
        )
            /* Om naboen (summen av $u og $x) er i $v: continue */
            in_array($n = $u+$x, $v) | /* fancy bitwise or som kan 
                                          brukes fordi det er snakk 
                                          om 2 booleans */

            /* Om naboen er en '#' på kartet: continue */
            $m[$n] == "#" ||

            /* Hvis ikke, sett distansen til naboen lik distansen 
               til $u + 1 og putt naboen inn i $v */
            $d[ $v[] = $n ] = $d[$u]+1;

    /* Skriv ut distansen til $u (skal være sluttpunktet nå) */
    echo "$d[$u]\n";

    /* Disse må resettes.. */
    $m=$v="";
}

Ymgve's contribution

foreach(file("maps.txt")as$l){for($i=0;-75<$e=ord($l[$i++])-88;$m[]=$e+56?$e:1e6,$z=array(-1,1,-$i,$i))$e+42||$q[0]=count($m);if($i<3){for($d=$k=$c=0;;$c++-$k||$i++&$k=$d)for($j=4;$j;$g>$i&&($g=$i)&$q[++$d]=$p)if(!$g=&$m[$p=$q[$c]+$z[--$j]])break 2;echo"$i
";}}

My own contribution

Breadth-first Search, Delivered Version, 365 bytes:


Prettyprinted:

This is mostly your usual run-of-the-mill BFS, where the search itself is implemented as a simple queue. 
We loop through all the lines in the file (observe that you can remove the space character on both sides of as, saving you two bytes!):
foreach(file('maps.txt')as$l)

The part within the else condition builds the map to search for the exit point:
    // for each line in the file, we do this
    for($x=0;$l[$x++]!="
";$m[$y][$x]=$l[$x]=="#"?0:$l[$x])
        if($l[$x]=='.')$q[]=array($y,$x,0);

The for() loops through each character on the line, until it hits the newline marker at the end. Instead of entering "\n", you simply use the actual newline. This way you save another byte. The map ($m) is then populated using the current y,x coordinate (y being the line, x being the character on that line).
$m[$y][$x]=$l[$x]=="#"?0:$l[$x] can be expanded to:
if ($l[$x] == "#")
{
    $m[$y][$x] = 0;
}
else
{
    $m[$y][$x] = $l[$x];
}

Meaning that if the character is a wall, we store the value zero; if any other value, we store the character itself. We'll use this attribute when searching to determine when we've reached our goal. All of this is kept inside the for()-construct itself. The only thing contained in the for()-loop, is the statement if($l[$x]=='.')$q[]=array($y,$x,0);. This checks if the current character is the starting point, and if that's the case, adds it to our queue of points to check (making it the starting point of our BFS).
When we hit an empty line in the file (through the if("

"==$l) check), we execute our BFS.
    /* while $q is valid (this handles empty labyrinths), we fetch the 
        current y and x coordinates, in addition to d, the distance to 
        the location we're fetching from the queue. No {} here, as the 
        for only contains the if() statement below. */
    for(;$q&&list($y,$x,$d)=array_shift($q);)

        /* if this spot has a value (remember that we assigned 0 to walls, 
            while we kept the character value for other fields when we 
            parsed the file. We use empty() as the coordinates may not be 
            valid indexes into the array, as we don't do any checks when 
            we add them. one call to empty() is better than four ifs and 
            duplicate array references). */
        if(!empty($m[$y][$x]))
        {
            // create a reference (saves one byte instead of referencing $m[$y][$x] twice.
            $k=&$m[$y][$x];

            /* if we've found the X, echo the distance to the X. Under any 
                under circumstance, you'd break; out of the for/while here, 
                but as bytes count more than speed, we save six bytes by 
                searching the whole labyrinth instead of ending early. Funny 
                detail: as echo is a language construct, you can skip the space. 
                If the character is not 'X', we simply echo an empty string 
                instead of using a conditional echo. Saves a couple of bytes. */
            echo$k=='X'?"$d
":'';

            // unset the value, so that we don't visit this node again.
            $k=0;

            /* add all neighbours to our queue. We pre-increment $d, instead 
                of adding +1 in all the adds. */
            $q[]=array($y,$x-1,++$d);
            $q[]=array($y-1,$x,$d);
            $q[]=array($y,$x+1,$d);
            $q[]=array($y+1,$x,$d);
        }
        
        /* Reset the values. PHP generates a warning if a value has never
            been set before when used, but will happily accept null values 
            as empty arrays or 0. The double assignment is also a trick worth 
            remembering to save bytes. */
        $m=$y=null;

Some of tricks used in the competition

Multi-variable assignment: $a=$b=0;
Actual newline when detecting a newline: if ($a=="

"){}
Using null to be able to reset both integers and arrays in one statement.
The more code you can fit into a for() statement, the better. You get three "free" line endings by doing this. for(;;) is the same length as while(), but you can get several statements evaluated.
You can skip the spaces in foreach().
Create references to avoid using long array indices several times.
The ternary operator (?:) is your friend!
Remember that {}'s can be dropped when you have a single statement that contains all subsequent code (as long as you don't get any conflicts with else etc. on different levels):

for(;;) <-- no { here.

if ()

{
}
Simulating arrays with strings

The scripts were run with a set of pre-made labyrinths and a collection of 20 random labyrinths (more like maps, but they still test the code the same). The script for creating the labyrinths is available here. All delivered contributions passed all tests.

Misunderstanding How in_array Works

Brian Moon has a post about how in_array() really, really sucks. This is not a problem with in_array per se, but failing to recognize the proper way to solve a problem like this. Some of the comments has already touched on this matter, but I’ll attempt to further expand and describe the problem.

You have two arrays; a1 and b2. You’re interested in removing all the values from a1 that also are in b2. If you’re doing the naive approach (which Brian Moon describes), you’ll simply do:

foreach($a1 as $key => $value)
{
    foreach($b2 as $key2 => $value2)
    {
        if ($value == $value2)
        {
            unset($a1[$key]);
        }
    }
}

(ignore any potential side effects of manipulating $a1 while looping through it for now)

This will work for small sizes of a1 and b2, but as soon as the number of entries starts to increase (let’s call them m and n), you’ll see that the growth of your function will approach O(m*n), which can be written as O(n²) as both values approach infinity. This is not good and is the same complexity that you’ll find in naive sorting algorithms. This means that for each element you add to the array, your processing time increases quadratically (since you have two loops here). in_array is simply a shortcut for the inner loop (the inner foreach loop) in this example. It loops through each element of the array and checks if it matches the needle we’re searching for.

Compare this to using array_flip on the array to search first, so that the values becomes the keys:

foreach($a1 as $key => $value)
{
    if (isset($b2[$key]))
    {
        unset($a1[$key]);
    }
}

But why is isset($arr[$key]) any faster than using in_array? Doesn’t the application just have to loop through a different set of values instead (this time, the keys instead of the values)? No. This is where hashing comes into the picture. As $key can be any string value in PHP, the string is hashed and resolved to an internal array id. This means that internally, the following is happening:

$arr[$id] => find location by converting $id to an internal array location (on the C-level) => simply index the array by using this value

Instead of going through all the valid keys, the $id is converted once, and then checked to see if there is any value stored at that location. This is a simplification, but explains the concept. The complexity of this conversion may depend on the length of the key (depending on the choice of hashing function), but we’ll ignore this here, and simply give it a complexity of O(1). Looking up the index in the array is also an O(1) operation (it takes the same time, regardless if we’re looking at item #3 or #4818, it’s simply reading from different locations in memory).

This means that while our other loop is still looping through n elements, we’re now just doing a constant lookup in the innerloop. The amount of work done in the inner loop does not depend on the number of elements in b2, and this means that our algorithm instead grows in a linear fashion (O(n)).

Further reading can be found at Wikipedia: Hash function, Big O Notation. I’ll also suggest reading an introductionary book into the field of algorithms and datastructures. The type of book depends on your skillset, but if anyone wants any suggestions, just leave a comment and I’ll add a few as I get home to my bookshelf tonight.

Implementing a Duck Operator with Reflection

Following up on this post regarding a “duck operator” in PHP, I went ahead and wrote a very, very, very simple implementation using reflection api in php to get the same functionality.

getMethods() as $method)
        {
            $ret[$method->getName()] = true;
        }
        
        return $ret;
    }
    
    function it_quacks($object, $interface)
    {
        $reflectionClass = new ReflectionClass($interface);
        $reflectionObject = new ReflectionObject($object);
        
        $reflectionClassMethods = getMethodProperties($reflectionClass);
        $reflectionObjectMethods = getMethodProperties($reflectionObject);
        
        foreach($reflectionClassMethods as $methodName => $methodData)
        {
            if (empty($reflectionObjectMethods[$methodName]))
            {
                return false;
            }
        }
        
        return true;
    }

    if (it_quacks(new MooingGrassEater(), 'Cow'))
    {
        print("A MooingGrassEater can be seen as a Cow\n");
    }
    else
    {
        print("A MooingGrassEater has no hope of being recognized as a Cow\n");
    }

    if (it_quacks(new MooingGrassEater(), 'Sheep'))
    {
        print("A MooingGrassEater can be seen as a Sheep\n");
    }
    else
    {
        print("A MooingGrassEater has no hope of being recognized as a Sheep\n");
    }
?>

Missing obvious points are of course to compare the number of arguments to the methods and wether they’re optional, so that you further ensure call safety. But hey, it’s just an example implementation. Read the original linked page for more information about the concept.

Short Array Syntax for PHP

En route from the aggregated stream of Planet PHP comes a small post from Brian Moon about Stan’s suggestion for introducing the [] syntax for creating lists in PHP.

I like Python. This is like Python. By association, I like this. This would mean that you now could do:

$a = [1, 2, 3];

instead of $a = array(1,2,3); and:

$a = [‘one’ => 1, ‘two’ => 2]

instead of $a = array(‘one’ => 1, ‘two’ => 2);

Syntactial sugar, but still, sweet sugar.

If we just could get [:]-syntax for slicing and dicing too now..

Swoooosh – Free Open Source Flash-based Multi File Uploader

As I’ve mentioned a few times before I’ve been playing around with Adobe Flex. I finally got some more time to play with it tonight, so I got everything together to a semi-usable shape. A few things are still missing, such as moving the active uploads to the top and handling more than a total number of x queued uploads (at a certain level the progress bars will just disappear out of the Flash area, then magically appear as enough other items are finished).

Download Swoooosh.tar.bz2!

I’m looking for any response on this, and if anyone want to play around with it, please go ahead. It should be fairly simple to set up. I’ve included a brief description of the arguments it accepts below. Everything is released under a slightly modified MIT-based license, where the only change is that I’ve removed the need for keeping the copyright notice in anything that’s not the source code itself. Use it for anything you’d like, and if you make something useful, I’d be happy if you would contribute at patch back to me so that I could update the library itself.

You can see the application in action at my test installation. I’ll remove this test later, and be advised that the files actually will be transferred to my webserver. I’m just going to run rm -f * anyways, but if anyone breaks in and steals your precious uploaded files, you’re the one to blame.

== ARGUMENTS ==

The arguments to the flash file are provided in the flashVars attribute.

There are two required parameters:

destinationURL
The destination where all files are uploaded.

redirectWhenDoneURL
The URL the client is redirected to when all files have been uploaded.

Remember to urlencode both values.

Example:

<SEE THE INSTALL FILE IN THE ARCHIVE>

Optional parameters that are available is:

progressIndicatorColor: "#bfbfbf"
The color of the progress bar.

progressIndicatorBackgroundColor: "white"
The color of the empty bar before any progress has been made.

progressIndicatorWidth: 300
The width of the progress bar indicator.

uploadButtonText: "Click here to upload files!"
The text of the button the user has to click to start uploading files.


== COMPILING ==

To compile the SWF-file from the source code, download the Adobe Flex 3 SDK,
then run mxmlc against Swoooosh/Swoooosh.mxml:

mxmlc Swoooosh/Swoooosh.mxml


== CONTACT ==

Any and all comments are welcome. See the included LICENSE file for
information about usage. Short words: do whatever you want, just don't
claim you wrote it without contributing.

All patches are of course welcome.

UPDATE

As I’ve received many comments about the contents of test.php (the file that receives the post), here is the smallest version:

if (is_array($_FILES))
{
    foreach($_FILES as $file)
    {
        file_put_contents('directory_name_to_write_files_to/' . uniqid(), file_get_contents($file['tmp_name']));
    }
}

This will simply loop through all submitted files and write them to the temporary directory. As with other file uploads in PHP, you can access the original name of the file with the ‘name’ element in the $file array. DO NOT USE FILE NAME FROM ‘name’ WHEN WRITING THE FILE TO DISK. DOING THAT IS A VERY BAD IDEA, AS IT ALLOWS PEOPLE TO CREATE ANY FILE WITH ANY NAME (INCLUDING PHP-FILES WHICH CAN BE RUN IF THEY’RE AVAILABLE THROUGH THE WEB SERVER. YEP.). CAPS OFF.

Remember to make the directory you’re saving the files in WRITABLE for the process that writes the files (might be www-data or whatever user your webserver is running under). If you want to debug the response from the server regardless of what’s shown in the flash UI, use Wireshark to see the raw contents of packets and the conversions between the client and the server.

Even More Programming Challenges

Estimate (one of the two conspirators in “The Rule of Estimate and Dibon”) directed me to The Sphere Online Judge today. If you’ve grown tired of the challenges over at Project Euler or just want something different for a change, head on over. The challenges ranges from quite simple to very complex, and this time you’ll have to provide the program that solves the problem. This means that there is a time factor present and you can’t solve the problem using parallell processing.

For those who have attended NWERC or another version of ACMs programming contests, this is the same stuff (“NM i Programmering” for the norwegian people out there). Go go go!

The Power of Micro Languages

A “micro language” is a small, domain specific language. These small languages are often invented on a project basis or carried over from previous projects (or in some cases, a standardized language is used). The power of this comes as we’re able to move the rules of the application from the application logic, and instead can maintain a seperate set of rules independent of the application. This way you can add and remove business rules without having to re-test or rewrite your application. The concept has been around for years, and it’s in use more places than you could imagine, but I see people over and over again which does not see the benefits of separating the rules from the application itself.

These micro languages can be described in a markup language (such as XML), as a subset of other existing languages, or as your own simple-to-parse language. In this small introduction I’ll use an example from the current codebase of pwned.no, a site for arranging tournaments in several different online games.

When writing the engine that powers pwned.no I wanted to give the users several different options of tournaments to create. You have the regular single elimination tournaments, where winning a match moves you on to the next level, you have the double elimination events, where your team is not out before they’ve lost two matches, and you have the possibility of a group stage where the best teams (and possibly the worst) move on to the finals. Even after this, you realize that you might have situations where you want to combine these different sets of tournaments into a large one (example: elimination -> group stage -> double elimination). Embedding this logic into the application would create an unmaintainable mess of special cases and special handling, and each time I wanted to add a tournament, the possibility of breaking some other tournament were a possibility.

Instead I decided to implement a small language that describes tournaments, a parser to load and process the language and finally store it in an underlying database structure of how the tournament should progress when the next round starts. This meant that the job up front demanded a bit more planning and sketching, but the solution makes the system more flexible and more stable. When someone requests a new tournament format now, I simply create a small file describing the flow in the tournament, and everything works as it should. No code edits, no unit tests that need to be run, nothing at all, as long as the file is a valid tournament description file.

A tournament description file looks like this (for a tournament with a group stage, consisting of eight teams and with a single elimination stage for the two best teams from each group):

ID GROUP8
TEAMS 8
TOURNAMENTSTAGE 3 FINAL
MATCH
.ID FINALE
TOURNAMENTSTAGE 2 SEMIFINALS
MATCH
.ID SEMIFINALE1
.WIN FINALE
MATCH
.ID SEMIFINALE2
.WIN FINALE
TOURNAMENTSTAGE 1 PRELIMINARY_GROUPPLAY
GROUP
.ID GROUP1
.PLACETOMATCH 1 SEMIFINALE1
.PLACETOMATCH 2 SEMIFINALE2
GROUP
.ID GROUP2
.PLACETOMATCH 1 SEMIFINALE2
.PLACETOMATCH 2 SEMIFINALE1

The ID is an internal identifier which is resolved to a string in the currently active language, the TEAMS identifier tells the parser how many teams who can sign up for this tournament, and the rest of the lines are descriptions of the different stages of the tournament. The file can be parsed top-to-bottom, as the identifiers mentioned later in the file already exists higher up in the hierarchy. The .WIN commands tells the parser where the winning team should move next, and the .PLACETOMATCH commands indicates where the different places in the group stage should move next. If I wanted to add a lower bracket too, I’d simply add .PLACETOMATCH 3 LOWERBRACKET1 and a MATCH with .ID LOWERBRACKET1.

You could of course use XML for this instead, but this very simple and very easy to parse language has so far created a total of almost 25.000 tournaments, and everything has worked without a hitch. After resolving the first issues with the parser in my development version, there are now several new tournament formats that has been added (such as a tournament with 256 teams) in the years after the application was released.

This was just a very small introduction to the concept of micro languages, and you’ll find loads of more information about them online. You may ask what the difference between a micro language and a configuration file is, and the truth is that they’re quite similiar. But as the configuration file describes different settings for the application, the micro language describe different rules for the application. You may of course have configuration files that also contain rules, but you’ll need a well-defined and expressive language to define your rules. The domain where this language is used is so small and limited (there are only that many different concepts that are used in tournaments), so a simple language as this fits. If you’re in a situation where the micro language evolves into a full fledged programming language, you’d probably be better off with embedding an existing scripting language (such as Python), or moving the rules out into seperate modules in your application.