Programming – Page 8

Short Array Syntax for PHP

En route from the aggregated stream of Planet PHP comes a small post from Brian Moon about Stan’s suggestion for introducing the [] syntax for creating lists in PHP.

I like Python. This is like Python. By association, I like this. This would mean that you could now do:

$a = [1, 2, 3];

instead of $a = array(1,2,3); and:

$a = ['one' => 1, 'two' => 2];

instead of $a = array('one' => 1, 'two' => 2);

Syntactic sugar, but still, sweet sugar.

If we could just get [:] syntax for slicing and dicing too now...
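For comparison, here is a small sketch of the Python syntax this post alludes to: the bracket literals and the [:] slicing. The variable names are purely illustrative.

```python
# List literal: the Python counterpart of the proposed $a = [1, 2, 3];
numbers = [1, 2, 3, 4, 5]

# Dict literal: roughly what $a = ['one' => 1, 'two' => 2]; expresses in PHP.
lookup = {"one": 1, "two": 2}

# Slicing with [start:stop:step]; every part is optional.
first_two = numbers[:2]     # elements at index 0 and 1
last_two = numbers[-2:]     # the two final elements
every_other = numbers[::2]  # index 0, 2, 4
shallow_copy = numbers[:]   # a new list with the same elements
```

In PHP the closest equivalent to slicing is still a function call such as array_slice().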

The Progress Events 1.0 W3C Draft

While working on Swoooosh I stumbled across this draft from the W3C regarding “Progress Events 1.0”. The specification defines a set of standardized progress events and is intended for use with XMLHttpRequest and MAE. From their Abstract:

This document describes event types that can be used for monitoring the progress of an operation. It is primarily intended for contexts such as data transfer operations specified by XMLHTTPRequest [XHR], or Media Access Events [MAE].

Swoooosh – Free Open Source Flash-based Multi File Uploader

As I’ve mentioned a few times before, I’ve been playing around with Adobe Flex. I finally got some more time to play with it tonight, so I got everything together into a semi-usable shape. A few things are still missing, such as moving the active uploads to the top and handling more than x queued uploads in total (past a certain number the progress bars will just disappear out of the Flash area, then magically appear as enough other items are finished).

Download Swoooosh.tar.bz2!

I’m looking for any response on this, and if anyone wants to play around with it, please go ahead. It should be fairly simple to set up. I’ve included a brief description of the arguments it accepts below. Everything is released under a slightly modified MIT-based license, where the only change is that I’ve removed the need for keeping the copyright notice in anything that’s not the source code itself. Use it for anything you’d like, and if you make something useful, I’d be happy if you would contribute a patch back to me so that I could update the library itself.

You can see the application in action at my test installation. I’ll remove this test later, and be advised that the files actually will be transferred to my webserver. I’m just going to run rm -f * anyways, but if anyone breaks in and steals your precious uploaded files, you’re the one to blame.

== ARGUMENTS ==

The arguments to the flash file are provided in the flashVars attribute.

There are two required parameters:

destinationURL
The destination where all files are uploaded.

redirectWhenDoneURL
The URL the client is redirected to when all files have been uploaded.

Remember to urlencode both values.

Example:

<SEE THE INSTALL FILE IN THE ARCHIVE>

The available optional parameters are:

progressIndicatorColor: "#bfbfbf"
The color of the progress bar.

progressIndicatorBackgroundColor: "white"
The color of the empty bar before any progress has been made.

progressIndicatorWidth: 300
The width of the progress bar indicator.

uploadButtonText: "Click here to upload files!"
The text of the button the user has to click to start uploading files.
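
As a rough sketch of what "urlencode both values" means in practice, the snippet below builds a flashVars string from the parameters listed above. The helper name build_flash_vars and the example.com URLs are assumptions made for illustration; the authoritative embedding example is still the INSTALL file in the archive.

```python
from urllib.parse import quote, urlencode


def build_flash_vars(destination_url, redirect_url, **optional):
    """Return a URL-encoded flashVars string (hypothetical helper)."""
    params = {
        "destinationURL": destination_url,
        "redirectWhenDoneURL": redirect_url,
    }
    # Optional parameters such as progressIndicatorColor or uploadButtonText.
    params.update(optional)
    # quote_via=quote encodes spaces as %20 instead of "+".
    return urlencode(params, quote_via=quote)


flash_vars = build_flash_vars(
    "http://example.com/upload/test.php",  # placeholder URL
    "http://example.com/done.html",        # placeholder URL
    progressIndicatorColor="#bfbfbf",
    uploadButtonText="Upload files",
)
print(flash_vars)
```

The resulting string is what would go into the flashVars attribute of the embedding markup.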


== COMPILING ==

To compile the SWF-file from the source code, download the Adobe Flex 3 SDK,
then run mxmlc against Swoooosh/Swoooosh.mxml:

mxmlc Swoooosh/Swoooosh.mxml


== CONTACT ==

Any and all comments are welcome. See the included LICENSE file for
information about usage. In short: do whatever you want, just don't
claim you wrote it without contributing.

All patches are of course welcome.

UPDATE

As I’ve received many comments about the contents of test.php (the file that receives the post), here is the smallest version:

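<?php
// Directory the uploads are written to; must be writable by the web server.
$targetDir = 'directory_name_to_write_files_to/';

foreach ($_FILES as $file)
{
    // Skip failed uploads and anything that did not arrive through an HTTP upload.
    if ($file['error'] === UPLOAD_ERR_OK && is_uploaded_file($file['tmp_name']))
    {
        // Store under a generated name, never under the client-supplied 'name'.
        move_uploaded_file($file['tmp_name'], $targetDir . uniqid());
    }
}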

This will simply loop through all submitted files and write them to the destination directory under a generated name. As with other file uploads in PHP, you can access the original name of the file through the ‘name’ element of the $file array. DO NOT USE THE FILE NAME FROM ‘name’ WHEN WRITING THE FILE TO DISK. DOING THAT IS A VERY BAD IDEA, AS IT ALLOWS PEOPLE TO CREATE ANY FILE WITH ANY NAME (INCLUDING PHP FILES, WHICH CAN BE RUN IF THEY’RE AVAILABLE THROUGH THE WEB SERVER. YEP.). CAPS OFF.

Remember to make the directory you’re saving the files in WRITABLE for the process that writes the files (might be www-data or whatever user your webserver is running under). If you want to debug the response from the server regardless of what’s shown in the Flash UI, use Wireshark to see the raw contents of the packets and the conversation between the client and the server.
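
To illustrate why the client-supplied name is dangerous, here is a minimal sketch (in Python rather than PHP) that checks whether a file name would stay inside the upload directory. The function names and the /srv/uploads path are assumptions for the example only.

```python
import os
import uuid


def stays_inside(upload_dir, file_name):
    """Return True if upload_dir/file_name resolves to a path inside upload_dir."""
    base = os.path.realpath(upload_dir)
    target = os.path.realpath(os.path.join(base, file_name))
    return os.path.commonpath([base, target]) == base


def generated_name():
    """A server-side name that ignores whatever the client sent."""
    return uuid.uuid4().hex


# A hostile client is free to send path separators in the 'name' field.
print(stays_inside("/srv/uploads", "../../var/www/shell.php"))
print(stays_inside("/srv/uploads", generated_name()))
```

Even a name that stays inside the directory can still end in .php, which is why a generated name is the safer default.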

The First Presentations From php|tek Online

php|tek is taking place halfway around half of the world for me, but the first presentations from the conference are beginning to appear online now. The first three presentations are from Brian DeShong and Maggie Nelson:

Brian DeShong: The Grown-Up Company’s Guide to Development
Brian DeShong: Robust Batch Processing with PHP
Maggie Nelson: Keeping Your DB and PHP in Sync

While Brian’s two presentations were mostly familiar stuff for me, Maggie’s presentation touched on something that has troubled me time and again, and that Christer and I have been looking for a good solution to. We’re currently experiencing the problems described, and I’ve searched several times for a good tool to generate sqldiffs (and not for the _values_ of most of the tables). I’m waiting eagerly for the first release, and as soon as things are up and running, I’ll look into whether there’s anything I can contribute.

Memcache Stats

Harun Yayli has a post up about a PHP application for retrieving and presenting memcache stats from several servers. The interface is built on the familiar apc.php which is included with APC (Alternative PHP Cache, a PHP bytecode / shared memory storage cache), and should be easy to navigate for anyone who has used apc.php. The stats shown in the application include total space available, memcache usage, hits, misses, uptime and other interesting information.
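
For a feel of where such numbers come from, here is a minimal sketch (not Harun’s application) that reads the output of memcached’s text-protocol "stats" command. The server answers with "STAT name value" lines followed by "END". The function names are assumptions for the example, and fetch_stats needs a reachable memcached server.

```python
import socket


def parse_stats(raw):
    """Turn 'STAT name value' lines into a dict, stopping at 'END'."""
    stats = {}
    for line in raw.splitlines():
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host, port=11211, timeout=2.0):
    """Send 'stats' to one memcached server and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        buffer = b""
        while not buffer.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            buffer += chunk
    return parse_stats(buffer.decode("ascii"))


def hit_ratio(stats):
    """Share of get requests answered from the cache."""
    hits = int(stats.get("get_hits", 0))
    misses = int(stats.get("get_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0
```

Calling fetch_stats for each server in a list and comparing the results is essentially the several-servers view the post describes.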

Even More Programming Challenges

Estimate (one of the two conspirators in “The Rule of Estimate and Dibon”) directed me to The Sphere Online Judge today. If you’ve grown tired of the challenges over at Project Euler or just want something different for a change, head on over. The challenges range from quite simple to very complex, and this time you’ll have to provide the program that solves the problem. This means that there is a time factor present and you can’t solve the problem using parallel processing.

For those who have attended NWERC or another version of ACM’s programming contests, this is the same stuff (“NM i Programmering” for the Norwegian people out there). Go go go!

Using MySQL Proxy to Update Memcached

Jan Kneschke has a very interesting post about how you could use replication and MySQL Proxy to mark entries as dirty in memcached. This way you’re able to expire the data from memcached when it actually is updated on the database level, without having to add another level of abstraction in your application. A very novel approach, and it’ll be nice to see how this plays out in practice with MySQL 5.1.

The Graph of Company Classification

I’ve been meaning to do this for quite some time, but never found the time before yesterday evening. Equipped with the data we’ve made searchable at Derdubor, I dug into the classification of the companies that our data provider supplies. Their classification uses the standard NACE codes to communicate what type of business we’re dealing with, and this set of classifications is standardized across European nations (a new standard was released in 2007 to further synchronize the classification across the nations).

My goal was to explore the graph that describes the relationships between the different classification groups. A company may be classified in more than one group, and by using this as an edge in the graph between the classifications, I set out and wrote a small Python program for parsing the input file and building the graph in memory. For rendering the graph I planned on using the excellent GraphViz application, originally created at AT&T for the purpose of creating beautifully rendered graphs of network descriptions.

My Python program therefore outputs a file in the dot language, which I then run through neato to render the beautiful graph as a PDF.

An example from my generated dot-file:

graph bransjer {
	graph [overlap=scale];
	node [color=lightblue2, width=0.1, fontsize=12, height=0.1, style=filled];
	"Forsikr.,pensjonsfond-unntatt off. trygd" -- "Forsikringsagenter og assurandører" [penwidth=1.15441176471];
	"Forsikr.,pensjonsfond-unntatt off. trygd" -- "Hjelpevirksomhet for forsikring og pensj" [penwidth=1.23382352941];
	"Forsikr.,pensjonsfond-unntatt off. trygd" -- "Skadeforsikring" [penwidth=1.35294117647];
	// ... remaining edges omitted
}

The penwidth attribute sets the width of the line between the nodes, and each "string" -- "string" entry describes an edge between two nodes.
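
The original program isn’t included here, so the following is only a sketch of the idea under stated assumptions: every company is a list of classification names, every pair of names on the same company becomes an edge, and the edge count is scaled into a penwidth. The function names and the scaling formula are made up for the example, and group names are assumed not to contain double quotes.

```python
from collections import Counter
from itertools import combinations


def count_edges(companies):
    """Count how often two classification groups appear on the same company."""
    edges = Counter()
    for groups in companies:
        # sorted(set(...)) makes (a, b) and (b, a) count as the same edge.
        for a, b in combinations(sorted(set(groups)), 2):
            edges[(a, b)] += 1
    return edges


def to_dot(edges, name="bransjer"):
    """Render the edge counts as an undirected graph in the dot language."""
    heaviest = max(edges.values(), default=1)
    lines = ["graph %s {" % name, "\tgraph [overlap=scale];"]
    for (a, b), count in sorted(edges.items()):
        # Assumed scaling: the most common edge gets penwidth 5.
        penwidth = 1 + 4 * count / heaviest
        lines.append('\t"%s" -- "%s" [penwidth=%s];' % (a, b, penwidth))
    lines.append("}")
    return "\n".join(lines)


companies = [["A", "B"], ["B", "A", "C"], ["C"]]
dot_source = to_dot(count_edges(companies))
print(dot_source)
```

Writing the string to a file and running it through neato with -Tpdf gives a rendering similar in kind to the one described in this post.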

I first ran into problems managing this enormous graph (we’re talking 500k relations here), as trying to render the complete graph would throw both dot and neato off (as soon as we pass 2000 relations, things begin to go awry). Actually, this turned out to be a good thing, as it made me (with Jørn chipping in a bit) think more about what I actually wanted to graph. I’m not really interested in places where there are only one or two links between different classification groups, as these may be wrongly entered, very peculiar businesses etc. (with a total of 500k registrations, such things are quite common). Instead, I focused on the top ~1000 edges. By limiting my data set to the 1000 most common relationships between groups, I’m able to render the graph in just under ten seconds, including the time to parse and build the graph in Python before filtering it down.
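
The filtering step can be sketched in a few lines, assuming the edges are kept as a mapping from a pair of group names to the number of companies sharing both groups. The function name and the sample counts are made up for the example.

```python
from collections import Counter


def strongest_edges(edges, limit=1000):
    """Keep only the `limit` most common relationships between groups."""
    return dict(Counter(edges).most_common(limit))


# Hypothetical counts: (group, group) -> number of companies in both groups.
edges = {("A", "B"): 120, ("A", "C"): 2, ("B", "C"): 45, ("C", "D"): 1}
print(strongest_edges(edges, limit=2))
```

Edges backed by only one or two companies fall away first, which is exactly the noise the filtering is meant to remove.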

The resulting graph of NACE connections is quite interesting, and shows that most classifications are connected in some way. If I increased the number of edges further, the sub graphs that are left unconnected to the “main graph” would probably establish connections. An interesting observation is that most health service-related businesses (such as doctors, hospitals, etc.) live in their own sub graph disconnected from the main graph (this is the graph in the upper right). Another interesting part is the single link from the “main graph” up into the IT consultancy business group (web design, application development, etc.), which is placed at the top of the graph.

Yahoo!, SearchMonkey and Microformats

Both Rasmus and Sara have posts up about a new feature of Yahoo! Search which actually seems to be a step forward in terms of search engine functionality. It will make it possible for third-party developers to actually run code on Yahoo!’s servers to enhance the search results for their own pages.

The first examples show how they’ve used Microformats to give a better presentation of the businesses available. I’ve previously implemented the hCard microformat at Derdubor, where we have a local directory search for businesses in Norway. All our search hits and profile pages are tagged up in microformats, so that a compliant parser is able to fetch business information and provide it to our users in a proper way. It’s simply great to see Yahoo! add this kind of support for new formats, and I’m already looking forward to playing with it to give better results for pwned.no and a few other projects I’m playing around with.
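
To show what "a compliant parser fetching business information" boils down to, here is a heavily simplified sketch that pulls a handful of hCard properties out of HTML by their class names. It is nowhere near a complete hCard parser, and the sample business and markup are invented for the example.

```python
from html.parser import HTMLParser

# A small subset of the class names defined by hCard.
HCARD_PROPERTIES = {"fn", "org", "tel", "street-address", "postal-code", "locality"}
# Elements without an end tag; ignored so the nesting bookkeeping stays correct.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collect the text of known hCard properties inside class="vcard" elements."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self._card = None  # the card currently being read, if any
        self._open = []    # property names for each open element in the card

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self._card is None:
            if "vcard" not in classes:
                return
            self._card = {}
            self._open = []
        self._open.append(classes & HCARD_PROPERTIES)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._card is None:
            return
        self._open.pop()
        if not self._open:  # the vcard element itself was just closed
            self.cards.append(self._card)
            self._card = None

    def handle_data(self, data):
        if self._card is None or not data.strip():
            return
        # The text belongs to every property on the enclosing elements.
        for props in self._open:
            for prop in props:
                self._card[prop] = (self._card.get(prop, "") + data).strip()


sample = """
<div class="vcard">
  <span class="fn org">Example Bakery AS</span>
  <div class="adr">
    <span class="street-address">Storgata 1</span>,
    <span class="postal-code">0155</span> <span class="locality">Oslo</span>
  </div>
  <span class="tel">+47 22 00 00 00</span>
</div>
"""
parser = HCardParser()
parser.feed(sample)
print(parser.cards)
```

A search engine or browser plugin reading the page gets structured fields without needing a separate API.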