ssh_exchange_identification: Connection closed by remote host

Suddenly encountered the error message ssh_exchange_identification: Connection closed by remote host while ssh-ing into one of the machines facing the public side of the almighty internet today. A quick Google search turned up an article saying that the problem usually solves itself. The reason is simple: as this is a box that’s available on the general internet, a storm of SSH connection requests hits us from time to time as compromised servers attempt to break in. When this happens, sshd may go into defensive mode and simply refuse new connections instead of trying to handle them. This is good. It is also the reason why it “just suddenly works again”: the attack subsides or some resources get freed up.

There may of course be other reasons for this error, but if the machine is reachable by other means, answers ping and worked an hour ago, this may be the cause. Guess it’s time to move the public ssh port to something other than 22.
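For reference, the defensive behavior described above is governed by sshd’s MaxStartups setting, and the port is set with Port. A minimal sketch of the relevant sshd_config directives (the values here are illustrative examples, not recommendations):

```
# /etc/ssh/sshd_config (excerpt)

# Move sshd off the default port to dodge the bulk of automated scans
Port 2222

# start:rate:full -- begin refusing 30% of new unauthenticated
# connections once 10 are pending, rising linearly to refusing
# all of them once 100 are pending
MaxStartups 10:30:100
```

Remember to restart sshd (and keep an existing session open!) after changing the port, so you don’t lock yourself out.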

Getting a Look Behind the Scenes of PDO

Ulf Wendel is presenting several good articles about the process of implementing the mysqlnd library for PDO as PDO_MYSQLND. I wrote about PDO_MYSQLND when it was first announced, but Ulf has since posted two good articles about the implementation. These articles provide a unique insight into how PDO is built and what challenges lie ahead for PDO2 (.. in addition to the license and NDA debate..)

Remember that you can always follow the latest developments over at Planet PHP.

Forgotten by Spambots

Five days. That’s all the time the mighty internet of spambots needed to forget that I exist. I feel ignored, like that drunken guy who wants to be everybody’s best friend. People tend to look the other way and make their “He’s just drunk, you know” face. I’ve seen it in the eyes of The Internet.

The reason? Not a single spam caught by Akismet in five days. I’m lost at sea here. Come back, internet spambots. Come back! I didn’t mean what I said! Was it something I did? Please. I’m not that drunk.

Whoisi – Social Aggregation

Just found out about whoisi.com through John Resig, and it’s quite a nifty little app. It aggregates several feeds in the context of an individual. The application does not require any login, and builds on the collection of all resources people are able to gather for one particular individual. I’ve collected the available feeds for myself over at my whoisi.com page, so that you can actually follow my flickr page, my twitter and my blog from one location. If you have any other resources where I’m contributing (maybe my youtube-feed?), feel free to add them.

I also suggest playing with the “random person” feature, I’ve had quite a bit of fun with that one today.

Number one feature: I don’t have to log in at Whoisi. Amazing. I just get a personalized link that I can email to myself for storage or simply bookmark in my browser (or privately on a bookmarking site). No hassle. No email. No personal information. Instant win.

You can read more about the technical implementation over at Christopher Blizzard’s blog.

Using Apache httpd as Your Caching Solution

In this article I’m going to describe a novel solution for making cached versions of dynamic content available, while attempting to strike a balance between flexibility, performance and the origin of dynamic content. This solution may not be suited for very dynamic content (where updates are better handled by rewriting the cached version whenever the content changes), but it fits those situations where the dynamic content is built on request from a very large dataset. I have two use cases detailing applications I’ve been involved in building where I have applied this strategy. This could also be implemented with a caching service in front of the main service, but that would require installing a custom service, hardware, etc.

The WMS Cache

WMS (Web Map Service) is an OGC (Open Geospatial Consortium) specification which details a common set of parameters for how to query a web service which returns a raster map image (a regular png/jpg/bmp file) for an area. The parameters include the bounding box (left,bottom,right,upper), the layers (roads,rivers,etc) and the size of the resulting image. The usual approach is to add a caching layer in the WMS itself: any generated image is simply stored to disk, and each request first checks whether the file exists on disk before retrieving the data and rendering the image (and if it exists, just returns the image data from disk instead). This increases the rate of requests the WMS can answer and takes load off the server for the most common requests. We are still left with the overhead of parsing the request, checking for the cached file and, most notably, loading our dynamic language of choice and responding to the request. An example of such a small and naive PHP application is included:
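The original code sample appears to have been lost along the way; below is a minimal sketch of what such a naive WMS cache could look like (render_wms_image() is a hypothetical stand-in for the actual map rendering code, and the cache directory is an example):

```php
<?php
// Naive WMS cache: normalize the query string, hash it, and serve
// the image from disk when we already have it.

// Sort the GET parameters so bbox=..&x=.. and x=..&bbox=.. map to
// the same cache file
$params = $_GET;
ksort($params);

$cacheFile = '/var/cache/wms/' . md5(serialize($params)) . '.png';

header('Content-Type: image/png');

if (is_readable($cacheFile))
{
    // Cache hit: stream the file straight from disk
    readfile($cacheFile);
}
else
{
    // Cache miss: render the image (hypothetical helper),
    // store it for next time, then send it to the client
    $imageData = render_wms_image($params);
    file_put_contents($cacheFile, $imageData);
    echo $imageData;
}
```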


The next request that arrives with an identical set of GET parameters will be served with the overhead of loading PHP, parsing the PHP script (less if you have APC or a similar cache installed), sorting the GET parameters (so that bbox=..&x=.. is the same as x=..&bbox=..), serializing them, checking that the file exists on disk (you could simplify this to just doing a read and checking whether it succeeded), copying the data from disk to memory and then outputting the data to the client (you could also use fpassthru() and friends, which may be more optimized for simply reading and outputting data, but that's not the main point here).

To relate this to our use case of the WMS, we need to take a closer look at how map services are used today. Before Google showed the world what a good map solution could look like with modern web technology, a map application presented an image to the user, allowed the user to click or drag the image to zoom or move, and then reloaded the entire page to generate the new image. If it took 0.5s to generate the image, that was not really a problem, as the dataset is not updated very often and it is very easy to do these operations in parallel across a cluster. When Google introduced Google Maps, they loaded the 9 visible images (tiles) on the first view, and then started loading other tiles in the background (so that when you scroll the map, the images look like they are already in place). If you ran an interface similar to Google Maps against a regular WMS, most WMS servers would explode and take the whole 42U rack with them. Not a very desirable situation. The easy solution, if you have an unlimited amount of resources, disk space and money, is to simply generate all the available tiles up front, the way Google has done it. This requires disk space for all the tiles, and does not allow your users to choose which layers they want included in the map (this is changing, as map services are starting to render each layer as a separate tile and then superimposing them in the user interface).

The problem is that most of us (actually, n - 1) are not Google, but then most of us do not build map services either. For those of us who do, we needed a way of living somewhere in between rendering our complete dataset to image tiles up front and running everything through the WMS. While working with Gunnar Misund at Østfold University College, I designed a simple scheme to allow compatible clients to fetch cached tiles automagically, while tiles that did not exist yet were generated on the fly by the background WMS. The idea was to let Apache httpd handle the delivery of already generated and cached content, while our WMS served those areas that were viewed for the very first time (or where the layer selection was new). It would not be as fast as Google Maps for non-cached content, but it wouldn't require us to run through our complete dataset to generate images up front either.

The solution was to let the javascript client request images through a custom URL:

http://example.com/300/400/10/59.205278/10.95/rivers,roads/image.jpg

(This is just an example, and only contains the center point of the image.) This decomposes into:

http://example.com/x_width/y_height/zoomlevel/centerlat/centerlon/layers/image.fileformat

This is all good as long as image.jpg exists in the local path provided, so that Apache can just serve the image as-is from that location. Apache httpd (or lighttpd and other "serve files fast!" httpds) can serve these static files in large numbers with minimal overhead (it's what they were written for, you know..). The problem is what to do when the file does not exist, which will happen each time a resource is requested for the first time and we do not have a cache yet. The solution lies in assigning a PHP file as the handler for any 404 error (file not found). This is a well-known trick used all over the field (such as the handling of www.php.net/functionname direct lookups). In PHP you can use $_SERVER['REQUEST_URI'] to get the complete path of the request that ended in the 404.

The .htaccess file of the application is as simple as cake:

ErrorDocument 404 /wms/handler.php
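A minimal sketch of what such a handler.php could look like for the URL scheme above (render_tile() is a hypothetical stand-in for the call into the backing WMS, and error handling is omitted):

```php
<?php
// 404 handler: generate the missing tile, cache it where Apache will
// find it next time, and serve it for this request.

// e.g. /300/400/10/59.205278/10.95/rivers,roads/image.jpg
$path  = $_SERVER['REQUEST_URI'];
$parts = explode('/', trim($path, '/'));

list($width, $height, $zoom, $lat, $lon, $layers, $file) = $parts;

// Render the tile through the backing WMS (hypothetical helper)
$imageData = render_tile((int) $width, (int) $height, (int) $zoom,
                         (float) $lat, (float) $lon,
                         explode(',', $layers));

// Save the tile to the exact path Apache looked for, so the next
// request is a plain static file hit
$cachePath = $_SERVER['DOCUMENT_ROOT'] . $path;
@mkdir(dirname($cachePath), 0755, true);
file_put_contents($cachePath, $imageData);

// PHP answers with a 404 status by default when invoked through
// ErrorDocument, so override it before sending the image
header('HTTP/1.1 200 OK');
header('Content-Type: image/jpeg');
echo $imageData;
```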

I've enclosed a simple specification which was written as a description of the implementation when the project was done in 2005.

Thumbnail generation

Generating thumbnails can also be transformed into the same problem set. In the case where you need several different sizes of thumbnails (and different rescales are needed for different applications), you can apply the same strategy. Instead of handing all the information to a resize script with the file name etc. as arguments, simply make the xsize and the ysize part of the URL. If the file exists in the path, it's served directly with no overhead; otherwise the 404 handler is invoked as in the previous example. The thumbnail can then be generated, saved in the proper location, and the world can continue to rotate at its regular pace.

This application can then be extended by adding new parameters in the url, such as the resize method, if the image should be stretched, zoomed and other options.
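A sketch of the thumbnail variant of the 404 handler, using GD (the /thumbs/xsize/ysize/name.jpg URL layout and the /originals source directory are assumptions made for the example):

```php
<?php
// 404 handler for /thumbs/300/200/photo.jpg style URLs: resize the
// original on first request, cache the result to disk, and serve it.

list(, $xsize, $ysize, $name) =
    explode('/', trim($_SERVER['REQUEST_URI'], '/'));

// Load the full-size source image and create the target canvas
$source = imagecreatefromjpeg($_SERVER['DOCUMENT_ROOT'] . '/originals/' . $name);
$thumb  = imagecreatetruecolor((int) $xsize, (int) $ysize);

// Resample the original down to the requested size
imagecopyresampled($thumb, $source, 0, 0, 0, 0,
                   (int) $xsize, (int) $ysize,
                   imagesx($source), imagesy($source));

// Write the thumbnail where Apache expects it, so the next request
// never touches PHP, then serve this request
$cachePath = $_SERVER['DOCUMENT_ROOT'] . $_SERVER['REQUEST_URI'];
@mkdir(dirname($cachePath), 0755, true);
imagejpeg($thumb, $cachePath);

header('HTTP/1.1 200 OK');
header('Content-Type: image/jpeg');
imagejpeg($thumb);
```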

Conclusions

This is a very simple scheme that does not require any custom hardware or server software, and places itself neatly between having a caching front-end server between the client and the application, and the hassle of generating the same file each and every time. It removes the overhead of invoking the script (PHP in this case) for each request, which means you can serve files at a much greater rate and let your hardware do other, more interesting things instead.

Getting Scientific

In the latest edition of Birkebeiner’n, the Norwegian magazine sent to all participants of Birkebeinerrittet, a novel way of applying the scientific method is described. The writer starts off with “[this technique] does not have a scientifically proven effect on strength”, and then follows it up one sentence later with “since over 90% of athletes use [the technique], we can conclude that it has an effect”.

So there you have it, as long as most people do it, it works.

PHP Vikinger Notes

Just a few notes from PHP Vikinger, which was arranged by Derick Rethans in Norway today. Things went mostly smoothly and people in general seemed to have a very good time. These are just some of the random notes I made during the sessions.

All in all it was a good unconference, with a friendly and laid back tone and hopefully people got what they came for. Next time I’ll try to prepare a simple presentation on some interesting and hopefully not too familiar topic and actually contribute something too. We drove from Halden and Fredrikstad to Skien in the morning and back in the evening, which worked out quite OK, except for .. well, the lack of sleep in the morning. But everyone survived and managed to stay awake, so I conclude that the trip was a great success.

To sum it all up: a banana is a fruit and a tomato is a berry. You probably had to be there for that one.

Thanks for the unconference, and hopefully I’ll be able to attend more events in the future too.

UPDATE: Derick also has a writeup online from PHP Vikinger.

Derick and Sebastian Readying a Presentation

Two Books Down, One Up

I finished Sources of Power: How People Make Decisions a week or two ago, and after a bit of a reading hiatus, I finally got started on Defensive Design for the Web from some of the guys at 37signals. Both books read very well and provided good insights into their subjects, and both have loads of examples that illustrate the points they’re trying to get across. For Defensive Design for the Web, this includes at least a hundred screenshots of different sites, with comments and comparisons with successful sites in the same genre. Being a very practical book, I read the whole thing in a couple of hours, and while I’m not completely sure what I’ve taken away from it, I suggest reading it again from time to time to refresh your thoughts on the subject.

Anyways, after finishing these two books, I’ve now picked up Information Retrieval: Algorithms and Heuristics (2nd Edition) as my new reading material. This is much more algorithmic and theoretical than my previous books, so hopefully I’ll not get bored after a few chapters.

A Redirect Does Not Stop Execution

This is just a public service announcement for all the inexperienced developers who write redirects in PHP by issuing a call to header(“Location: <new url>”). I see the same mistake over and over again, so just to try to make sure that people actually remember this:

A Call to Header (even with Location:) Does NOT Stop The Execution of the Current Application!

A simple example that illustrates this:

 /* DO NOT COPY THIS EXAMPLE. */

if (empty($_SESSION['authed']))
{
    header('Location: http://example.com/');
}

if (!empty($_POST['text']))
{
    /* insert into database */
}

/* Do other admin stuff */

The problem here is that the developer does not stop script execution after issuing the redirect. When testing this code, the result will look correct: a redirect happens when a user who is not logged in tries to access the page. There is however a gaping security hole here, hidden not in what’s in the file, but in what’s missing. Since the developer does not terminate the script after the redirect, it will continue to run and do whatever the user asks of it. If the user submits a POST request with data (by sending the request manually), the information will be inserted into the database, regardless of whether the user is logged in or not. The end result will still be a redirect, but the program will have executed all its regular execution paths based on the request.
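The fix is simply to stop execution right after sending the Location header. A corrected version of the example above (with session_start() added so $_SESSION is actually populated):

```php
<?php
session_start();

if (empty($_SESSION['authed']))
{
    header('Location: http://example.com/');

    /* Stop execution here, so nothing below ever runs for
       unauthenticated requests -- this line is the whole point */
    exit;
}

if (!empty($_POST['text']))
{
    /* insert into database */
}

/* Do other admin stuff */
```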

There’s A Difference Between Being Inspired By and Outright Copying

A recently launched service that has gotten way too much attention in the Norwegian press today is Qpsa.no – another “WHAT ARE YOU DOING NOW” service. Their business idea? Reimplementing Twitter, copying their look and defending it with “It’s in Norwegian!”.

First, let’s get this out of the way: I have absolutely no problem with people implementing services similar to other people’s, iterating on concepts, being inspired and, in general, standing on the shoulders of giants. I do however have a slight problem with people directly copying other people’s success stories and passing them off as “revolutionizing social networks in Norway”. And although some news items have pointed out the link to Twitter, they have all failed to point out that this is a blatant ripoff of the original service.

First, let’s start by looking at the main page:

Then we browse over to our international friends and discover their main page:

This seems way too similar to be a coincidence, so their “inspiration” seems quite obvious. In particular, notice the green box and the content of the page (nothing other than a sign-up form). They’ve even managed to get quite a few birds in there too. The only thing they’re missing is the beautiful layout and look of Twitter, but hey, you can’t have it all. Or can you? On to the next comparison:

Versus our now known international friends yet again:

Hmm. This seems quite similar (thanks to Mister Noname for getting me a screenshot of his tweets). Guess it’s not really about trying to be original, but more about copying what other people have created.

Their defense for creating the site: Twitter is not available in Norwegian, and Twitter is slow (Twitter doesn’t scale! [two bonus meme points]). Yes, Twitter is slow from time to time, but this is where it gets even more interesting: neither of the people behind the application is a web developer, and they obviously haven’t given much thought to why Twitter is slow.

My guess is that Twitter will register a formal complaint, or the people behind qpsa.no will get wiser and change their look. Maybe they’ll even try to actually build on the idea that created Twitter, and create something that is worth checking out. The largest Norwegian community, Nettby, has over 700,000 users (compared to U.S. numbers, that would mean a US site with somewhere around 47 million active users), and could probably add this feature just as quickly; with an established user base of that size, it would be a steamroller against a bird. A twittering little creature.

Bonus points for using “It’s in Norwegian!” as the main defense, then naming your service after a Spanish phrase.