rinetd saves the day (.. or night, more properly)

October 4th, 2008

On Thursday we moved our mail server and domain controller from our Oslo office to the Fredrikstad office. As most of our staff now resides in Fredrikstad, this was a logical thing to do to keep the resources close to the people. The move went as planned, and all the servers and services started up as they should. There was one slight problem: we accept new mail through the link to our Oslo office (as that’s where the mail server used to live), and our firewall routes the SMTP connections to our internal mail server. We have an external mail relay that does mail filtering before delivering the mail to our server. This means that we only accept SMTP connections from that particular host, as we’re not the general recipient (our server is not listed in the MX record) for our own domain. This is handled in the Exchange configuration, where the allowed IPs are set up.

Anyways, the connection between the two offices is handled internally by an IPsec link, which routes encrypted traffic between the two networks. This has worked out just like it should, and we’ve been very happy with the setup. After moving the mail server, our plan was to simply route the connections that are coming in to our SMTP server over the VPN link to the other office, and then deliver the mail to the new address of the mail server. This proved to be more troublesome than expected. As the external filter service is run by an external company, we were unable to get them to change the address on short notice. After a couple of hours trying to get the packets to flow the right way, we decided to use an old friend of ours to help out instead: rinetd.

rinetd is a very, very simple daemon, which simply accepts a connection on a port and redirects that connection at the socket layer to another host (or port). We did however need one other feature of rinetd: access control. As I mentioned earlier, we’re only interested in allowing certain hosts to talk to our SMTP server. Since the connections are now routed through rinetd, the only host Exchange will ever see delivering e-mail is the rinetd server. This means that we had to implement the access control in rinetd instead, filtering on the external IPs connecting to our server.

rinetd allows these settings to be provided globally, so that you are able to allow or deny certain hosts for all services using rinetd. This would have worked out nicely, if it hadn’t been for the fact that we also run our webmail through Exchange on the same host. So we also need to forward the SSL connection for the webmail, and we have to allow all hosts to connect to the webmail. Luckily, rinetd also makes it possible to solve this, as you’re able to provide access rules for each of the forwarded ports separately.

There is however one caveat: the global rules will always be evaluated first. If you have certain global rules that deny or allow traffic (if you have one allow rule, all other traffic will be denied, unless explicitly allowed), the connection might never reach the evaluation of the “local” rules for each forward. This means that if one of your forwards should allow people to connect from anywhere (such as our SSL webmail), your global settings have to allow all clients to connect.

We therefore moved our limitations for the SMTP server to that section instead, and ended up with this access control configuration:

# bindaddress   bindport  connectaddress  connectport
# SMTP
<ip1> 25 <ip2> 25
allow 192.168.0.*
allow 192.168.1.*
allow <external ip>
allow <external ip>
allow <external ip>

# SSL
<ip1> 443 <ip2> 443
allow *
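
To make the evaluation order easier to reason about, here is a minimal Python sketch of how I understand the rules to combine. It is a model for illustration only, not rinetd's actual code: the function names and the example addresses (taken from the documentation ranges) are made up, and the pattern matching is approximated with Python's fnmatch.

```python
from fnmatch import fnmatchcase


def permitted(ip, rules):
    """Evaluate one rule set; rules is a list of ("allow" | "deny", pattern)."""
    allows = [pattern for kind, pattern in rules if kind == "allow"]
    denies = [pattern for kind, pattern in rules if kind == "deny"]
    # Any matching deny rule rejects the connection.
    if any(fnmatchcase(ip, pattern) for pattern in denies):
        return False
    # As soon as one allow rule exists, everything that matches none is rejected.
    if allows and not any(fnmatchcase(ip, pattern) for pattern in allows):
        return False
    return True


def connection_allowed(ip, global_rules, forward_rules):
    """Global rules are checked first; the per-forward rules only come after."""
    return permitted(ip, global_rules) and permitted(ip, forward_rules)


# Hypothetical stand-ins for the rules in the configuration above.
smtp_rules = [
    ("allow", "192.168.0.*"),
    ("allow", "192.168.1.*"),
    ("allow", "203.0.113.10"),
]
webmail_rules = [("allow", "*")]
```

With no global rules, `connection_allowed("198.51.100.7", [], webmail_rules)` is true while the same address is rejected for SMTP. If the SMTP restrictions were placed globally instead, the webmail forward would reject that address as well, which is the caveat described above.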

And after that, everything worked perfectly until we got our mail filter provider to switch to the new address of the SMTP server. rinetd saved us that night, and not a single one of our staff noticed the difference. If they don’t know we’re there, we’re doing something right.

PHP, ImageMagick and Cropping to GIF: Digging into GIFs again!

July 15th, 2008

Christer had an interesting case today, where he tried to resize and crop an image with the Imagick extension for PHP. Everything went as planned, the image was cropped and resized as it should be, but after writing it to disk and opening it again, the image’s size was the same as if he hadn’t done the crop. The content of the image outside the crop area was removed (simply set as transparent), but the image was still returned in its uncropped size.

The PHP module for binding ImageMagick is quite simple (simply marshalling between the ImageMagick methods and the PHP user space), so my guess is that this is a weird behaviour with a good enough reason somewhere down in ImageMagick. It might be a bug, but I haven’t had the time to attempt to reproduce it with convert or mogrify yet. If anyone wants to attempt that, feel free. Christer has posted the code, so simply attempt to recreate the same symptoms by using one of these two tools.

Anyways, this post is not about the issue itself, as Christer has done a neat analysis and write-up of that, but I’ll give a more detailed look at the issue within the GIF file itself. As chance would have it, I recently participated in a competition at the Norwegian demoscene IRC hangout where the goal was to recreate the Norwegian flag in an HTML page in the smallest space possible. This ended up being a competition to see who could molest and optimize GIF images the most, while browsers were still able to display them. From this experience I had quite good knowledge of how GIF files are built internally, and I was able to make a good guess at what the actual issue in the resulting file could be.

Since GIF files can be animated, a single file may contain several “images” (which would be the frames in the animation). These images can have their own size and position within the “larger image”:

 _________
| im1     |
|    _____|
|   |     |
|   | im2 |
|___|_____|

im1 may then represent the first image and im2 the second image in the file. The second image will only update the area that it covers, and this will leave the rest of the image “as it is”. Since a GIF image may contain a large number of these images, a “global” size is defined for the image. This global size covers all the images, and is the total area that these images will be drawn into. If an image is drawn outside of this area (in part or whole), it will be clipped against the viewport.

This should provide enough background to at least give a general feeling about what COULD be the problem here, but to actually find out what’s happening, we’ll dig into a GIF file format specification and the file that was created. This simple reference provides a general layout of the GIF file, and we’ll use that to take a look at what values the file we ended up with had:

On the left we have the actual byte values in hex and on the right we have the corresponding ASCII character represented by that value. As you can see, the first six bytes of the file (0x47 0x49 0x46 0x38 0x39 0x61, where 0x is the usual prefix for numbers that should be interpreted as hexadecimal) correspond to “GIF89a” (you can do this exercise yourself armed with this ASCII table: simply look up 47 in the Hx column, then 49, etc.). Those six bytes are what we call the signature of a GIF file (although the version number can differ, e.g. GIF87a, depending on the version used).

The next fields in the specification read:

Offset   Length   Contents
  6      2 bytes  Logical Screen Width
  8      2 bytes  Logical Screen Height

So bytes 6-7 and bytes 8-9 should tell us the logical size of the whole GIF file (which the images will be drawn onto). In our test file here, that’s represented as:

Width: 0x67 0x01
Height: 0x70 0x00

The byte order here is Little Endian, which means that the least significant byte is placed first. Since we have two bytes for each value, we can calculate the decimal value of the width by multiplying:

0x67 0x01 = 6 * 16 + 7 + (0 * 16 + 1) * 256 = 359
                                        ^-- Since we're in the next byte, we multiply with 256.

You can also do this with the Windows calculator, by entering 167 while in hex mode, then selecting dec (for decimal). The reason for multiplying the second byte by 256 is that this byte provides the value of the “next 8 bits”, while the first byte provides the value of the first 8 bits. If we look at the bits themselves:

0x67 | 0x01: 0110 0111 | 0000 0001

Little Endian says that the least significant byte comes first, so to get the raw bit values, we swap the two bytes:

0000 0001 0110 0111

As you can see, the second byte (0x01) ends up in the upper 8 bits, which is why its value is multiplied by 256.

We can also calculate the height:

0x70 0x00 = 7 * 16 + 0 + (0 * 16 + 0) * 256 = 112
                          ^-- both digits in the second byte are zero
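
If you would rather let the computer do the arithmetic, the same calculation can be sketched in a few lines of Python. The ten bytes below are typed in from the values quoted above rather than read from the actual file, and the variable names are only for illustration.

```python
import struct

# Signature plus the two size fields, copied by hand from the values in the text.
header = bytes([0x47, 0x49, 0x46, 0x38, 0x39, 0x61,  # "GIF89a"
                0x67, 0x01,                          # logical screen width
                0x70, 0x00])                         # logical screen height

signature = header[:6].decode("ascii")

# "<HH" means: little endian, two unsigned 16-bit integers.
width, height = struct.unpack("<HH", header[6:10])

# The same calculation done by hand: low byte + high byte * 256.
manual_width = header[6] + header[7] * 256

print(signature, width, height, manual_width)
```

Both the struct call and the manual calculation give 359 for the width, and the height comes out as 112.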

So the global header of the GIF image that was generated says that the size of the image is 359×112, which is why the image is rendered larger than it should have been. We then take a look at the Image section of the GIF file (all GIF files should contain at least one), which is defined as:

Offset   Length   Contents
  0      1 byte   Image Separator (0x2c)
  1      2 bytes  Image Left Position
  3      2 bytes  Image Top Position
  5      2 bytes  Image Width
  7      2 bytes  Image Height

Armed with this information, we examine the area where the image section starts:

The start of the Image section is the “Image Separator”, a byte value of 0x2c, shown highlighted in the image above. This is where the image section starts, and the offsets in the table are relative to this location. The next four bytes tell us where in the global viewport the upper left corner of this image should be drawn. The values here are 0x01 0x00 twice, simply meaning (1,1), or one pixel down and out from the upper left corner (which is also related to the issue posted by Christer, but we’ll ignore that here). The next values are the ones we are interested in, as they provide Image Width and Image Height:

Width:
0x73 0x00 = 7 * 16 + 3 + (0 * 16 + 0) * 256 = 115
Height:
0x6F 0x00 = 6 * 16 + 15 + (0 * 16 + 0) * 256 = 111

This means that the dimensions of the image that’s actually supplied in the GIF file are 115×111 pixels, and that it should be drawn beginning one pixel down and one pixel out (as given by 0x01 0x00 in the x,y-fields above). Compare this to the reported global size of the image (359×112), and we can see where our transparent space is coming from. The browsers (and other image viewers) create a canvas the size of 359×112 pixels, while only drawing an image into the leftmost 116 pixels (one pixel of offset plus the 115 pixels of image). The rest is left transparent, but it’s still there, as the file says that’s the size of the viewport. If we manually change the width of the viewport to 0x74 0x00 (116) in the GIF header itself, the image displays properly. To illustrate with another great ASCII drawing:


               viewport
 _____________________________________
|           |                         |
|  actual   |                         |
|  image    |                         |
|  drawn    |                         |
|           |                         |
|           |                         |
|           |                         |
|___________|_________________________|
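
The manual header fix can also be sketched in Python. The bytes below are a synthetic stand-in that I have put together from the values quoted in this post; they are not the real file, which would also contain colour tables and possibly extension blocks before the image descriptor. The fixed offset of 13 only holds for this stand-in.

```python
import struct

# Synthetic start of a GIF file: signature, logical screen descriptor without
# a global colour table, and an image descriptor right after it.
data = bytearray(
    b"GIF89a"
    + struct.pack("<HHBBB", 359, 112, 0, 0, 0)     # width, height, flags, bg, aspect
    + struct.pack("<BHHHH", 0x2C, 1, 1, 115, 111)  # separator, left, top, width, height
)

# In this stand-in the image descriptor starts right after the 13 header bytes.
separator, left, top, image_width, image_height = struct.unpack_from("<BHHHH", data, 13)
assert separator == 0x2C

# The viewport only has to be large enough to contain the image at its offset.
needed_width = left + image_width
needed_height = top + image_height

# Overwrite the logical screen size in the header (bytes 6-9).
struct.pack_into("<HH", data, 6, needed_width, needed_height)

print(needed_width, needed_height, data[6:10].hex())
```

The needed width comes out as 116, which is the 0x74 0x00 written into the header by hand above, and the height stays at 112.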

The solution to the problem here was to call the setImagePage method of the image object, as that allows us to set the values for the global image ourselves (and we know how wide the image was supposed to be).

Bonus knowledge: This issue did not occur when saving to a JPEG file, as JPEG files do not have the same capability of storing several subimages inside one file, and do not have the same rendering subsystem as GIF files. ImageMagick knows this, and does not use the page values when rendering the file.

Hopefully this has provided a minor introduction to how files are structured, shown what you can learn armed with a hex editor and a file format specification, and offered a few insights into what you can do when you’re faced with a very weird problem.

Whoisi - Social Aggregation

June 27th, 2008

Just found out about whoisi.com through John Resig, and it’s quite a nifty little app. It aggregates several feeds in the context of an individual. The application does not require any login, and builds on the collection of all resources people are able to gather for one particular individual. I’ve collected the available feeds for myself over at my whoisi.com page, so that you can actually follow my flickr page, my twitter and my blog from one location. If you have any other resources where I’m contributing (maybe my youtube-feed?), feel free to add them.

I also suggest playing with the “random person” feature, I’ve had quite a bit of fun with that one today.

Number one feature: I don’t have to log in at Whoisi. Amazing. I just get a personalized link that I can email to myself for storage or simply bookmark in my browser (or privately on a bookmark site). No hassle. No email. No personal information. Instant win.

You can read more about the technical implementation over at Christopher Blizzard’s blog.

Using Apache httpd as Your Caching Solution

June 26th, 2008

In this article I’m going to describe a novel solution for making cached versions of dynamic content available, while attempting to strike a balance between flexibility, performance and the origin of dynamic content. This solution may not be suited for very dynamic content (where the updates are better triggered by rewriting the cached version when the content changes), but it works well in those situations where the dynamic content is built from a very large dataset on request from the users. I have two use cases detailing applications I’ve been involved in building where I have applied this strategy. This could also be implemented with a caching service in front of the main service, but that would require the installation of a custom service, and possibly hardware, for that service.

The WMS Cache

WMS (Web Map Service) is an OGC (Open Geospatial Consortium) specification which details a common set of parameters for how to query a web service which returns a raster map image (a regular png/jpg/bmp file) for an area. The parameters include the bounding box (left,bottom,right,upper), the layers (roads,rivers,etc) and the size of the resulting image. The usual approach is to add a caching layer in the WMS itself, so any generated image is simply stored to disk, and the WMS then checks whether the file exists before retrieving the data and rendering the image (and if it exists, just returns the image data from disk instead). This will increase the rate of requests the WMS can answer and will take load off the server for the most common requests. We are still left with the overhead of parsing the request, checking for the cached file and, most notably, loading our dynamic language of choice and responding to the request. An example of such a small and naive PHP application is included:

<?php
// Build a cache key that does not depend on the order of the GET parameters.
$parameters = $_GET;
ksort($parameters);
$cacheIdentifier = md5(serialize($parameters));

// Cache hit: return the stored image data and stop.
if (file_exists('cache/' . $cacheIdentifier))
{
    print(file_get_contents('cache/' . $cacheIdentifier));
    exit();
}

// Cache miss: generate the data, store it for the next request and return it.
$data = '';
/* Do stuff to generate data */
file_put_contents('cache/' . $cacheIdentifier, $data);
print($data);
exit();

The next request which arrives with the identical set of GET-parameters will be served with the overhead of loading PHP, parsing the PHP-script (which is less if you have APC or a similar cache installed), sorting the GET-parameters (so that bbox=..&x=.. is the same as x=..&bbox=..), serializing and hashing the parameters, checking that the file exists on disk (you could simplify this to just doing a read and checking if the read succeeded), copying the data from disk to memory and then outputting the data to the client (you could also use fpassthru() and friends, which may be more optimized for simple reading and output of data, but that’s not the main point here).
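
The reason for sorting the parameters is easiest to see in isolation. Here is a small Python sketch of the same idea; the function name is made up, and the resulting hash will not be identical to the one the PHP version produces, since the serialization differs.

```python
import hashlib
from urllib.parse import parse_qsl


def cache_identifier(query_string):
    """Return a cache key that ignores the order of the query parameters."""
    # Sorting the (name, value) pairs makes the parameter order irrelevant.
    params = sorted(parse_qsl(query_string, keep_blank_values=True))
    canonical = repr(params).encode("utf-8")
    return hashlib.md5(canonical).hexdigest()


first = cache_identifier("bbox=10,59,11,60&x=300&y=400")
second = cache_identifier("y=400&x=300&bbox=10,59,11,60")
other = cache_identifier("bbox=10,59,11,60&x=300&y=401")
```

Here `first` and `second` end up with the same identifier, while `other` gets a different one because one of its values differs.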

To relate this to our use case of the WMS, we need to take a closer look at how map services are used today. Before Google showed the world what a good map solution could look like with modern web technology, a map application presented an image to the user, allowed the user to click or drag the image to zoom or move, and then reloaded the entire page to generate the new image. If it took 0.5s to generate the image, that was not really a problem, as the data set is not updated very often and it is very easy to do these operations in parallel across a cluster. When Google introduced Google Maps, they loaded 9 visible images (tiles) in the first view, and then started loading other tiles in the background (so that when you scroll the map, it looks like the images are already in place). If you run an interface similar to Google Maps against a regular WMS, most WMS servers would explode and take the whole 42U rack with them. Not a very desirable situation. The easy solution if you have an unlimited set of resources, disk space and money is to simply generate all the available tiles up front, in the same way as Google has done it. This will require disk space for all the tiles, and will not allow your users to choose which layers they want included in the map (this will change as map services are starting to build each layer as a separate tile and then superimposing them in the user interface).

The problem is that most of us (actually, n - 1) are not Google, but most of us do not build map services either. For those of us who do, we needed a middle ground between rendering our complete dataset to image tiles up front and running everything through the WMS. While working with Gunnar Misund at Østfold University College, I designed a simple scheme to allow compatible clients to fetch cached tiles automagically, while those tiles which did not exist yet were generated on the fly by the background WMS. The idea was to let Apache httpd handle the delivery of already generated and cached content, while our WMS could serve those areas which were viewed for the very first time (or where the layer selection was new). It would not be as fast as Google Maps for non-cached content, but it wouldn’t require us to run through our complete service to generate images either.

The solution was to let the javascript client request images through a custom URL:

http://example.com/300/400/10/59.205278/10.95/rivers,roads/image.jpg

(This is just an example, and does only contain the center point of the image). This is decomposed into:

http://example.com/x_width/y_height/zoomlevel/centerlat/centerlon/layers/image.fileformat

This is all good as long as image.jpg exists in the local path provided, so that Apache can just serve the image as it is from the location. Apache httpd (or lighttpd and other “serve files fast!”-httpds) are able to serve these static files in large numbers (it’s what they were written for, you know..) with a minimum overhead. The problem is what to do when the file actually does not exist, which will happen each time a resource is requested for the first time, and we do not have a cache yet. The solution lies in assigning a PHP-file as the handler for any 404 error (file not found). This is a well known trick used all over the field (such as handling www.php.net/functionname direct lookup). In PHP you can use $_SERVER['REQUEST_URI'] to get the complete path of the request that ended in the 404.

The .htaccess file of the application is as simple as cake:

ErrorDocument 404 /wms/handler.php

I’ve enclosed a simple specification which was written as a description of the implementation when the project was done in 2005.
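
To give an idea of what the 404 handler has to do, here is a rough sketch of the logic in Python (the real handler was a PHP script, and this is not its code). The function names, the regular expression and the render callback are all assumptions made for the example. A real handler would presumably also have to send a 200 status instead of the 404, and it should be strict about which paths it accepts, since it writes files based on the URL.

```python
import os
import re
import tempfile

# Hypothetical pattern for /width/height/zoom/lat/lon/layers/image.format
TILE_RE = re.compile(
    r"^/(?P<width>\d+)/(?P<height>\d+)/(?P<zoom>\d+)"
    r"/(?P<lat>-?\d+(?:\.\d+)?)/(?P<lon>-?\d+(?:\.\d+)?)"
    r"/(?P<layers>[a-z,]+)/image\.(?P<format>jpg|png)$"
)


def parse_tile_path(request_uri):
    """Split the requested path into tile parameters, or return None."""
    match = TILE_RE.match(request_uri)
    if match is None:
        return None
    params = match.groupdict()
    params["layers"] = params["layers"].split(",")
    return params


def handle_missing_tile(request_uri, docroot, render):
    """Generate the tile, store it where the web server will find it next time."""
    params = parse_tile_path(request_uri)
    if params is None:
        return None  # not a tile request, so this is a genuine 404
    data = render(params)  # stand-in for asking the background WMS
    target = os.path.join(docroot, request_uri.lstrip("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(data)
    return data


# Usage with a temporary directory standing in for the document root.
docroot = tempfile.mkdtemp()
uri = "/300/400/10/59.205278/10.95/rivers,roads/image.jpg"
tile = handle_missing_tile(uri, docroot, lambda params: b"fake image data")
```

After the first request the file exists at the requested path, so the web server can deliver it directly and the handler is never invoked for that tile again.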

Thumbnail generation

Generating thumbnails can also be transformed into the same problem set. In the case where you need several different sizes of thumbnails (and different rescales are needed for different applications), you can apply the same strategy. Instead of handing all the information to a resize script with the file name etc. as the argument, simply have the xsize and the ysize as part of the URL. If the file exists in the path, it’s served directly with no overhead, otherwise the 404 handler is invoked as in the previous example. The thumbnail can then be generated and saved in the proper location, and the world can continue to rotate at its regular pace.

This application can then be extended by adding new parameters in the url, such as the resize method, if the image should be stretched, zoomed and other options.

Conclusions

This is a very simple scheme that does not require any custom hardware or server software installed, and places itself neatly in between having a caching front end server between the client and the application and the hassle of generating the same file each and every time. It allows you to remove the overhead of invoking the script (PHP in this case) for each request, which means that you can serve files at a much greater rate and let your hardware do other, more interesting things instead.

Swoooosh - Free Open Source Flash-based Multi File Uploader

May 25th, 2008

As I’ve mentioned a few times before I’ve been playing around with Adobe Flex. I finally got some more time to play with it tonight, so I got everything together to a semi-usable shape. A few things are still missing, such as moving the active uploads to the top and handling more than a total number of x queued uploads (at a certain level the progress bars will just disappear out of the Flash area, then magically appear as enough other items are finished).

Download Swoooosh.tar.bz2!

I’m looking for any response on this, and if anyone wants to play around with it, please go ahead. It should be fairly simple to set up. I’ve included a brief description of the arguments it accepts below. Everything is released under a slightly modified MIT-based license, where the only change is that I’ve removed the need for keeping the copyright notice in anything that’s not the source code itself. Use it for anything you’d like, and if you make something useful, I’d be happy if you would contribute a patch back to me so that I could update the library itself.

You can see the application in action at my test installation. I’ll remove this test later, and be advised that the files actually will be transferred to my webserver. I’m just going to run rm -f * anyways, but if anyone breaks in and steals your precious uploaded files, you’re the one to blame.

== ARGUMENTS ==

The arguments to the flash file are provided in the flashVars attribute.

There are two required parameters:

destinationURL
The destination where all files are uploaded.

redirectWhenDoneURL
The URL the client is redirected to when all files have been uploaded.

Remember to urlencode both values.

Example:

<SEE THE INSTALL FILE IN THE ARCHIVE>
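
As an illustration of the urlencoding step only (the actual embed code is in the INSTALL file), here is a short Python sketch that builds a flashVars string. The two URLs are made-up placeholders; only the parameter names come from the list above.

```python
from urllib.parse import urlencode

# Placeholder URLs; replace them with your own upload and landing pages.
flash_vars = urlencode({
    "destinationURL": "http://example.com/upload.php?album=1",
    "redirectWhenDoneURL": "http://example.com/done",
})

print(flash_vars)
```

Without the encoding, the ?album=1 part of the destination URL would be mixed up with the flashVars parameters themselves.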

The optional parameters that are available are:

progressIndicatorColor: "#bfbfbf"
The color of the progress bar.

progressIndicatorBackgroundColor: "white"
The color of the empty bar before any progress has been made.

progressIndicatorWidth: 300
The width of the progress bar indicator.

uploadButtonText: "Click here to upload files!"
The text of the button the user has to click to start uploading files.

== COMPILING ==

To compile the SWF-file from the source code, download the Adobe Flex 3 SDK,
then run mxmlc against Swoooosh/Swoooosh.mxml:

mxmlc Swoooosh/Swoooosh.mxml

== CONTACT ==

Any and all comments are welcome. See the included LICENSE file for
information about usage. Short words: do whatever you want, just don't
claim you wrote it without contributing.

All patches are of course welcome.

UPDATE

As I’ve received many comments about the contents of test.php (the file that receives the post), here is the smallest version:

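<?php
// Minimal receiver: store every uploaded file under a generated name.
if (is_array($_FILES))
{
    foreach ($_FILES as $file)
    {
        // uniqid() provides the name on disk; the client-supplied $file['name'] is never used.
        file_put_contents('directory_name_to_write_files_to/' . uniqid(), file_get_contents($file['tmp_name']));
    }
}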

This will simply loop through all submitted files and write them to the directory you’ve named. As with other file uploads in PHP, you can access the original name of the file with the ‘name’ element in the $file array. DO NOT USE FILE NAME FROM ‘name’ WHEN WRITING THE FILE TO DISK. DOING THAT IS A VERY BAD IDEA, AS IT ALLOWS PEOPLE TO CREATE ANY FILE WITH ANY NAME (INCLUDING PHP-FILES WHICH CAN BE RUN IF THEY’RE AVAILABLE THROUGH THE WEB SERVER. YEP.). CAPS OFF.

Remember to make the directory you’re saving the files in WRITABLE for the process that writes the files (might be www-data or whatever user your webserver is running under). If you want to debug the response from the server regardless of what’s shown in the flash UI, use Wireshark to see the raw contents of packets and the conversations between the client and the server.

77 Things To Do With Your iPod (.. other than just listen to music on it)

April 29th, 2008

Travelhacker has an ingenious list with 77 alternative things to do with that old iPod you’ve just been ignoring for a while. If you’ve ever wondered how to convert your iPod into a sex toy, create an iPod Taser (iTaser?) or create your own iPod based pirate radio station, this guide is for you.

Showing Source By URL in Firefox

April 25th, 2008

A simple question popped up on IRC today: How to get Firefox to show the source of a webpage given by the URL instead of having to show the page and then selecting "View source". The solution is to use the view-source:-qualifier: view-source:http://en.wikipedia.org/wiki/ to see the HTML source for Wikipedia’s front page. The solution was presented over at Stuart Langridge’s blog where he also presents a simple bookmarklet for swapping source view on and off in the current window.