Small Collection of Useful Reading Material

I’ve finally gotten through the 600+ unread items in my RSS reader after getting back from my vacation! To celebrate I’ve decided to share a few useful items with my own comments included:

  • Matasano Chargen has a writeup of Dowd’s Inhuman Flash Exploit. This exploit for Flash was detailed back in April, but I never got around to reading the very good and concise writeup before now. Amazing exploit vector.
  • Jeff Atwood is getting frustrated by tag soup. Even though we’re only working with our presentation layer, templates (regardless of which language or platform they’re written in) tend to turn into soup very fast. I’ve given the issue some thought, but I haven’t been able to come up with any revolutionary way to handle it just yet. I currently use a good syntax highlighter that shows the different sections clearly, and of course I use proper indentation. It’s still noisy and unreadable from time to time, so yes: someone needs to get something better on the road. Now.
  • Posterous is a simple web application where anyone can create their own little site or (micro)blog, simply by emailing stuff to a predefined e-mail address. I’ve lately become fascinated with services that do not require you to register and create an account, but are still able to provide a personal experience. This is one of those services.
  • XMPP has been on the horizon for a long time, but I’ve never gotten around to really digging into it. If anyone has experience with using XMPP as an RPC platform, I’d be happy to hear about it. This also includes looking at OAuth and how we might use it for future services. Have a look at Beyond REST for a useful introduction.

Then we also have Drizzle, but I’ll detail that in another post.

Resolutions Be Gone!

(Haha! Thanks to the wonderful internet, I’m on vacation .. and still posting things in my blog .. that I wrote BEFORE we left! How’s that for TIME TRAVEL???)

We should really stop using the resolution information in our statistics scripts for anything. It’s nothing more than a curiosity, and it should never be used to make decisions about how a page should be structured. Most people probably stopped reading halfway through, thinking I’ve gone completely nutters. What do you mean? But .. how can we know how many pixels we can use for our design?

You’re out of luck! Hopefully I’m not singing Lady Madonna and strolling around somewhere in Europe wearing a pink tubetop just yet, and I still have my sanity. Instead of basing our decisions on resolution, we should start paying more attention to the actual browser window size. This is interesting from a usability perspective, as it reflects the space we can actually use without demanding that our users resize their active browser window. The desktops I use actively usually run at 1920×1200, but my browser window is considerably smaller. If a page suddenly required me to use all of those 1920×1200 pixels, I’d be annoyed (and yes, that includes those of you who still think your content is SOOOOO important that you should open windows in fullscreen mode, and then only use 50% of it for content..), even though my resolution technically supports that kind of content.

Since this means that we’d see a lot more sizes than the regular 1024×768, 1280×1024, etc., we’d simply create intervals of “acceptable sizes” instead. We’d have (640-800), (801-980), (981-1099) and so on, and we’d be able to estimate how much content a certain percentage of our users could actually see in their first view of our information. Even if people have 1024×768 as their resolution, we have no idea how much content they actually see. I have no idea whether the majority of people see 500 pixels of content, 300 pixels of content, etc. (and that would of course also change based on font size, but .. stay on track with me here), and that is the useful information to base your decisions on.
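A minimal PHP sketch of the interval idea: it sorts viewport widths into the buckets above and reports the share of samples in each. It assumes the widths have already been collected in the browser (for example from window.innerWidth) and sent to the server; the function name bucketViewportWidths and the extra "other" bucket are hypothetical, not part of any analytics package.

```php
<?php
// Hypothetical helper: sort viewport widths into named intervals and return
// the percentage of samples that fall into each one.
function bucketViewportWidths(array $widths, array $buckets): array
{
    $counts = array_fill_keys(array_keys($buckets), 0);
    $counts['other'] = 0; // widths that fit none of the intervals

    foreach ($widths as $width) {
        $label = 'other';
        foreach ($buckets as $name => [$min, $max]) {
            if ($width >= $min && $width <= $max) {
                $label = $name;
                break;
            }
        }
        $counts[$label]++;
    }

    $total = count($widths);
    $shares = [];
    foreach ($counts as $name => $count) {
        // Percentage of all samples, rounded to one decimal place.
        $shares[$name] = $total > 0 ? round(100 * $count / $total, 1) : 0.0;
    }
    return $shares;
}

// The intervals from the post (in pixels, inclusive).
$buckets = [
    '640-800'  => [640, 800],
    '801-980'  => [801, 980],
    '981-1099' => [981, 1099],
];

// Made-up sample of reported viewport widths.
$shares = bucketViewportWidths([760, 990, 1010, 870, 1400], $buckets);
// 640-800: 20.0, 801-980: 20.0, 981-1099: 40.0, other: 20.0
```

The numbers you would actually act on are these shares per interval, rather than a list of exact screen resolutions.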

So, anyone with me? How about support for this metric in Google Analytics? Does anyone know of an existing statistics project that has added support for the actual viewport instead of the screen resolution?

Gone Vacationing

Woo. I’m actually leaving for a bit of vacation tomorrow, so I won’t promise any steady updates until I return on Saturday the 26th. I’ll check my email when the occasion allows, but as the amount of spam getting through my SpamAssassin is reaching new heights, your best bet for anything might be leaving a comment here.

I might upload stuff to flickr if I have the time, or I might not. I might update the blog, or I might not.

It’s a time of uncertainty!

Gadgetting Up The Bike

A good day for internet shopping! After spending some time riding my bike over the last few months, I’ve finally decided to purchase a proper cycle computer. After we rode the regular track together today, Anders sent me the files from the ride, recorded with his Garmin Edge 305. I’ve delayed this decision for a very long time, but when I saw that the files were pure XML and that the bundled training software actually wasn’t that bad, I was sold.

So off we went to the internet to search for a Garmin Edge at a fair price here in Norway. Kartbutikken turned out to have the best prices, and they have the Garmin Edge 205 at a special price right now! I do however want support for heart rate monitoring and, more importantly, pedaling rate (cadence). This meant upgrading to at least the Garmin Edge 305, but as it turned out to be hard to get one at a fair price, I instead went all the way and got the Garmin Edge 705! It’s a bad mother.. lover, and supports everything. Being a bit of a map nerd, I’m getting excited just thinking about mapping these beautiful paths. Yeeehaaaw!

I’ll be sure to bring it along when I’m doing Grenserittet in just about two weeks time (~80 km).

Bonus points for using “gadgetting” for the first time.

The Awesomeness of Dr. Horrible

I’ll just spend a few minutes on something other than decoding bit patterns and writing SOLR plugins for a while:

I’ve been watching the amazing Dr. Horrible’s Sing-along Blog today, and I’m now eagerly waiting for the last episode to show up on the internet tomorrow. It’s produced by Joss Whedon and comes with an accompanying Dr. Horrible comic book, and it’ll be a little bit sad when the last episode is over tomorrow. Neil Patrick Harris showed a darker potential in Harold & Kumar Escape from Guantanamo Bay, and this really shows that he’s one to consider for the future.

Goodness! Look at my wrist!

PHP, ImageMagick and Cropping to GIF: Digging into GIFs again!

Christer had an interesting case today, where he tried to resize and crop an image with the Imagick extension for PHP. Everything went as planned: the image was cropped and resized as it should be, but after writing it to disk and opening it again, the image had the same size as if he hadn’t done the crop. The content of the image outside the crop area was removed (simply set as transparent), but the image was still returned at its uncropped size.

The PHP module binding ImageMagick is quite thin (it simply marshals between the ImageMagick methods and PHP user space), so my guess is that this is weird behaviour with a good enough reason somewhere down in ImageMagick. It might be a bug, but I haven’t had the time to try to reproduce it with convert or mogrify yet. If anyone wants to attempt that, feel free. Christer has posted the code, so simply try to recreate the same symptoms using one of these two tools.

Anyways, this post isn’t about the issue itself, as Christer has done a neat analysis and write-up of that, but I’ll take a more detailed look at the issue within the GIF file itself. As chance would have it, I recently participated in a competition at the Norwegian demoscene IRC hangout where the goal was to recreate the Norwegian flag in an HTML page in the smallest space possible. This ended up being a competition to see who could molest and optimize GIF images the most while browsers were still able to display them. From this experience I had fairly good knowledge of how GIF files are built internally, and I was able to make a good guess at what the actual issue in the resulting file could be.

Since GIF files can be animated, a single file may contain several “images” (which would be the frames in the animation). These images can have their own size and position within the “larger image”:

 _________
| im1     |
|    _____|
|   |     |
|   | im2 |
|___|_____|

im1 may then represent the first image and im2 the second image in the file. The second image will only update the area that it covers, and this will leave the rest of the image “as it is”. Since a GIF image may contain a large number of these images, a “global” size is defined for the image. This global size covers all the images, and is the total area that these images will be drawn into. If an image is drawn outside of this area (in part or whole), it will be clipped against the viewport.
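To make the clipping rule concrete, here is a small PHP sketch that computes the visible part of a frame given the global size. The function name clipFrame and its return shape are illustrative assumptions, not anything taken from the GIF specification or from ImageMagick.

```php
<?php
// Sketch: compute which part of a frame (sub-image) remains visible once it
// is clipped against the global image size.
// Returns [left, top, width, height], or null if nothing is visible.
function clipFrame(int $screenW, int $screenH, int $left, int $top, int $w, int $h): ?array
{
    $x = max($left, 0);
    $y = max($top, 0);
    $right  = min($left + $w, $screenW); // exclusive right edge after clipping
    $bottom = min($top + $h, $screenH);  // exclusive bottom edge after clipping

    if ($right <= $x || $bottom <= $y) {
        return null; // the frame lies entirely outside the global area
    }
    return [$x, $y, $right - $x, $bottom - $y];
}
```

A frame of 100×50 placed at (150,80) on a 200×100 image, for example, is cut down to its top-left 50×20 pixels.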

This should provide enough background to give at least a general feeling for what COULD be the problem here, but to actually find out what’s happening, we’ll dig into the GIF file format specification and the file that was created. This simple reference provides the general layout of a GIF file, and we’ll use it to look at the values in the file we ended up with:

On the left we have the actual byte values in hex, and on the right the corresponding ASCII character for each value. As you can see, the first six bytes of the file (0x47 0x49 0x46 0x38 0x39 0x61; 0x is the usual prefix for numbers that should be interpreted as hexadecimal) correspond to “GIF89a” (you can do this exercise yourself armed with this ASCII table: simply look up 47 in the Hx column, then 49, etc.). Those six bytes are what we call the signature of a GIF file (although the version number can differ, e.g. GIF87a, depending on the version used).
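A minimal PHP sketch of the signature check. gifVersion is a hypothetical helper name, and it only looks at the first six bytes, so it says nothing about whether the rest of the file is valid.

```php
<?php
// Sketch: return the GIF version from the six-byte signature at the start
// of a file's contents, or null if the signature doesn't match.
function gifVersion(string $bytes): ?string
{
    $signature = substr($bytes, 0, 6);
    if ($signature === 'GIF87a' || $signature === 'GIF89a') {
        return $signature;
    }
    return null; // not a GIF file (or a truncated one)
}

// The same six bytes written as hex escapes, as they appear in a hex editor.
$header = "\x47\x49\x46\x38\x39\x61";
$version = gifVersion($header); // "GIF89a"
```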

The next fields in the specification read:

Offset   Length   Contents
  6      2 bytes  Logical Screen Width
  8      2 bytes  Logical Screen Height

So bytes 6-7 and bytes 8-9 tell us the logical size of the whole GIF file (the canvas that the images are drawn onto). In our test file, these are:

Width: 0x67 0x01
Height: 0x70 0x00

The byte order here is little endian, which means that the least significant byte is placed first. Since we have two bytes for each value, we can calculate the decimal value of the width like this:

0x67 0x01 = 6 * 16 + 7 + (0 * 16 + 1) * 256 = 359
                                        ^-- Since we're in the next byte, we multiply with 256.

You can also do this with the Windows calculator by entering 167 in hex mode and then switching to dec (decimal). The reason for multiplying the second byte by 256 is that this byte provides the value of the “next 8 bits”, while the first byte provides the value of the first 8 bits. If we look at the bits themselves:

0x67 | 0x01: 0110 0111 | 0000 0001

Little endian says that the least significant byte comes first, so to get the raw 16-bit value, we swap the two bytes around:

0000 0001 0110 0111

As you can see, the second byte (0x01) now occupies the upper 8 bits, which is why its value is multiplied by 256 (2^8).
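The same arithmetic as a short PHP sketch: shifting the high byte left by 8 bits is the same as multiplying it by 256, and hexdec() mimics the calculator trick of reading the bytes high-first. The helper name littleEndian16 is hypothetical.

```php
<?php
// Sketch: combine two little-endian bytes by hand: low byte + high byte * 256,
// written as a bitwise OR with the high byte shifted left by 8 bits.
function littleEndian16(int $low, int $high): int
{
    return $low | ($high << 8);
}

$width = littleEndian16(0x67, 0x01);  // 359
$bits  = sprintf('%016b', $width);    // "0000000101100111"

// The calculator trick: write the bytes high-first and read them as hex.
$viaHex = hexdec('0167');             // 359
```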

We can also calculate the height:

0x70 0x00 = 7 * 16 + 0 + (0 * 16 + 0) * 256 = 112
                          ^-- both digits in the second byte are zero
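In PHP, both values can be read in one go with unpack(), whose "v" format code means an unsigned 16-bit little-endian integer. A minimal sketch, assuming the file contents are already in a string; the helper name logicalScreenSize is hypothetical.

```php
<?php
// Sketch: read the logical screen width and height from bytes 6-9 of a GIF.
// "v" = unsigned 16-bit integer in little-endian byte order.
function logicalScreenSize(string $bytes): array
{
    if (strlen($bytes) < 10) {
        throw new InvalidArgumentException('Too short to contain a GIF header');
    }
    return unpack('vwidth/vheight', substr($bytes, 6, 4));
}

// First ten bytes of the file from the post: "GIF89a", 0x67 0x01, 0x70 0x00.
$start = "GIF89a\x67\x01\x70\x00";
$size  = logicalScreenSize($start); // ['width' => 359, 'height' => 112]
```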

Alas, the global header of the generated GIF image says that the size of the image is 359×112, which is why the image is rendered larger than it should have been. Next we look at the Image section of the GIF file (every GIF file should contain at least one), which is defined as:

Offset   Length   Contents
  0      1 byte   Image Separator (0x2c)
  1      2 bytes  Image Left Position
  3      2 bytes  Image Top Position
  5      2 bytes  Image Width
  7      2 bytes  Image Height

Armed with this information, we examine the area where the image section starts:

The Image section starts with the “Image Separator”, a byte value of 0x2c, shown highlighted in the image above, and the offsets in the table are relative to this location. The next four bytes tell us where in the global viewport the upper left corner of this image should be drawn. The values here are 0x01 0x00 twice, simply meaning (1,1): one pixel in from the left edge and one pixel down from the top (which is also related to the issue posted by Christer, but we’ll ignore that here). The next values are the ones we are interested in, as they provide the Image Width and Image Height:

Width:
0x73 0x00 = 7 * 16 + 3 + (0 * 16 + 0) * 256 = 115
Height:
0x6F 0x00 = 6 * 16 + 15 + (0 * 16 + 0) * 256 = 111
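Hunting for the first 0x2c byte works in a hex editor with a human in the loop, but a program should walk the block structure instead, since 0x2c can also appear inside the color table or extension data. The PHP sketch below does that under a few assumptions: it only handles a well-formed GIF87a/GIF89a file, skips the optional global color table and any extension blocks (each is a label followed by data sub-blocks ending in a zero-length block), and returns the fields of the first image descriptor. The function name firstImageDescriptor is hypothetical.

```php
<?php
// Sketch: walk the top-level blocks of a GIF and return the position and
// size fields of the first Image Descriptor, or null if none is found.
function firstImageDescriptor(string $gif): ?array
{
    // 6-byte signature + 7-byte Logical Screen Descriptor.
    if (strlen($gif) < 13 || substr($gif, 0, 3) !== 'GIF') {
        return null;
    }
    $pos = 13;

    // Byte 10 holds packed fields: bit 7 = global color table present,
    // bits 0-2 = N, giving a table of 2^(N+1) RGB triplets.
    $packed = ord($gif[10]);
    if ($packed & 0x80) {
        $pos += 3 * (1 << (($packed & 0x07) + 1));
    }

    $len = strlen($gif);
    while ($pos < $len) {
        $introducer = ord($gif[$pos]);

        if ($introducer === 0x2C && $pos + 9 <= $len) { // Image Separator
            // Left, top, width, height: four little-endian 16-bit values.
            return unpack('vleft/vtop/vwidth/vheight', substr($gif, $pos + 1, 8));
        }

        if ($introducer === 0x21) {                      // Extension Introducer
            $pos += 2;                                    // skip introducer and label
            // Skip data sub-blocks: a size byte followed by that many bytes,
            // until a zero-size block terminator.
            while ($pos < $len && ord($gif[$pos]) !== 0) {
                $pos += ord($gif[$pos]) + 1;
            }
            $pos++;                                       // skip the terminator
            continue;
        }

        return null; // Trailer (0x3B), truncated descriptor, or unexpected data
    }
    return null;
}
```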

This means that the image actually supplied in the GIF file is 115×111 pixels and should be drawn one pixel in from the left and one pixel down (as given by 0x01 0x00 in the x and y fields above). Compare this to the reported global size of the image (359×112), and we can see where our transparent space is coming from. Browsers (and other image viewers) create a canvas of 359×112 pixels, but only draw the 115×111 image near its left edge, starting at (1,1). The rest is left transparent, but those pixels are still there, as the file says that’s the size of the viewport. If we manually change the width of the viewport to 0x74 0x00 (116: the one-pixel offset plus the 115-pixel width) in the GIF header itself, the image displays properly. To illustrate with another great ASCII drawing:


               viewport
 _____________________________________
|           |                         |
|  actual   |                         |
|  image    |                         |
|  drawn    |                         |
|           |                         |
|           |                         |
|           |                         |
|___________|_________________________|
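The manual header fix can be scripted as well. This PHP sketch overwrites bytes 6-9 with new little-endian values via pack('vv', ...); patchLogicalScreen is a hypothetical helper, and it assumes you already know the correct canvas size (here 116×112: the one-pixel offset plus the 115×111 image). It does not touch the image descriptor or the pixel data.

```php
<?php
// Sketch of the manual fix: overwrite the logical screen width and height
// (bytes 6-9 of the header) with new little-endian 16-bit values.
function patchLogicalScreen(string $gif, int $width, int $height): string
{
    return substr_replace($gif, pack('vv', $width, $height), 6, 4);
}

$original = "GIF89a\x67\x01\x70\x00";                  // 359x112, as in the post
$patched  = patchLogicalScreen($original, 0x74, 0x70); // 116x112
```

For a file on disk, this could be wrapped as file_put_contents('fixed.gif', patchLogicalScreen(file_get_contents('cropped.gif'), 116, 112)), where both file names are placeholders.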

The solution to the problem here was to call the setImagePage method of the image object, as that allows us to set the values for the global image ourselves (and we know how wide the image was supposed to be).

Bonus knowledge: This issue did not occur when saving to a JPEG file, as JPEG files cannot store several subimages inside one file and have no equivalent of the GIF logical screen. ImageMagick knows this, and does not use the page values when rendering the file.

Hopefully this has provided a small introduction to how files are structured and what you can learn armed with a hex editor and a file format specification, and given a few insights into what you can do when you’re faced with a very weird problem.
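A sketch of how the setImagePage fix might look with the Imagick extension. It assumes the crop rectangle is known up front; cropForGif is a hypothetical helper name, while cropImage(), setImagePage() and getImagePage() are existing Imagick methods. Resetting the page offset to (0,0) should also get rid of the one-pixel offset mentioned above.

```php
<?php
// Sketch (requires the imagick extension): crop, then set the page geometry
// so the GIF's global size matches the cropped image.
function cropForGif(Imagick $image, int $width, int $height, int $x, int $y): Imagick
{
    $image->cropImage($width, $height, $x, $y);
    // After cropping, the page (virtual canvas) can still carry the original
    // size and the crop offset; set it to the cropped size at (0,0).
    $image->setImagePage($width, $height, 0, 0);
    return $image;
}
```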

Fatal error: Exception thrown without a stack frame in Unknown on line 0

While Christer was updating his profile on one of our current projects, he suddenly got this rather cryptic message. We tossed a few ideas around before just leaving it for the day. We came back to the issue earlier today, and I suddenly realized that it had to have something to do with the session handling. Changing our session handler from memcache to the regular file-based handler did nothing, so it had to be something in the code itself.

The only thing we keep in our session is an object representing the currently logged-in user, so it had to be related to that. After a bit of debugging I found the culprit: a reference to an object which contained a reference to a PDO object. PDO objects cannot be serialized, and the resulting exception was being thrown after the script had finished running. The session is written to the session storage handler when the script terminates, so that happens outside the regular script context (which is why you get “in Unknown on line 0”). It would be very helpful if the PHP engine were able to provide at least the message from the exception, but that’s how it currently is.

Hopefully someone else will get a Eureka! moment when reading this!

The solution we ended up with was to remove the references to the objects containing the unserializable structures. This is done by implementing the magic __sleep method, which returns a list of all the fields that should be kept when serializing the object (we miss the option of simply unsetting the fields that need to go and then letting the object be serialized as it would have been without __sleep). Our __sleep implementation removes our references and then returns all the fields contained in the object:
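One way to get at the hidden message while debugging is to serialize the session data yourself before the script ends, inside a try/catch, so the exception surfaces in a normal script context. A minimal sketch; checkSessionSerializable is a hypothetical helper name, and catching Throwable assumes PHP 7 or later.

```php
<?php
// Debugging sketch: try to serialize the session data while the script is
// still running, and return the exception class and message if it fails.
function checkSessionSerializable(array $data): ?string
{
    try {
        serialize($data);
        return null;                 // everything could be serialized
    } catch (Throwable $e) {
        return get_class($e) . ': ' . $e->getMessage();
    }
}

// For example, at the end of a request during development:
// if ($problem = checkSessionSerializable($_SESSION)) { error_log($problem); }
```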

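public function __sleep()
{
    // Drop the reference to the manager, which (indirectly) holds a PDO
    // instance; PDO objects cannot be serialized.
    $this->manager = null;

    // Serialize every remaining property of the object.
    return array_keys(get_object_vars($this));
}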

And there you have it, one solved problem!
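An alternative sketch that keeps the live object intact: have __sleep return every property name except the unserializable one, and re-attach the dependency after unserialization. User and UserManager are hypothetical classes, and UserManager's throwing __sleep merely stands in for an object holding a PDO connection.

```php
<?php
// Hypothetical stand-in for a manager holding a PDO connection; like PDO,
// it refuses to be serialized.
class UserManager
{
    public function __sleep()
    {
        throw new LogicException('UserManager cannot be serialized');
    }
}

class User
{
    public $id;
    public $name;
    private $manager; // not serializable, must be re-attached after loading

    public function __construct($id, $name, UserManager $manager)
    {
        $this->id = $id;
        $this->name = $name;
        $this->manager = $manager;
    }

    public function __sleep()
    {
        // Keep every property except the manager, without modifying $this.
        return array_values(array_diff(array_keys(get_object_vars($this)), ['manager']));
    }

    public function __wakeup()
    {
        // Re-attach the manager here (e.g. from a service container);
        // this sketch simply leaves it null.
        $this->manager = null;
    }

    public function hasManager(): bool
    {
        return $this->manager !== null;
    }
}
```

On PHP 7.4 and later, the __serialize() and __unserialize() magic methods are another option, as they let you return exactly the data array you want stored.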

DNS and Authoritative Answers

Just had to debug a little issue with a domain that our DNS server did not consider itself authoritative for. The SOA records were there as they should be, and the SOA record referred to the correct name server. A bit of searching on Google later turned up the cause of the issue: the name server did not have an NS entry for the domain. To be considered authoritative, a name server has to provide an NS entry for the domain too. As I only do DNS stuff a couple of minutes each year, details like that happily slide away to the darkest corners of my mind.

So to sum it up: get your SOA and NS records straight, and you can be authoritative all day long! Authoritative like it’s 1999!
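If you want to check this from a script, PHP's dns_get_record() can fetch the NS and SOA records for a zone. A minimal sketch: isListedNameServer is a hypothetical helper that checks whether a given host (for example the SOA record's primary name server, its "mname" field) appears among the NS records. It only inspects what the resolver returns, so it is a sanity check, not a test of whether a server actually answers authoritatively.

```php
<?php
// Sketch: given records in the shape returned by dns_get_record() with
// DNS_NS (each NS entry has 'type' => 'NS' and a 'target' host), check
// whether $nameServer is listed. Comparison ignores case and a trailing dot.
function isListedNameServer(array $nsRecords, string $nameServer): bool
{
    $wanted = rtrim(strtolower($nameServer), '.');
    foreach ($nsRecords as $record) {
        if (($record['type'] ?? '') === 'NS'
            && rtrim(strtolower($record['target'] ?? ''), '.') === $wanted) {
            return true;
        }
    }
    return false;
}

// Live lookup (needs network access and goes through the local resolver):
// $ns  = dns_get_record('example.com', DNS_NS);
// $soa = dns_get_record('example.com', DNS_SOA);
// var_dump(isListedNameServer($ns, $soa[0]['mname']));
```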

The Handwriting of Font Designers

En route from Slashdot comes an article with examples of the handwriting of several people who design fonts and typefaces for a living. They’re obviously human too, but I actually think that all the provided examples of their handwriting are beautiful in their own way, and should be turned into typefaces themselves. In particular I’d like to point out Marian Bantjes and Sebastian Lester, but that might be because I’m very fond of typefaces with large descenders (read more in Wikipedia’s article about typefaces).

Google Releases Their Protocol Buffers

Fresh from the Google Open Source Blog comes news that Google has released their Protocol Buffers specification and accompanying libraries. The code and specification have been released at Protocol Buffers on Google Code.

Protocol Buffers is a data format for fast exchange and parsing of data and messages between computers. It is similar to simple uses of XML in this regard, but message size on the wire and parsing time are heavily optimized for busy sites. There is no need to spend loads of time doing XML parsing when you could be doing something useful instead. It’s very easy to interact with the messages through the generated classes (for C++, Java and Python), and newer versions of a schema stay compatible with older ones (new fields are simply ignored by older parsers).

There’s still no PHP implementation available, so I guess it’s time to get going and lay down some code over the summer. Anyone up for the job?
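For a feel of why the messages are so compact, here is a PHP sketch of the base-128 varint encoding used by the Protocol Buffers wire format, plus how a field key combines the field number with the wire type. It is an illustration only (non-negative integers and varint fields only), not a Protocol Buffers library; the function names are hypothetical.

```php
<?php
// Sketch of base-128 varint encoding: 7 bits per byte, least significant
// group first, high bit set on every byte except the last.
function encodeVarint(int $value): string
{
    if ($value < 0) {
        throw new InvalidArgumentException('This sketch only encodes non-negative values');
    }
    $out = '';
    do {
        $byte = $value & 0x7F;   // lowest 7 bits
        $value >>= 7;
        if ($value > 0) {
            $byte |= 0x80;       // continuation bit: more bytes follow
        }
        $out .= chr($byte);
    } while ($value > 0);
    return $out;
}

// Decode one varint starting at $pos; $pos is advanced past it.
function decodeVarint(string $bytes, int &$pos): int
{
    $result = 0;
    $shift = 0;
    do {
        $byte = ord($bytes[$pos++]);
        $result |= ($byte & 0x7F) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    return $result;
}

// A field key is (field_number << 3) | wire_type; wire type 0 is "varint".
function encodeVarintField(int $fieldNumber, int $value): string
{
    return encodeVarint(($fieldNumber << 3) | 0) . encodeVarint($value);
}

$hex = bin2hex(encodeVarintField(1, 150)); // "089601"
```

Because each key carries the wire type, an older parser can work out how many bytes an unknown field occupies and skip it, which is what makes the schema compatibility mentioned above possible.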