UTF-8 and Putty

After making the move to UTF-8 for all applications in my daily routine a couple of weeks ago, putty was the last client to cause any sort of problem. I logged into my home server to read E-mail today, and putty apparently got everything mixed up as it tried parsing the UTF-8 encoded text as ISO-8859-1.

There is, however, a quick and easy solution in more recent versions of putty (.. only a couple of years old?):


Menu -> Change Settings -> Window -> Translation -> Received data assumed to be in character set

Set that value to UTF-8, and everything will be A-OK!

New Adventures in Reverse Engineering

Before I go into the gory details of this post, I'll start by saying that this method is probably not the right solution for you. This is not something you want to do if you have ready access to any source code or if you have an existing relationship with the 3rd party that provided the library you're using. Do not do this. This is not for you.

With that out of the way, this is the part for those of you who actually are interested in getting down and dirty with Java, and maybe solving a problem that's hard to solve otherwise.

The setting: we have a library for interfacing with another internal web service, provided to us in binary form by a 3rd party as part of the agreement when the service was delivered. The problem is that for some unknown reason, this library is perfectly capable of understanding UTF-8, both as input from us and as input from the web service, but all web related methods in the result class return data encoded as ISO-8859-1. The original workaround was to keep two versions of the query string: the original query under one particular key, and the ISO-8859-1 version under the key the library expects. This requires loads of special casing, manually handling that single parameter, etc. It works to a certain degree as long as the library is the only component in the mix. The troubles really began to surface when we started querying other services based on the same query. We'd then have to special case all methods that were used in URLs, as they returned ISO-8859-1, while all other libraries and services use UTF-8.

The library has since been made into a separate product with a hefty price tag, so upgrading it was not an acceptable solution for us. Another solution had to be found, and this is where things start to get interesting.

Writing a proxy class to handle the encoding issue transparently

This was the solution we attempted first, but it requires us to implement quite a few methods, add additional code to the method that provides access to the library, and extend and wrap parts of the result object. Each override could be written quite easily by calling super.methodName() and fixing the encoding of the result before returning it, but we would have to change several classes (these objects live 3-4 levels down in the result object from the library), which adds both developer and runtime overhead. Not good.
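As an illustration (not from the original post), here is a minimal sketch of such a proxy. VendorResult and getQuery() are hypothetical stand-ins for the 3rd party classes, and the sketch assumes the getters return Strings that were decoded with the wrong charset:

    import java.io.UnsupportedEncodingException;

    // Hypothetical stand-in for the 3rd party result class we don't control.
    class VendorResult {
        public String getQuery() {
            return "..."; // imagine text decoded with the wrong charset here
        }
    }

    // Proxy that overrides every String getter we use and fixes the encoding
    // before returning the value. One override per getter, repeated for every
    // class 3-4 levels down in the result object, hence the overhead.
    public class Utf8VendorResult extends VendorResult {

        @Override
        public String getQuery() {
            return reencode(super.getQuery());
        }

        private static String reencode(String value) {
            if (value == null) {
                return null;
            }
            try {
                // Assumption: the bytes are really UTF-8 but were decoded as
                // ISO-8859-1; round-tripping them restores the intended string.
                return new String(value.getBytes("ISO-8859-1"), "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new RuntimeException(e);
            }
        }
    }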

Decompiling the library

The next step was to decompile the library to see how its code actually worked. This proved to be a good way to find out how we might solve the issue. We could try to fix the problem in the decompiled code and then recompile the library, but some of the class files were too new for jad to decompile them completely. The decompilation did, however, reveal the problem in the code:

    if (encoding != null)
    {
        return encoding.toString();
    }

    return "ISO-8859-1";

This was neatly located in a helper method that ran on every property used when generating a query string. The encoding variable is retrieved from a global settings object, only accessible within the library itself. This object is empty in our version of the library, so not much help there. But here's the little detail that leads into the next part, and that actually made this hack possible: "ISO-8859-1" is a constant. This means it gets neatly tucked away as a UTF-8 string in the constant pool when the class file is generated. Let's get down and dirty.

Binary patching the encoding in the class file

We'll start by taking a look at a hexdump of our class file, after searching for the string "ISO" in the ASCII representation ("ISO" in UTF-8 is identical to its ASCII representation):

[Screenshot: hexdump of the class file, with the bytes for "ISO-8859-1" highlighted]

I've highlighted the interesting part where "ISO-8859-1" is stored in the file. This is where we want to make our surgical incision so that the method returns the string "UTF-8" instead. If you've never done any hex editing of files before, there is one important thing to be aware of: the byte offsets of parts of the file may be very important. Sadly, the strings "UTF-8" and "ISO-8859-1" have different lengths, which would require us to either delete the bytes following "UTF-8" or pad it with spaces instead ("UTF-8     "). The first method might leave the rest of the file skewed; the latter might not work if the code using the value doesn't trim the string first.

To solve this issue, we turn to our good friend VM Spec: The class File Format, which contains all the details of how the class file format is designed. Interesting parts:

In the ClassFile structure:

cp_info constant_pool[constant_pool_count-1];

As we’re looking at a constant, this is where it should be stored. The cp_info is defined as:

cp_info {
    u1 tag;
    u1 info[];
}

The tag contains the type of the constant; the info[] array varies depending on the type. If we take a look at the table in Chapter 4.4, we see that the tag value for a UTF-8 string constant is:

CONSTANT_Utf8 	1

So the byte describing this constant should contain the value 1 (the actual byte value, not the ASCII character). If the tag is 1, the complete structure is:

    CONSTANT_Utf8_info {
    	u1 tag;
    	u2 length;
    	u1 bytes[length];
    }

The tag should be 1 as a byte value, and the length is two bytes describing the length of the actual string stored (since the length is kept in two bytes (u2), a string can be at most 2^16 - 1 bytes long). After those two fields, we should have length bytes of UTF-8 data.
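To make the layout concrete, here is a minimal sketch (not part of the original post) of reading one such entry with a DataInputStream that is assumed to be positioned exactly at the entry's tag byte:

    import java.io.DataInputStream;
    import java.io.IOException;

    public class Utf8ConstantReader {

        // Reads a single CONSTANT_Utf8_info structure: u1 tag, u2 length, u1 bytes[length].
        // Class files technically use "modified UTF-8", but for plain ASCII strings
        // such as "ISO-8859-1" the two encodings are identical.
        static String readUtf8Constant(DataInputStream in) throws IOException {
            int tag = in.readUnsignedByte();       // u1 tag, must be 1 for CONSTANT_Utf8
            if (tag != 1) {
                throw new IOException("Not a CONSTANT_Utf8_info entry, tag = " + tag);
            }
            int length = in.readUnsignedShort();   // u2 length, big endian
            byte[] bytes = new byte[length];       // u1 bytes[length]
            in.readFully(bytes);
            return new String(bytes, "UTF-8");
        }
    }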

If we go back to our hex dump, we can now make more sense of the data we’re seeing:

[Screenshot: hexdump detail showing the 0x01 tag byte and the 0x00 0x0A length bytes in front of "ISO-8859-1"]

The byte shown as 0x01 is the value 1 for the tag of the structure. The 0x00 0x0A are the two bytes making up the length of the string:

    0000 0000 0000 1010 binary = 10 decimal

    ISO-8859-1
    1234567890 

This shows that the length of our string "ISO-8859-1" is 10 bytes in UTF-8, which matches the value stored in the two length bytes of the structure.

Heading back to our original goal: replacing the stored string. We change the string bytes to "UTF-8", which is five bytes, and remove the five leftover bytes after it. We then change the stored length of the string to match:

    00 0A becomes
    00 05

We save our changes and re-create the jar file with all the previous classes plus our modified one.
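The same edit can also be scripted. For reference only, here is a minimal sketch (not part of the original post, which used a plain hex editor), assuming the "ISO-8859-1" constant occurs exactly once in the class file:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Replaces the CONSTANT_Utf8_info entry for "ISO-8859-1" (tag 0x01,
    // length 0x000A, then the string bytes) with an entry for "UTF-8"
    // (tag 0x01, length 0x0005). Class files contain no absolute offsets,
    // so the file may safely shrink as long as the length field matches.
    public class PatchEncodingConstant {

        public static void main(String[] args) throws IOException {
            byte[] in = Files.readAllBytes(Paths.get(args[0]));

            byte[] oldEntry = buildUtf8Entry("ISO-8859-1");
            byte[] newEntry = buildUtf8Entry("UTF-8");

            int pos = indexOf(in, oldEntry);
            if (pos < 0) {
                throw new IOException("Constant not found");
            }

            byte[] out = new byte[in.length - oldEntry.length + newEntry.length];
            System.arraycopy(in, 0, out, 0, pos);
            System.arraycopy(newEntry, 0, out, pos, newEntry.length);
            System.arraycopy(in, pos + oldEntry.length, out, pos + newEntry.length,
                    in.length - pos - oldEntry.length);

            Files.write(Paths.get(args[1]), out);
        }

        // Builds a CONSTANT_Utf8_info entry: tag (u1), length (u2, big endian), bytes.
        private static byte[] buildUtf8Entry(String value) {
            byte[] strBytes = value.getBytes(StandardCharsets.US_ASCII);
            byte[] entry = new byte[3 + strBytes.length];
            entry[0] = 1; // CONSTANT_Utf8
            entry[1] = (byte) ((strBytes.length >> 8) & 0xFF);
            entry[2] = (byte) (strBytes.length & 0xFF);
            System.arraycopy(strBytes, 0, entry, 3, strBytes.length);
            return entry;
        }

        private static int indexOf(byte[] haystack, byte[] needle) {
            outer:
            for (int i = 0; i <= haystack.length - needle.length; i++) {
                for (int j = 0; j < needle.length; j++) {
                    if (haystack[i + j] != needle[j]) {
                        continue outer;
                    }
                }
                return i;
            }
            return -1;
        }
    }

Run it with the original and patched .class file names as arguments, then rebuild the jar as described above.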

After inserting our new JAR-file into our maven repository as a new build and updating our local repository, we now have complete UTF-8 support from start to finish. Yey!

ImportError: No module named trac.web.modpython_frontend

One of the reasons why you might get the error:

ImportError: No module named trac.web.modpython_frontend

after installing Trac is that Apache may not be able to create the Python egg cache, which is detailed in the Trac wiki. Create a directory for the cache files, change the owner to www-data.www-data (or whichever user you run Trac under) and rejoice.

The settings needed in the vhost configuration (.. or wherever you have your configuration ..):

    
    <Location />
        SetHandler mod_python
        PythonInterpreter main_interpreter
        PythonHandler trac.web.modpython_frontend
        PythonOption TracEnvParentDir /path/to/trac
        PythonOption TracUriRoot /
        PythonOption PYTHON_EGG_CACHE /path/to/directory/you/created
    </Location>
    

You can easily do a quick test by setting the path to /tmp and checking if that solved your problem. If it did, create a dedicated directory and live happily ever after. If it didn’t, continue your quest. Check for genshi and other dependencies. Do a search on Google ™.

Hopefully everything works again.

BTW: Another reason for this error might be that your Trac installation is no longer available: if the library path contains the Python version number and you have upgraded Python, that path changes and your old libraries may not have been copied over. In that case it might help to reinstall Trac in your new environment:

easy_install -U Trac

.. and then try again (thanks to Christer for reporting on this after he had the same problem).

PDO and PDO::PARAM_INT

Hi there Mr. PDO!

We've come to know each other, and yes, while you have your troubles (.. which I don't, of course), I've accepted your shortcomings. Today you threw another one of your fits, but I'll be sure to document it for the world to see.

$statement = $pdo->prepare("
    ...
    LIMIT
        :offset, :hits
");

Yep. This will of course fail if you're binding strings. LIMIT '10', '10' is not very helpful now, is it. Good point. So let's tell PDO that we're really binding ints:

$statement->bindValue(':offset', $offset, PDO::PARAM_INT);
$statement->bindValue(':hits', $hits, PDO::PARAM_INT);

But wait. You’re still complaining?! I told you they were ints?! What’s the problem now?!?!

Well. Mr. PDO requires you to convert the values for him as well. So first you have to cast the values of a loosely typed language to a specific type, and then you have to tell the library that yes, this really is a different type than the one it would otherwise assume. This works:

$statement->bindValue(':offset', (int) $offset, PDO::PARAM_INT);
$statement->bindValue(':hits', (int) $hits, PDO::PARAM_INT);

Which means the following:

If the type of your variable internally is a string, it’ll be escaped as a string, even if you tell PDO that it should be handled as an INT in your database layer.

If the type of your variable is an int, it’ll be handled as a string, unless you tell PDO it is an int.

Something is backwards here.

Getting ÆØÅ to Work in mutt / putty

After reinstalling the server (see the previous post), mutt didn't show the Norwegian letters ÆØÅ properly any longer (.. and yes, I use mutt to read my E-mails. Nothing else comes close.) .. The issue was apparently related to the current locale settings, but a quick check showed things to be perfectly valid (.. although not UTF-8, but that's another issue):

mats@computer:~$ locale
LANG=nb_NO.iso88591
LC_CTYPE="nb_NO.iso88591"
LC_NUMERIC="nb_NO.iso88591"
LC_TIME="nb_NO.iso88591"
LC_COLLATE="nb_NO.iso88591"
LC_MONETARY="nb_NO.iso88591"
LC_MESSAGES="nb_NO.iso88591"
LC_PAPER="nb_NO.iso88591"
LC_NAME="nb_NO.iso88591"
LC_ADDRESS="nb_NO.iso88591"
LC_TELEPHONE="nb_NO.iso88591"
LC_MEASUREMENT="nb_NO.iso88591"
LC_IDENTIFICATION="nb_NO.iso88591"

Why didn't mutt show the proper letters then? Everything seems to be OK .. Instead, it just kept showing "?" wherever Æ, Ø or Å should have been.

Well, the settings are one thing, but if the locale itself isn’t available, things ain’t gonna be any better. So let’s fix that:

apt-get install locales

And .. well, at least we have the locale definitions available now, but before we can use them, we need to generate the binary version. Find /etc/locale.gen and open it in a suitable editor.

Find the line for the locale you’re using and uncomment it:

# nb_NO ISO-8859-1
# nb_NO.UTF-8 UTF-8

becomes:

nb_NO ISO-8859-1
nb_NO.UTF-8 UTF-8

Then run ‘locale-gen’ as root. Wait a few seconds and the locales will be generated. Run mutt. Be happy.

Back From Some Semi-Unscheduled Downtime

The blog (and everything else hosted on this server) was down for a total of 8 hours tonight. We started seeing disk read errors in the kernel log on Friday, and spent the weekend backing up and saving configuration files that weren't already in the off-site backup. My host (NGZ) responded quickly and told me to just give them a hint when the server was ready for the disk change.

We decided to stop postfix from accepting email last Sunday night, after which I ran a complete rsync and dumped the contents of MySQL and the other services. We've just started up postfix again, and everything seems to be working as it should (.. after remembering to start spamd and install procmail again…).

Oh well. Back from the dead to haunt you yet again!

We’ll have a few minutes downtime in a couple of days when we remove the troubled disk again, but until then, stay happy!

A Quick INSERT INTO Trick

From time to time you're going to need to move some data from one table into another, in particular when generalizing or specializing code. I've seen amazingly large code blocks written to handle simple cases like this, so here's a simple trick you're going to need some day:

INSERT INTO
    destination_table
    (field_1, field_2)
SELECT
    some_field, some_other_field
FROM
    source_table

You can also add conditions, for example to only copy the rows that were inserted yesterday:

INSERT INTO
    destination_table
    (field_1, field_2)
SELECT some_field, some_other_field FROM source_table WHERE condition = 1

The good thing about this is that all data movement is kept within the database server, so that the data doesn’t have to travel from the server, to the client and then back from the client to the server again. It’s blazingly fast compared to other methods. You can also do transformations with regular SQL functions in your SELECT statement, which should help you do very simple operations at the speed of light.

A Plugin for Paginating in Smarty

First I’d like to apologize for the lack of updates here in the last weeks, but the days have been very busy. I’ve bought a new car (more details about that as soon as the snow disappears), written a complete publishing platform from scratch in a weekend to help out when Gamer.no got in trouble and in general done a load of stuff. Anyways, this post isn’t about all that, but rather something else I wrote some time ago.

A use case you'll encounter very often is paginating items, i.e. including a simple "jump to page x, jump to the next page, jump to the previous page" footer. If you've ever tried to implement the logic around this in your view, you know that it can get quite extensive. There are other solutions out there, such as PEAR_Pager, which actually looks like a good option now (with 2.x). Anyways, this is a plugin for Smarty to make generating pagination links easier.

Download the plugin, drop it into the plugins/ folder in your Smarty library directory, and voilà, you have access to the new {paginator} element.

The module is quite configurable, but as I've only extended the parts I've had use for in our projects, it may still lack a few simple keys.

{paginator hits=$hits offset=$offset total_hits=$articlesFound class_inactive=paginatorInactive class_active=paginatorActive}

The template variables used here are hits, the number of hits shown on this page; offset, the offset from 0; and total_hits, the total number of available hits in the current list. By default the plugin appends ?hits=<hits>&offset=<offset>+<hits> to the current URL; you can give another URL through the url attribute. The class_ attributes provide the CSS classes used for the elements that enclose the page numbers or links. See the source code (!) for more information about which attributes work.

As I mentioned previously, the plugin is written for my own personal use, so it’s not as streamlined as it could be. Feel free to update it, dissect it, break it, claim it’s yours .. or anything. I’d be happy if you submit any patches to me so I can update the link here, or simply leave a comment.

Hack away! You can see the plugin in action at the bottom of lovethatfun.com.