Informa’s getCategories Truncates Category Titles at “/”

I stumbled across a weird issue with Informa and the ItemIF.getCategories method today. The categories we retrieve use / to separate the levels of their full hierarchy, but Informa only gave me the first part of the category (“Properties” instead of “Properties / Houses”). The solution is to access the category element of the item explicitly:

String categoryDomain = item.getAttributeValue("category", "domain");
String categoryTitle = item.getElementValue("category");

This should be extended to support several category elements, but as we only get one in our feeds, this solved the problem for us.
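Purely as an illustration (the variable names are mine, and this still assumes a single category element per item, as in our feeds), the raw element value can be split back into its hierarchy like this:

// Sketch only: assumes one <category> element per item.
String categoryTitle = item.getElementValue("category");   // e.g. "Properties / Houses"

if (categoryTitle != null) {
    // Recover the hierarchy that getCategories() drops at the "/".
    String[] hierarchy = categoryTitle.split("\\s*/\\s*");
    // hierarchy[0] == "Properties", hierarchy[1] == "Houses"
}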

Transparent Remapping of / For Struts Actions

I’ve been trying to find a solution to this issue for a couple of hours: we have several Struts actions in our Java-based webapp, all neatly mapped through different .do action handlers. I wanted to switch our handling of the root / index URL (http://www.example.com/) from being a static redirect to the actual action to instead presenting the content right there, without the useless redirect. This proved to be harder than I thought; after searching Google at length and reading through mailing lists and the official documentation, it seems there is no way to tell Struts to let an action handle these requests by default. There may be one, but I could not for the life of ${deity} find it.

As simply doing it in Struts was out of the question, I turned to an old friend of mine, the ever so helpful mod_rewrite. mod_rewrite is capable of rewriting URLs internally in Apache before they get handled at other levels. The problem was that mod_jk seemed to grab the request before the replacements were made, but a few resources pointed me in the right direction.

After a bit of debugging with the RewriteLog, everything came together. This is how it ended up:

    RewriteEngine On
    RewriteLog /tmp/rewrite.log
    RewriteLogLevel 3
    RewriteRule ^/?$ /destinationfile [PT,NE]

RewriteLog says that the rewrite progress should be logged to /tmp/rewrite.log, and RewriteLogLevel 3 says to log at the most detailed level; use 0, 1 or 2 for less debugging information. The PT (passthrough) flag hands the rewritten URI back to the normal URL mapping phase so that other handlers such as mod_jk get to see it, and NE (noescape) keeps special characters from being escaped in the result. Remember to comment out the logging lines when things work, and do NOT leave RewriteLog enabled when you’re not actually debugging the rewrites.
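For completeness, and this part depends entirely on how mod_jk is configured in your setup, the rewritten path also has to be covered by one of your JkMount directives (it may already be, for instance if you mount /*.do) so that the request actually ends up in the servlet container. Something along these lines, where the worker name is just a placeholder:

    # Placeholder worker name; it must match a worker defined in workers.properties
    JkMount /destinationfile ajp13_worker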

Keeping Your Code in Check: Part 1 – DRY

DRY – Don’t Repeat Yourself – is a central principle for everyone who wants to keep their code maintainable and in a clean and pristine state. It will save you from tedious tasks in the future, although it requires you to do a bit more thinking up front. The trick is to recognize when you’re about to commit the first programming sin: repeating parts of your code.

Let’s first be completely clear: not repeating code does not mean your code should be as general as possible. Doing that will only end in a horrible mess that you can only hope to sell by labelling it “Enterprise” and pushing all the configuration options into XML. If your way to success is to craft a new buzzword and get the consulting business to sell your product, that might be how you do it. If you’re interested in writing maintainable and solid software that other people can play around with and get to grips with quite quickly: don’t.

Keep it specialized enough to do what you need easily, but general enough to let you reuse it for similar tasks.

The reason I’m posting this now is, unsurprisingly enough, that I ran into the problem while coding a library at work. It is so common that any programmer will run into it several times a day, but the case at hand illustrates it perfectly. In a class I was writing to return a set of data to the user, I decided to add three methods for sorting the data in different ways before returning it. The data consists of associative arrays, and it’s these arrays I want to sort, by the value of one key or another. Example:

['displayValue' => 'Aloha Beach', 'count' => 13, 'value' => 'aloha']
['displayValue' => 'Norway', 'count' => 3, 'value' => 'norway']
['displayValue' => 'Sweden Swapparoo', 'count' => 7, 'value' => 'sweden']

To sort this in PHP, you’d use the usort function with a callback function / method that sorts the arrays by the value of the right key. My first version was something like:

private function sortByValue($a, $b)
{
    if (!isset($a['value'], $b['value']))
    {
        if (isset($a['value']))
        {
            return -1;
        }

        if (isset($b['value']))
        {
            return 1;
        }
    
        return 0;
    }

    return strcmp($a['value'], $b['value']);
}

private function sortByCount($a, $b)
{
    if (!isset($a['count'], $b['count']))
    {
        if (isset($a['count']))
        {
            return -1;
        }

        if (isset($b['count']))
        {
            return 1;
        }
    
        return 0;
    }

    return ($b['count'] - $a['count']);
}

As I typed the second if (isset()) in the second method, I finally realized that I was sitting at work writing the exact same function twice, with just a small twist between them: which field to sort by, and how to compare it. One field was numeric, the other was a string. Our goal is to reuse as much of the common code from both methods as possible, without typing it up, or more importantly maintaining it, twice.

As you can see, almost the whole function is identical, except for the key name (‘value’ vs ‘count’) and how the comparison is done (numeric vs string). We need to handle these two differences to be able to use the same function for both purposes, so we rework the two functions into one general function for sorting on the value of a given key:

private function sortByArrayField($a, $b, $field)
{
    if (!isset($a[$field], $b[$field]))
    {
        if (isset($a[$field]))
        {
            return -1;
        }

        if (isset($b[$field]))
        {
            return 1;
        }
    
        return 0;
    }

    if (is_numeric($a[$field]) && is_numeric($b[$field]))
    {
        return ($b[$field] - $a[$field]);
    }

    return strcmp($a[$field], $b[$field]);
}

This way we handle both the comparison method (if both values are numeric, we compare them numerically by simply subtracting the first from the second, which keeps the descending order of the original sortByCount, while strcmp gives ascending order for strings) and the field to sort by (as a third parameter).

Sadly, usort does not pass that third parameter for us automagically, but by creating a few simple helper methods that call our refactored function, all the “configuration” lives in those small wrappers. The code base contains only one implementation of the comparison itself, but several ways of using it. Here are the three helper methods (with a usage sketch after them):

private function sortByValue($a, $b)
{
    return $this->sortByArrayField($a, $b, 'value');
}

private function sortByCount($a, $b)
{
    return $this->sortByArrayField($a, $b, 'count');
}

private function sortByDisplayValue($a, $b)
{
    return $this->sortByArrayField($a, $b, 'displayValue');
}
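For completeness, here is a rough sketch of how one of these callbacks is handed to usort from inside the class; $this->data is just a stand-in for whatever member actually holds the result rows:

// Sketch only: $this->data stands in for the array of result rows.
usort($this->data, array($this, 'sortByValue'));   // alphabetically by the 'value' key
// ...or, descending by count:
// usort($this->data, array($this, 'sortByCount'));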

If someone now comes along and decides to add another column to our array, such as ‘price’, we can simply add another sorting callback to accommodate it:

private function sortByPrice($a, $b)
{
    return $this->sortByArrayField($a, $b, 'price');
}

The sortByArrayField method is already tried and proven from our previous usage, and by simply changing the field we sort by, we keep the power of the callback, gain a completely new sort criterion and retain the maintainability of a single method.

The First Rule of Debugging

It seems there are quite a lot of ideas about what the first rule of debugging is, and a quick Google search gives you plenty of insightful suggestions.

These are all valuable insights about what and how you could attack the problem of the Bug That Wasn’t Supposed To Be There (there are surely bugs that actually were supposed to be there, because of invalid domain specifications, etc.). I did however find one small column from DDJ that led into the same thing I’m going to write now: “Before you go to fix it, be sure you’re fixing the right thing.”

I will assume that you have already discovered that you actually have a bug: you’re not getting the results you expected, and something is to blame. You start out by trying to isolate the problem and recreate the broken situation with different inputs. You try it on another system to see if there’s something about the configuration, the dataset or the versions of your libraries, something which, as we’ll discover later, is completely irrelevant.

I obviously forgot this rule at work today, and that’s why I’m writing this post now. Remember kids: always make sure you’re fixing the right thing! I’ve started porting one of the front end services we use to federate searches from a borked Java library to a PHP-based implementation instead. This involves talking to an internal search server that I hadn’t written code against before, other than through the existing search service. The implementation went from zero to usable in almost no time, and I now have code that provides a nice foundation for the remaining work. The problem is that along the way I decided to fix another issue with the search server, one I had looked into earlier, thinking that with my new and improved experience with the service I’d be better off this time.

So I set out trying to get a new sorting scheme working for the search results. At first I failed, but then I stumbled across some documentation that threw me in the right direction. I did a few minor changes, and the sorting changed. Woohooo! Well, it almost worked. The order was not quite as expected, but the results seemed to be close enough for the proximity search to be working. It just had to be something in the documentation or implementation that I had missed! Further digging and experimentation for the greater part of an hour or so left me blank, but I managed to get another search ordering which made me think that there was something peculiar about my input parameters that created the issue.

After almost two and a half hours of this, I happened to read four lines that were neatly tucked away in a “process” part of the XML document returned by the search service. In the diffuse light glowing from my Dell monitor, I could read the now obvious words: “<subsystem> No license.”.

Yep. It never had anything to do with the actual result. The subsystem was never even active. The change in the sorting must have happened for some other reason (such as removing the default values or something like that). I wasn’t supposed to be debugging why the sorting was incorrect; I was supposed to find out why the subsystem didn’t load.

Grrr.

Mats’ (borrowed) first rule of debugging: Make sure you debug the right problem. Do not make assumptions.

(and a rule of documentation: if a feature may not be available in every installation, explain how to check for it before giving examples and detailing how to use it)

First Impression of the WordPress 2.7 Wireframes

I’ve done a few posts about WordPress Usability Issues before (and the original wordpress usability post..), documenting my experiences with starting to use WordPress from scratch. I still manage to publish loads of posts without tags and categories, so I’ve been eager to find out if there is anything in the new version of WordPress that will solve this issue for me.

Then, a couple of days ago, an article appeared on wordpress.org with a collection of wireframes showing how the layout in WordPress 2.7 is currently planned. Sadly enough, it seems to place the tags and categories part of the layout in a location that is even harder to notice: below the items in the right-hand sidebar. I just discovered that I have “Shortcuts” and “Related” menu items there, so I’m quite sure this placement will leave me missing the tags and categories even more often than today.

If the placement is set in stone, I’d strongly suggest adding some kind of visual hint to the Save/Publish buttons indicating that tags and categories are missing, at least if you’ve enabled an option for it in the settings part of WordPress. It seems as if the boxes are draggable in the wireframes, but for fresh users this will still be confusing.

I guess most people at least want to add a category to their posts, but the WordPress folks probably have a lot of usage statistics on this from wordpress.com. It would be very interesting to see any change in the usage of tags and categories between the different layouts, and in the number of edits that simply add tags and categories within the first 24 hours after a post has been published.

(and yey! I remembered tags/categories on this one.)

rinetd saves the day (.. or night, more properly)

On Thursday we moved our mail server and domain controller from our Oslo office to the Fredrikstad office. As most of our staff now resides in Fredrikstad, this was the logical thing to do to keep the resources close to the people. The move went as planned, and all the servers and services started up as they should. There was one slight problem though: we accept new mail through the link to our Oslo office (as that’s where the mail server used to live), and our firewall routes the SMTP connections to our internal mail server. We have an external mail relay that does mail filtering before delivering the mail to our server. This means that we only accept SMTP connections from that particular host, as we’re not the general recipient (we’re not the MX record) for our own domain. This is handled in the Exchange configuration, where the allowed IPs are set up.

Anyways, the connection between the two offices is handled internally by an IPsec link, which routes encrypted traffic between the two networks. This has worked out just like it should, and we’ve been very happy with the setup. After moving the mail server, our plan was to simply route the connections coming in to our SMTP server over the VPN link to the other office, and then deliver the mail to the new address of the mail server. This proved to be more troublesome than expected. As the external filter service is run by an external company, we were unable to get them to change the address on short notice. After a couple of hours trying to get the packets to flow the right way, we decided to use an old friend of ours to help out instead: rinetd.

rinetd is a very, very simple daemon which accepts a connection on a port and redirects it at the socket layer to another host (or port). We did however need one other feature of rinetd: access control. As I mentioned earlier, we’re only interested in allowing certain hosts to talk to our SMTP server. Since the connections are now routed through rinetd, the only host Exchange will ever see delivering e-mail is the rinetd server. This means that we had to implement the access control in rinetd instead, filtering on the external IPs connecting to our server.

rinetd allows these settings to be provided globally, so that you can allow or deny certain hosts for all services using rinetd. This would have worked out nicely if it hadn’t been for the fact that we also run our webmail through Exchange on the same host. So we also need to forward the SSL connection for the webmail, and there we have to allow all hosts to connect. Luckily, rinetd makes it possible to solve this too, as you can provide access rules for each of the forwarded ports separately.

There is however one caveat: the global rules are always evaluated first. If you have global rules that deny or allow traffic (and if you have a single allow rule, all other traffic is denied unless explicitly allowed), the connection might never reach the evaluation of the “local” rules for each forward. This means that if one of your forwards should allow people to connect from anywhere (such as our SSL webmail), your global settings have to allow all clients to connect.

We therefore moved the restrictions for the SMTP server to its own section instead, and ended up with this access control configuration:

# bindaddress    bindport  connectaddress  connectport
# SMTP
<ip1> 25 <ip2> 25
allow 192.168.0.*
allow 192.168.1.*
allow <external ip>
allow <external ip>
allow <external ip>

# SSL
<ip1> 443 <ip2> 443
allow *

And after that, everything worked perfectly until we got our mail filter provider to switch to the new address of the SMTP server. rinetd saved us that night, and not a single member of our staff noticed the difference. If they don’t know we’re there, we’re doing something right.

Auto Slalom Training with PCN and The Subaru Impreza Club

Marius and I went auto slaloming last Sunday, as we’ve done several weekends this year, and as usual I shot a collection of photos for my flickr page. As has now become a tradition, I’ll post my selection of the best shots and the videos we recorded to the blog. You can see the complete photoshoot over at flickr.

The Behind of the Porsche 911 GT2 - Gardermoen, SS1, Auto Slalom - 2008-09-28

Gardermoen, SS1, Auto Slalom - 2008-09-28

Gardermoen, SS1, Auto Slalom - 2008-09-28

Gardermoen, SS1, Auto Slalom - 2008-09-28

The next image was shot with the camera attached to the back of the car, running in continuous shooting mode with a fresh memory card inside. The camera was locked in “GO GO GO!” mode with an external switch, and I triggered the first image as I got into the car. The effect was quite good, and I’m looking forward to doing a few more shots this way with more interesting exposure times.

Gardermoen, SS1, Auto Slalom - 2008-09-28

The final shot shows the camera setup for the outboard video, which can be seen in the first video clip below. The position is quite shaky as the camera is attached to the very front of the car, but it’s a dramatic angle. I’ll see if we can do something about those vibrations for later.

Gardermoen, SS1, Auto Slalom - 2008-09-28

The best run of the day:

A few mishaps which we caught on tape:

This is probably the last run of the season, so I guess I’ll be back with more auto slalom and racing related posts as soon as the ice and snow disappear in April. Looking forward to it already!

Post-commit hook failed with error output: .. and nothing more.

While getting the trac-svn integration up and running in one of our repositories tonight, I stumbled across this issue. Most links I found on Google said to try running the post-commit script by itself under the same user, which worked just fine. I checked that all my paths were absolute, to avoid assuming any CWD, but still nothing. After trying to minimize the problem I discovered that even with a post-commit script consisting of nothing but environment variable assignments, things were still failing. An archived thread on Nabble did however give a very important hint, from Ryan Schmidt (who obviously has solved the post-commit problems of every single developer out there, judging by the number of times his name comes up. Awesome work, Ryan.):

Your post-commit must also begin with a line like “#!/bin/sh” and have its executable bit set.

Wow. Homer Simpson “Doh!” moment right there. No #!/bin/bash (or #!/bin/sh) meant that SVN wasn’t able to run the script with the proper interpreter. When running it from the command line, bash was of course already running, so it just assumed it was a bash script .. and it was right.
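For reference, a minimal sketch of what the top of such a hook could look like; the body is just a placeholder for whatever the hook actually does:

#!/bin/sh
# Subversion invokes the hook as: post-commit REPOS REV
REPOS="$1"
REV="$2"

# ... the actual work (e.g. the Trac post-commit call) goes here ...

And don’t forget the executable bit:

chmod +x /path/to/repository/hooks/post-commit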

Thanks Ryan, saved me a couple of hours tonight.

Informa and Custom XML Namespaces in RSS

While integrating a custom search application into a Java-based web application, I came across the need to access properties in custom namespaces through the Informa RSS library. Or, to put it another way: I needed access to those properties, and Informa had been used for RSS parsing in the previous versions of the web application. The people who developed the original version of the application had decided to extend the Informa library into their own version, adding several .get<NameOfCustomProperty> methods and the like. After thinking about this for approximately two seconds, I decided that having to support and modify a custom version of Informa was not the right track for us.

My initial thought was that their decision to customize Informa to support these methods had to come from the idea that Informa did not support custom namespaces out of the box. I did a few searches over at Google and found nothing useful. Reading through the documentation for Informa didn’t do me any good either, so I tried to find an alternative library instead. I did a bit of searching here too, and stumbled across a hit for one of the util classes of Informa (.. again). That class did support custom namespaces, so the backend support was there at least. Then it struck me while reading the documentation for Informa and ChannelIF again: Informa did support it, as the methods are inherited from further up in the hierarchy. The getElementValue and getElementValues methods of the ChannelIF and ItemIF classes allow you to fetch the contents of elements with custom namespaces in a very straightforward manner.

System.out.println(item.getElementValue("exampleNS:field"));

This simply returns the string contained between <exampleNS:field> and </exampleNS:field>.
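As a slightly fuller illustration (the namespace prefix and field name are still made up, and this assumes channel is an already parsed ChannelIF whose feed declares the exampleNS namespace), the same call can be used while iterating the items of a channel:

// Sketch only: "exampleNS:field" is a hypothetical element, channel an already parsed ChannelIF.
for (Object o : channel.getItems()) {
    ItemIF item = (ItemIF) o;
    String value = item.getElementValue("exampleNS:field");
    if (value != null) {
        System.out.println(item.getTitle() + ": " + value);
    }
}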

Hoooray! We now have support for these additional fields, and we no longer have to manually keep a custom version of Informa in sync with the upstream library. Why the original developers decided to fork the Informa library to add their own properties I may never know, but I’ll update this post if they decide to step forward!