The First Rule of Debugging

Seems like there are quite a lot of ideas about what the first rule of debugging are, and a quick Google Search gives you insightful suggestions such as:

These are all valuable insights that provide clear value about what and how you could attack the problem of the Bug That Wasn’t Supposed To Be There (there are surely bugs that actually were supposed to be there, because of invalid domain specifications, etc.). I did however find one small column from DDJ that led into the same thing I’m going to write now: “Before you go to fix it, be sure you’re fixing the right thing.

I will assume that you already have uncovered that you actually have a bug — you’re not getting the results you expected, and something is to blame. You start out by trying to isolate the problem, and try to recreate the broken situation with different inputs. You try it on another system to see if there’s something about the configuration, about the dataset, about the versions of your library or something, which we’ll disover later, is completely irrelevant.

I obviously forgot this rule at work today, and that’s why I’m writing this post now. Remember kids. Always make sure you’re fixing the right thing! I’ve started porting one of the front end services we use to federate searches from a borked Java library to a PHP-based implementation instead. This involves talking to an internal search server that I hadn’t written code to interface with before, other than the existing search service. The implementation went from zero to usable in almost no time, and I now have code that provides a nice foundation for the remaining work. The problem is that I obviously decided to fix another issue with the search server that I had looked into earlier, and it seems that I thought that since I now had new and improved experience with the service, I’d be better off this time.

So I set out trying to get a new sorting scheme working for the search results. At first I failed, but then I stumbled across some documentation that threw me in the right direction. I did a few minor changes, and the sorting changed. Woohooo! Well, it almost worked. The order was not quite as expected, but the results seemed to be close enough for the proximity search to be working. It just had to be something in the documentation or implementation that I had missed! Further digging and experimentation for the greater part of an hour or so left me blank, but I managed to get another search ordering which made me think that there was something peculiar about my input parameters that created the issue.

After almost two and a half hour of this, I happened to read four lines that were neatly tucked away in a “process” part of the resulting XML document from the search service. In the diffuse light glooming from my Dell monitor, I could read the now obvious words: “<subsystem> No license.”.

Yep. It never had anything to do with the actual result. It was never even active. The change of the sorting must have happened for some other reason (such as removing the default values or anything like that). I weren’t supposed to debug why the sorting was incorrect, I was supposed to find out why the subsystem didn’t load.

Grrr.

Mats’ (borrowed) first rule of debugging: Make sure you debug the right problem. Do not make assumptions.

(and a rule of documentation: if the feature may not be available in the installation, write how you can check this before actually giving examples and detailing how to use the feature)

Leave a Reply

Your email address will not be published. Required fields are marked *