The First Rule of Debugging

There seem to be quite a lot of ideas about what the first rule of debugging is, and a quick Google search turns up plenty of insightful suggestions.

These are all valuable insights into how you could attack the problem of the Bug That Wasn’t Supposed To Be There (there are surely bugs that actually were supposed to be there, because of invalid domain specifications, etc.). I did however find one small column from DDJ that made the same point I’m going to make now: “Before you go to fix it, be sure you’re fixing the right thing.”

I will assume that you have already established that you actually have a bug: you’re not getting the results you expected, and something is to blame. You start out by trying to isolate the problem and recreate the broken situation with different inputs. You try it on another system to see if there’s something about the configuration, the dataset, or the versions of your libraries, all of which, as we’ll discover later, is completely irrelevant.

I obviously forgot this rule at work today, and that’s why I’m writing this post now. Remember, kids: always make sure you’re fixing the right thing!

I’ve started porting one of the front end services we use to federate searches from a borked Java library to a PHP-based implementation. This involves talking to an internal search server that I hadn’t written code to interface with before, other than through the existing search service. The implementation went from zero to usable in almost no time, and I now have code that provides a nice foundation for the remaining work. The problem is that I then decided to fix another issue with the search server that I had looked into earlier, apparently thinking that my new and improved experience with the service would leave me better off this time.

So I set out trying to get a new sorting scheme working for the search results. At first I failed, but then I stumbled across some documentation that pointed me in the right direction. I made a few minor changes, and the sorting changed. Woohooo! Well, it almost worked. The order was not quite as expected, but the results seemed close enough to suggest that the proximity search was working. It just had to be something in the documentation or implementation that I had missed! Further digging and experimentation for the greater part of an hour left me blank, but I managed to get yet another search ordering, which made me think that something peculiar about my input parameters was causing the issue.

After almost two and a half hours of this, I happened to read four lines that were neatly tucked away in a “process” part of the XML document returned by the search service. In the diffuse light glowing from my Dell monitor, I could read the now obvious words: “<subsystem> No license.”

Yep. The subsystem never had anything to do with the actual result. It was never even active. The change in the sorting must have happened for some other reason (such as removing the default values or anything like that). I wasn’t supposed to debug why the sorting was incorrect, I was supposed to find out why the subsystem didn’t load.

Grrr.

Mats’ (borrowed) first rule of debugging: Make sure you debug the right problem. Do not make assumptions.

(and a rule of documentation: if the feature may not be available in the installation, write how you can check this before actually giving examples and detailing how to use the feature)
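The check that would have saved me those hours can be sketched in a few lines of Python. This is only an illustration: the post does not show the search service’s real XML schema, so the element names (`response`, `process`, `message`), the sample document and the `find_process_warnings` helper are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical response layout: the real service's schema is not shown in
# the post, so every element name below is an assumption.
SAMPLE_RESPONSE = """\
<response>
  <process>
    <message>query parsed</message>
    <message>sorting-subsystem: No license.</message>
  </process>
  <results>
    <hit id="1"/>
    <hit id="2"/>
  </results>
</response>
"""

# Phrases that suggest a feature is not active at all (lower case).
WARNING_MARKERS = ("no license",)


def find_process_warnings(xml_text):
    """Return the text of every node inside a <process> section that
    contains one of the warning markers."""
    root = ET.fromstring(xml_text)
    warnings = []
    for process in root.iter("process"):
        for node in process.iter():
            text = (node.text or "").strip()
            if text and any(marker in text.lower() for marker in WARNING_MARKERS):
                warnings.append(text)
    return warnings


if __name__ == "__main__":
    # Look at the diagnostics before looking at the result ordering.
    for warning in find_process_warnings(SAMPLE_RESPONSE):
        print("check this first:", warning)
```

Running something like this against the raw response before touching the sort parameters would at least have told me that I was about to debug a feature that wasn’t loaded.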

Debugging Missing Statistics in OpenAds (OpenX)

Our statistics in OpenAds had suddenly gone missing in action, and I started suspecting a few errors we’d gotten earlier about fubar-ed MyISAM tables. First, check debug.log (or maintenance.log if you’re running a newer version than us) in the var directory of your Openads installation. The easiest thing to do is to search for the string 'emergency', which is written to the log each time something fails in MySQL. The MDB2 error that is included carries the message from MySQL in one of its fields (about 15-25 lines down), which gives you the reason for the error (if MySQL is to blame).
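If you’d rather not page through the log by hand, the search can be sketched as a small Python script. It prints each line containing 'emergency' together with the 25 lines that follow it, since that is roughly where the MySQL message shows up. The `emergency_blocks` helper and the sample lines are made up for illustration; the sample is placeholder text, not real Openads log output.

```python
import sys

# Made-up placeholder lines, NOT real Openads log output.
SAMPLE_LINES = [
    "info: placeholder start",
    "emergency: placeholder failure",
    "detail 1",
    "detail 2",
    "detail 3",
]


def emergency_blocks(lines, context=25):
    """Return (line_number, block) pairs, one per line mentioning
    'emergency'. Each block holds the matching line plus up to
    `context` lines that follow it."""
    blocks = []
    for index, line in enumerate(lines):
        if "emergency" in line.lower():
            block = [l.rstrip("\n") for l in lines[index:index + 1 + context]]
            blocks.append((index + 1, block))
    return blocks


if __name__ == "__main__":
    # Pass the path to debug.log (or maintenance.log) as the first
    # argument; without one, the placeholder sample is used instead.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            log_lines = handle.readlines()
    else:
        log_lines = SAMPLE_LINES
    for number, block in emergency_blocks(log_lines):
        print(f"--- emergency at line {number} ---")
        print("\n".join(block))
```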

Some tables had been marked as crashed in our MySQL-installation, so we had to find out what to fix. A quick run with myisamchk in the MySQL-data directory for the database gave us a few hints:

myisamchk *.MYI > /tmp/myisamcheckoutput

Since stdout is redirected to /tmp/myisamcheckoutput, only the error messages, which go to stderr, end up on your console (Openads has quite a few tables, so your console will fill up quite quickly otherwise). You can still go through the full output afterwards by using less on /tmp/myisamcheckoutput.

If any tables are having problems, you can run:

REPAIR TABLE <tablename>;

in your MySQL console, and MySQL should repair the table. After doing this, it’s time to get maintenance back up and running again.

Run the maintenance.php file manually (or wait until it gets triggered within the next hour):

php /scripts/maintenance/maintenance.php 

Christer Goes Bug Hunting

Christer had the pleasure of hunting down a bug in Zend Framework a couple of days ago, and he has just posted a nice article about the bug in the Zend MVC components and how he debugged it. If you’ve never used a debugger before, the article should be helpful, and it also gives a little insight into how the Zend Framework MVC components work.