Mats – Page 10 – Mats Lindh

gnome-web-photo segfaults (segment fault)! OH NOES!

We capture images from beautiful web pages all over the world by exposing the gnome-web-photo package through a simple web service. After moving the service to a new server today gnome-web-photo suddenly started segfaulting (aka segment fault).

Running the application as the same user as the web server worked (after fixing the home directory so that gconf etc was able to create its files), but when running in the web server process itself things segfaulted.

The next attempt was to run both the working and the non-working version through strace and see what the differences were. Apparently the failing process segfaulted at the point where the working process accessed <home directory>/.mozilla/. This was the first access to anything inside the home directory of the user, which provided the solution:

When the process was running under the web server, the HOME environment variable was not set, but while running under the user from the shell (through su -), it was present. gnome-web-photo (or Firefox?) apparently does not feature any sort of fallback if the HOME environment variable is missing and segfaults instead.
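If the program is launched from PHP, one way around this is to hand the child process an explicit HOME. A minimal sketch, assuming the service shells out through proc_open(); run_with_home() and the example path are hypothetical names and not taken from our actual service:

```php
<?php
// Hypothetical helper: run a shell command with HOME set explicitly,
// so the child does not depend on the web server's environment.
function run_with_home($command, $home)
{
    // When an env array is given, the child sees only these variables,
    // so PATH is passed along as well.
    $path = getenv('PATH');
    $env = array(
        'HOME' => $home,
        'PATH' => ($path !== false) ? $path : '/usr/bin:/bin',
    );

    // Capture stdout only; stderr is inherited from the parent.
    $descriptors = array(1 => array('pipe', 'w'));

    $process = proc_open($command, $descriptors, $pipes, null, $env);

    if (!is_resource($process))
    {
        return false;
    }

    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    return array(
        'exit'   => proc_close($process),
        'stdout' => $stdout,
    );
}

// printenv stands in for the real command here.
$result = run_with_home('printenv HOME', '/tmp/example-home');
echo $result['stdout'];
```

Setting HOME in the web server's own environment (for example in its init script) should have the same effect.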

Maybe that could be a patch for the weekend, but hey, the Olympic Games are on!

Fixing dpkg / apt-get Problem With Python2.6

While trying to upgrade to Python 2.6 on one of my development machines tonight I was faced with an error message after running apt-get install python2.6:

After unpacking 0B of additional disk space will be used.
Setting up python2.6-minimal (2.6.4-4) ...
Linking and byte-compiling packages for runtime python2.6...
pycentral: pycentral rtinstall: installed runtime python2.6 not found
pycentral rtinstall: installed runtime python2.6 not found
dpkg: error processing python2.6-minimal (--configure):
 subprocess post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of python2.6:
 python2.6 depends on python2.6-minimal (= 2.6.4-4); however:
  Package python2.6-minimal is not configured yet.
dpkg: error processing python2.6 (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 python2.6-minimal
 python2.6
E: Sub-process /usr/bin/dpkg returned an error code (1)

Attempting to install python2.6-minimal on its own wouldn’t work, and attempting to install python2.6 ran into the same problem.

Luckily the Launchpad thread for python-central provided the answer: Upgrade python-central first!

:~# apt-get install python-central
[snip]
Setting up python2.6 (2.6.4-4) ...
Setting up python-central (0.6.14+nmu2) ...
:~#

Fixing Issue With PHP's SoapClient Overwriting Duplicate Attribute and Tag Names

The setting:

A SOAP response contains an element with a z:Id attribute and, directly beneath it as an immediate child, an element with the same name (Id):


<Item z:Id="i1">
  <Id>foobar</Id>
</Item>

(Item stands in for whatever the containing element is called.)

The problem is that the result object generated by SoapClient (at least in PHP 5.2.12) contains the attribute value, not the element value. In our case we could ignore the z:Id attribute, as it was simply an Id to identify the element in the response (this might be something that ASP.NET or some other .NET component does).

Our solution is to subclass the built-in SoapClient and override the __doRequest method, stripping out the part of the response that gives the wrong value for the Id field:

class Provider_SoapClient extends SoapClient
{
    public function __doRequest($request, $location, $action, $version)
    {
        $result = parent::__doRequest($request, $location, $action, $version);
        $result = preg_replace('/ z:Id="i[0-9]+"/', '', $result);
        return $result;
    }
}

This removes the attribute from every element in the response. In our case there is no danger that the string is present anywhere else in the response; if there is in yours, be sure to adjust the regular expression. And voilà, it works!
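To see what the regular expression does without a SOAP server at hand, here is a small sketch that applies it to a hand-written response fragment. The function name strip_z_id() and the element name Item are made up for the example:

```php
<?php
// Same pattern as in __doRequest above, wrapped in a standalone function.
function strip_z_id($xml)
{
    return preg_replace('/ z:Id="i[0-9]+"/', '', $xml);
}

// Hand-written fragment; Item is a placeholder element name.
$response = '<Item z:Id="i1"><Id>foobar</Id></Item>';

echo strip_z_id($response), "\n";
// Output: <Item><Id>foobar</Id></Item>
```

With the attribute gone, only the Id element is left to fill the Id field.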

Blogging Each Day For A Month – The Results Show

January has come to an end, and in total I managed to blog each day except the 2nd and the 31st. I do not plan to continue at that level, so I’ll probably slide back to the regular frequency of updates (1-2 a month) in the coming weeks. This spurt of updates occurred because I suddenly had the inspiration to write five or six posts in one evening, which made it possible to keep up the tempo for a couple of weeks. After a while things got a lot harder and I started to drift away from my regular posting time of 11am, but I got the posts out! Now it’s time to look at the stats for the previous month!

My blog is mainly search driven – I cover lots of one-off problems, attempting to include descriptive error messages and other hints that people may use when they’re using Google to try to find an answer to a task they’re having problems with. This means that people don’t stay around to read articles other than the one they came here for, and so far this has meant that writing a new article has usually given me a small increase in traffic.

As December had the holidays – and searches for the terms I cover drop considerably during those days – I’m using the numbers from October 2009 for comparison (November is a day shorter than January).

            October   January   Change
Visits      3 643     4 352     +19%
Pageviews   4 529     5 636     +24%

Time spent on the site increased by 7 seconds to 59 seconds – still nothing to write home about.

Avoid Escaping Spaces in the Query String in a Solr Query

Following up on the previous post about escaping values in a Solr query string, it’s important to note that you should not escape spaces in the query itself. The reason for this is that if you escape spaces in the query “foo bar”, the search will be performed on the single term “foo bar”, and not with “foo” as one term and “bar” as the other. This will only return documents that have the string “foo bar” in sequence.

The solution is to either remove the space from the escape list in the previous function – and use another function for escaping values where you actually should escape the spaces – or break up the string into “escapable” parts.

The code included beneath does the latter; it splits the string into parts delimited by spaces and then escapes each part of the query by itself.

$queryParts = explode(' ', $this->getQuery());
$queryEscaped = array();

foreach($queryParts as $queryPart)
{
    $queryEscaped[] = self::escapeSolrValue($queryPart);
}

$queryEscaped = join(' ', $queryEscaped);
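The snippet above lives inside a class and relies on escapeSolrValue() from the previous post. For a version you can run on its own, here is a rough sketch; escape_solr_value() is a stand-in written for this example (it backslash-escapes the usual Lucene query syntax characters) and may differ from the original function:

```php
<?php
// Stand-in for escapeSolrValue(): backslash-escape the characters that
// have a special meaning in the Lucene query syntax (spaces excluded).
function escape_solr_value($value)
{
    return preg_replace('/([+\-!(){}\[\]^"~*?:\\\\]|&&|\|\|)/', '\\\\$1', $value);
}

// Escape each whitespace-separated term on its own and glue the terms
// back together, so the spaces between them stay unescaped.
function escape_solr_query($query)
{
    $parts = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);

    return join(' ', array_map('escape_solr_value', $parts));
}

echo escape_solr_query('foo (bar) 1:2'), "\n";
// Output: foo \(bar\) 1\:2
```

Splitting on any run of whitespace instead of a single space also avoids empty parts when the user types two spaces in a row.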

A Simple Smarty Modifier to Generate a Chart Through Google Chart API

After the longest title of my blog so far follows one of the shortest posts.

The function has two required parameters – the first one is provided automagically for you by Smarty (it’s the value of the variable you’re applying the modifier to). This should be an array of objects containing the value you want to graph. The only required argument you have to provide to the modifier is the name of the method to call on each object to fetch the value to graph.

Usage:
{$objects|googlechart:"getValue"}

This will dynamically load your plugin from the file modifier.googlechart.php in your Smarty plugins directory, or you can register the plugin manually by calling register_modifier on the template object after you’ve created it.

function smarty_modifier_googlechart($points, $method, $size = "600x200", $low = 0, $high = 0)
{
    $pointStr = '';
    $maxValue = 0;
    $minValue = PHP_INT_MAX;
    
    foreach($points as $point)
    {
        if ($point->$method() > $maxValue)
        {
            $maxValue = $point->$method();
        }

        if ($point->$method() < $minValue)
        {
            $minValue = $point->$method();
        }
    }

    if (!empty($high))
    {
        $maxValue = $high;
    }

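    // Guard against division by zero when there are no points or all values are 0.
    $scale = ($maxValue > 0) ? 100 / $maxValue : 0;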

    foreach($points as $point)
    {
        $pointStr .= (int) ($point->$method() * $scale) . ',';
    }

    $pointStr = substr($pointStr, 0, -1);

    // labels (5)
    $labels = array();

    $steps = 4;
    $interval = $maxValue / $steps;

    for($i = 0; $i < $steps; $i++)
    {
        $labels[] = (int) ($i * $interval);
    }

    $labels[] = (int) $maxValue;

    return 'http://chart.apis.google.com/chart?cht=lc&chd=t:' . $pointStr . '&chs=' . $size . '&chxt=y&chxl=0:|' . join('|', $labels);
}

The function does not support the short version of the Google Chart API Just Yet (tm) as it is a simple proof of concept hack made a few months ago.
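The arithmetic is easier to follow with plain numbers instead of objects. The sketch below repeats the scaling and labelling steps on a small array; chart_points() and chart_labels() are names invented for this example and are not part of the modifier:

```php
<?php
// Scale raw values into the 0-100 range used by the "t:" text encoding.
function chart_points(array $values, $maxValue)
{
    $scale = ($maxValue > 0) ? 100 / $maxValue : 0;
    $points = array();

    foreach ($values as $value)
    {
        $points[] = (int) ($value * $scale);
    }

    return join(',', $points);
}

// Five y-axis labels: four evenly spaced steps plus the maximum itself.
function chart_labels($maxValue, $steps = 4)
{
    $labels = array();
    $interval = $maxValue / $steps;

    for ($i = 0; $i < $steps; $i++)
    {
        $labels[] = (int) ($i * $interval);
    }

    $labels[] = (int) $maxValue;

    return join('|', $labels);
}

echo chart_points(array(10, 50, 200), 200), "\n"; // 5,25,100
echo chart_labels(200), "\n";                     // 0|50|100|150|200
```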

How To Dismantle An Atomic HTTP Query .. String.

Following up on yesterday’s gripe about PHP’s (old and now useless) automagic translation of dots in GET and POST parameters to underscores, today’s edition manipulates the query string in place instead of returning it as an array.

This is useful if you have a query string you want to pass on to another service, and for some reason the default behaviour in PHP will barf barf and barf. That might happen because of the dot translation issue, or because some services (such as Solr) rely on a parameter name being repeatable (in PHP the second parameter value will overwrite the first).

function http_dismantle_query($queryString, $remove)
{
    $removeKeys = array();

    if (is_array($remove))
    {
        foreach($remove as $removeKey)
        {
            $removeKeys[$removeKey] = true;
        }
    }
    else
    {
        $removeKeys[$remove] = true;
    }

    $resultEntries = array();
    $segments = explode("&", $queryString);

    foreach($segments as $segment)
    {
        $parts = explode('=', $segment);

        $key = urldecode(array_shift($parts));

        if (!isset($removeKeys[$key]))
        {
            $resultEntries[] = $segment;
        }
    }

    return join('&', $resultEntries);
}

I’m not really sure what I’ll call the next function in this series, but there sure are loads of candidates out there.

Getting Dots to Work in PHP and GET / POST / COOKIE Variable Names

One of the oldest and ugliest relics of the register_globals era of PHP is the fact that all dots in request variable names get replaced with “_”. If your variable was named “foo.bar”, PHP will serve it to you as “foo_bar”. You cannot turn this off, you cannot use extract() or parse_str() to avoid it and you’re mostly left out in the dark. Luckily the QUERY_STRING environment variable (in $_SERVER if you’re running mod_php, etc) contains the raw string, and this string contains the dots.
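A quick way to see the behaviour for yourself is to feed a query string to parse_str(). The query string below is made up for the demonstration; it also shows a repeated parameter name losing its first value:

```php
<?php
$queryString = 'foo.bar=1&fq=type:a&fq=type:b';

parse_str($queryString, $parsed);

print_r($parsed);
// Array
// (
//     [foo_bar] => 1
//     [fq] => type:b
// )
```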

The following “parser” is a work in progress and does not currently support the array syntax for keys that PHP allows, but it solves the issue for regular vars. I will try to extend this later on to actually replicate the functionality of the regular parser.

Here’s the code. No warranties. Ugly hack. You’re warned. Leave a comment if you have any good suggestions regarding this (.. or know of an existing library doing the same..).

function http_demolish_query($queryString)
{
    $result = array();
    $segments = explode("&", $queryString);

    foreach($segments as $segment)
    {
        $parts = explode('=', $segment);

        $key = urldecode(array_shift($parts));
        $value = null;

        if ($parts)
        {
            $value = urldecode(join('=', $parts));
        }

        $result[$key] = $value;
    }

    return $result;
}
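Note that the function above still lets a repeated key overwrite the earlier value. If you need every value (Solr's fq parameter, for instance), one possible variation is to collect the values in a list per key. This is only a sketch and not the extension promised above; http_demolish_query_multi() is a made-up name:

```php
<?php
// Variant that keeps every value of a repeated key: each key maps to a
// list of its values, in the order they appear in the query string.
function http_demolish_query_multi($queryString)
{
    $result = array();

    foreach (explode('&', $queryString) as $segment)
    {
        if ($segment === '')
        {
            continue;
        }

        // Split on the first "=" only, so values may contain "=".
        $parts = explode('=', $segment, 2);

        $key = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : null;

        $result[$key][] = $value;
    }

    return $result;
}

print_r(http_demolish_query_multi('foo.bar=1&fq=a&fq=b'));
```

The price is that every value is wrapped in an array, even for keys that only occur once.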

(OK, that’s not the real function name, but it’s aptly named to be the nemesis of http_build_query)

Boosting By Date in Solr 1.4

One of the things introduced with Solr 1.4 is the ms() function for getting the number of milliseconds for a timestamp since the Unix epoch. This means that you can now write date boosts without having to resort to ord() or rord().

The best solution for boosting documents based on a field at query time (to avoid having to update the boost factor based on date as time progresses) seems to be to use the boost query type. The boost query type will pass the query on to your default query handler and let that resolve the query itself, but will provide boosts for each document based on the fields queried.

An example of how to solve this issue can be found on the SolrRelevancy part of the Solr Wiki:

{!boost b=recip(ms(NOW,publishedTime),3.16e-11,1,1)}query

This will take the number of milliseconds between NOW and the time the document was published (publishedTime is one of the fields YOU have to provide when you’re indexing; this might be “created” or something else that suits your needs) and then multiply that number by 3.16e-11, which is equal to 1 / (the number of milliseconds in a year, about 3.16e10). Since recip(x, m, a, b) evaluates to a / (m*x + b), the result of the function is 1 for a document that was just published, 1/2 for a document that is a year old, 1/3 for one that is two years old, and so on, slowly approaching 0 for very old documents.
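To get a feel for how the boost decays, the formula can be evaluated outside Solr. Solr defines recip(x, m, a, b) as a / (m*x + b); the sketch below plugs in a few document ages. The recip() function here is a plain PHP re-implementation written for this example, and the ages are arbitrary:

```php
<?php
// recip(x, m, a, b) as defined by Solr: a / (m * x + b)
function recip($x, $m, $a, $b)
{
    return $a / ($m * $x + $b);
}

// Milliseconds in a year (365.25 days), about 3.16e10.
$msPerYear = 365.25 * 24 * 60 * 60 * 1000;

foreach (array(0, 0.5, 1, 2, 10) as $years)
{
    $boost = recip($years * $msPerYear, 3.16e-11, 1, 1);
    printf("%4.1f years old: boost %.2f\n", $years, $boost);
}
```

Raising the a and b arguments together keeps the boost at 1 for new documents but flattens the curve, so old documents are punished less.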

The Solr Wiki also contains examples of how you can divide your boost query into several parts to make it easier to read.