Avoid Escaping Spaces in the Query String in a Solr Query

Following up on the previous post about escaping values in a Solr query string, it’s important to note that you should not escape spaces in the query itself. The reason for this is that if you escape spaces in the query “foo bar”, the search will be performed on the term “foo bar” itself, and not with “foo” as one term and “bar” as the other. This will only return documents that has the string “foo bar” in sequence.

The solution is to either remove the space from the escape list in the previous function – and use another function for escaping values where you actually should escape the spaces – or break up the string into “escapable” parts.

The code included beneath performs the last task; it splits the string into different parts delimited by space and then escapes each part of the query by itself.

$queryParts = explode(' ', $this->getQuery());
$queryEscaped = array();

foreach($queryParts as $queryPart)
{
    $queryEscaped[] = self::escapeSolrValue($queryPart);
}

$queryEscaped = join(' ', $queryEscaped);

How To Dismantle An Atomic HTTP Query .. String.

Following up on yesterday’s gripe about PHPs (old and now useless) automagic translation of dots in GET and POST parameters to underscores, today’s edition manipulates the query string in place instead of returning it as an array.

This is useful if you have a query string you want to pass on to another service, and for some reason the default behaviour in PHP will barf barf and barf. That might happen because of the dot translation issue or that some services (such as Solr) rely on a parameter name being repeatable (in PHP the second parameter value will overwrite the first).

function http_dismantle_query($queryString, $remove)
{
    $removeKeys = array();

    if (is_array($remove))
    {
        foreach($remove as $removeKey)
        {
            $removeKeys[$removeKey] = true;
        }
    }
    else
    {
        $removeKeys[$remove] = true;
    }

    $resultEntries = array();
    $segments = explode("&", $queryString);

    foreach($segments as $segment)
    {
        $parts = explode('=', $segment);

        $key = urldecode(array_shift($parts));

        if (!isset($removeKeys[$key]))
        {
            $resultEntries[] = $segment;
        }
    }

    return join('&', $resultEntries);
}

I’m not really sure what I’ll call the next function in this series, but there sure are loads of candidates out there.