Mounting / Serving a Pyramid Application From Several Paths / URLs

One of the requirements for an application we’re developing is that the same application should be served from different endpoints. The application currently lives at / at one of our servers, but should also be available at /foo/bar – which is served through a 3rd party varnish with our server as the backend, pulling the application into a different domain name space.

mod_wsgi in Apache supports this, but for those of us who are not using the Apache version, you can also handle it directly in the Pyramid configuration by using the urlmap handler, available in the Paste library.

Our old configuration:

[app:main]
use = egg:applicationname

First, we’ll have to require paste in setup.py, so add:

'paste',

to the requires = [ ... ] list in your setup.py (and rerun setup.py develop).

Second, use urlmap as your main application in your configuration (development.ini, production.ini). urlmap will then route the request internally to the required application (which also means that the endpoints may point to different applications, but we’re not using that yet):

[composite:main]
use = egg:Paste#urlmap
/ = applicationname
/foo/bar = applicationname
/spam/ham = applicationname

[app:applicationname]
use = egg:applicationame

The composite app is now the main entry point for the WSGI application, which then hands the request off to the actual application, applicationname.

php-amqplib: Uncaught exception ‘Exception’ with message ‘Error reading data. Recevived 0 instead of expected 1 bytes’

I’ve been playing around with RabbitMQ recently, but trying to find out what caused the above error included a trip through wireshark and an attempt to dig through the source code of php-amqplib. It seems that it’s (usually) caused by a permission problem: either the wrong username / password combination as reported by some on the wide internet, or by my own issue: the authenticated user didn’t have access to the vhost I tried to associate my connection with.

You can see the active permissions for a vhost path by using rabbitmqctl:

sudo rabbitmqctl list_permissions -p /vhostname

.. or you if you’ve installed the web management plugin for rabbitmq: select Virtual Hosts in the menu, then select the vhost you want to see permissions for.

You can give a user (all out) access to the vhost by using rabbitmqctl:

sudo rabbitmqctl set_permissions -p /vhostname guest ".*" ".*" ".*"

.. or by adding the permissions through the web management interface, where you can select the user and the permission regexes for the user/vhost combination.

A Simple Smarty Modifier to Generate a Chart Through Google Chart API

After the longest title of my blog so far follows one of the shortest posts.

The function has two required parameters – the first one is provided automagically for you by smarty (it’s the value of the variable you’re applying the modifier to). This should be an array of objects containing the value you want to graph. The only required argument you have to provide to the modifier is the method to use for fetching the values for graphing.

Usage:
{$objects|googlechart:”getValue”}

This will dynamically load your plugin from the file modifier.googlechart.php in your Smarty plugins directory, or you can register the plugin manually by calling register_modifier on the template object after you’ve created it.

function smarty_modifier_googlechart($points, $method, $size = "600x200", $low = 0, $high = 0)
{
    $pointStr = '';
    $maxValue = 0;
    $minValue = INT_MAX;
    
    foreach($points as $point)
    {
        if ($point->$method() > $maxValue)
        {
            $maxValue = $point->$method();
        }

        if ($point->$method() < $minValue)
        {
            $minValue = $point->$method();
        }
    }

    if (!empty($high))
    {
        $maxValue = $high;
    }

    $scale = 100 / $maxValue;

    foreach($points as $point)
    {
        $pointStr .= (int) ($point->$method() * $scale) . ',';
    }

    $pointStr = substr($pointStr, 0, -1);

    // labels (5)
    $labels = array();

    $steps = 4;
    $interval = $maxValue / $steps;

    for($i = 0; $i < $steps; $i++)
    {
        $labels[] = (int) ($i * $interval);
    }

    $labels[] = (int) $maxValue;

    return 'http://chart.apis.google.com/chart?cht=lc&chd=t:' . $pointStr . '&chs=' . $size . '&chxt=y&chxl=0:|' . join('|', $labels);
}

The function does not support the short version of the Google Chart API Just Yet (tm) as it is an simple proof of concept hack made a few months ago.

Getting Dots to Work in PHP and GET / POST / COOKIE Variable Names

One of the oldest and ugliest relics of the register_globals era of PHP are the fact that all dots in request variable names gets replaced with “_”. If your variable was named “foo.bar”, PHP will serve it to you as “foo_bar”. You cannot turn this off, you cannot use extract() or parse_str() to avoid it and you’re mostly left out in the dark. Luckily the QUERY_STRING enviornment (in _SERVER if you’re running mod_php, etc) contains the raw string, and this string contains the dots.

The following “”parser”” is a work in progress and does currently not support the array syntax for keys that PHP allow, but it solves the issue for regular vars. I will try to extend this later on to do actually replicate the functionality of the regular parser.

Here’s the code. No warranties. Ugly hack. You’re warned. Leave a comment if you have any good suggestions regarding this (.. or know of an existing library doing the same..).

function http_demolish_query($queryString)
{
    $result = array();
    $segments = explode("&", $queryString);

    foreach($segments as $segment)
    {
        $parts = explode('=', $segment);

        $key = urldecode(array_shift($parts));
        $value = null;

        if ($parts)
        {
            $value = urldecode(join('=', $parts));
        }

        $result[$key] = $value;
    }

    return $result;
}

(OK, that’s not the real function name, but it’s aptly named to be the nemesis of http_build_query)

What Happened To My Beautiful En-dashes?!

First, a small introduction to the problem: We’re running stuff in UTF-8 all the way. A few sites we’re reading feeds from are using ISO-8859-1 as their charset, but they either supply the feed with the correct encoding specific or the feeds arrive as UTF-8. Everything works nicely, except for the mentioned-in-the-headline en-dashes. Firefox only shows 00 96 (0x00 0x96), but everything looks correct when you view the headlines and similiar stuff on the original site.

Strange.

The digging, oh all the digging.

After the already mentioned digging (yes, the digging) in data at the large search engines (ok, maybe I did a search or two), I discovered that the windows cp1252 encoding uses 0x96 to store endashes. This seems similiar! We’re seeing 0x96 as one of the byte values above, so apparently cp1252 is sneaking into the mix somewhere along the lines. Most of the clients using the CMS-es are windows, so they might apparently be to blame.

ISO-8859-1 enters the scene

As the sites (and feeds) provide ISO-8859-1 as their encoding, I thought it would be interesting to see what ISO-8859-1 defines as the representation for the byte value 0x96. Lo’ and behold: 0x96 is not defined in ISO-8859-1. Which actually provides us with the solution.

I welcome thee, Mr. Solution

When the ISO-8859-1 encoded string is converted into UTF-8, the bytes with the value 0x96 (which is the endash in cp1252) is simply inserted as a valid code sequence in UTF-8 which represents a character that’s not defined.

We’re saying that the string is ISO-8859-1, although in reality it is either cp1252 or a mangled version of iso-8859-1 and cp1252 (for the endashes, at least).

If you’re on the parsing end of this mumbo jumbo, one solution is to replace the generated UTF-8 sequence (0xc2 0x96) (converted from 0x96 i ISO-8859-1) with the proper one (0xe2 0x80 0x93):

$data = str_replace("\xc2\x96", "\xE2\x80\x93", $data);

And voilá, everything works.

Finding Substring in a String in Bash

If you’re ever in the need of checking if a variable in bash contains a certain string (and is not equal to, just a part of), the =~ operator to the test function comes in very handy. =~ is usually used to compare a variable against a regular expression, but can also be used for substrings (you may also use == *str*, but I prefer this way).

This short example submits a document to solr using curl, then emails the result if the Solr server responded with an error (.. I tried mapping this against the error code or something similiar instead, but didn’t find a better way. If you have something better, please leave a comment!):

    CURLRESULT=`cat $i | curl -s -X POST -H 'Content-Type: text/xml' -d @- $URL`
    if [[ $CURLRESULT =~ "Error report" ]]
      then 
	echo "Error!! Error!! CRISIS IMMINENT!!"
        echo $CURLRESULT | mail -s "Error importing to SOLR" mail@example.com
        exit
    fi

Neat to check that everything went OK before you remove the files you’ve submitted.

parser error : Detected an entity reference loop

While importing a rather large XML-document (45MB+) into a database today, I ran into a weird problem on one specific server. The server runs SUSE Enterprise and presented an error that neither other test server gave. After a bit of digging around on the web I were able to collect enough information from several different threads about what could be the source of the problem.

It seems that the limit was introduced in libxml2 about half a year ago to avoid some other memory problems, but this apparently borked quite a few legitimate uses. As I have very little experience with administrating SUSE Enterprise based servers, I quickly shrugged off trying to update the packages and possibly recompiling PHP. Luckily one of the comments in a thread about the problem saved the day.

If you find yourself running into this message; swap your named entities in the XML file (such as &lt; and &gt;) to numeric entities (such as &#60; and &#62;). This way libxml2 just replaces everything with the byte value while parsing instead of trying to be smart and keep an updated cache.

A Quick INSERT INTO Trick

From time to time you’re going to need to move some data from one table into another, in particular to generalize or specialize code. I’ve seen amazingly large code blocks written to handle simple cases as this, and here’s a simple trick you’re going to need some day:

INSERT INTO 
    
    (field_1, field_2) 
SELECT
    some_field, some_other_field 
FROM 
    

You can also add conditionals, i.e. to retrieve rows that might have been inserted yesterday, etc:

INSERT INTO 
    
(field_1, field_2) SELECT some_field, some_other_field FROM WHERE condition = 1

The good thing about this is that all data movement is kept within the database server, so that the data doesn’t have to travel from the server, to the client and then back from the client to the server again. It’s blazingly fast compared to other methods. You can also do transformations with regular SQL functions in your SELECT statement, which should help you do very simple operations at the speed of light.

The Side Effect of Using Return Type To Handle Errors

I came across a curious little side effect of the issue I just posted earlier:

public function action()
{
   $id = filter_input(INPUT_POST, 'id', FILTER_SANITIZE_STRING);
	
   if(is_bool($id))
   {
      die('invalid id');
   }

   if(isset($_POST['accept']))
   {
      $this->acceptId($id);
   }
   else if(isset($_POST['reject']))
   {
      $this->rejectId($id);
   }
}

This is obviously going to do something when a POST has occured, but the method was invoked each and every time. The reason why it works? If there is no variable named ‘id’ POSTed, filter_input returns null. And null is not boolean, so the test passes. But as it’s not a POST request, neither of the two other if-tests are true, so the code silently passes through (id can contain characters here, so the filter isn’t used to just get integers).

If you’re not going to throw an error when the parameter is missing, the test is actually completely useless. This bug had hidden itself within the usage of the type to test for fault instead of actually checking the value, and did not creep out before I rewrote the if-test.

Releasing Keepitclose Alpha

I’ve just created a small project site over at Google Code for Keepitclose. Keepitclose is a HTTP local caching web service written in PHP meant to cache resources retrieved from external and internal web services. It currently supports both an APC backend (using shared memory in the web server itself) and using Memcache (which allows you to have several frontends using one or several backend memcache servers).

Missing features are currently:

  • Keeping HTTP headers
  • Adding cache information to HTTP headers
  • A file based backend
  • Everything else you can think of

The server do however support:

  • Time to live (TTL) for a cached resource when stored
  • TTL Window, i.e.:

    A new version should be fetched at regular intervals, such as every sixth hour. New data are available at 00:00, 06:00, 12:00 and 18:00. Use ttlWindowSize=21600 to get six hours window size. Use ttlWindowOffset to add a offset; ttlWindowOffset=900 adds fifteen minutes to the value and retrives new content at 00:15, 06:15, 12:15 and 18:15.

If you want to implement your own backend, implement the Keepitclose_Storage_Interface and add an entry into the config.php file for your module, i.e. ‘file’ => array(‘path’ => ‘/tmp’).

The current version also supports simple access control using the allowedHosts entry in the config file. This entry contains one row for each allowed host as a regular expression:

    'allowedHosts' => array(
        '^127\\.0\\.0\\.1$',
    ),

.. will only allow requests from localhost. To allow a subnet, you could write ‘^127\\.0\\.0\\.[0-9]+$’.

You can also add several memcache servers, a request will then poll a random server to retrieve the resource. If not found, the content will be retrieved from the web and stored to all memcache servers. This is useful for environments with a very heavy read load.

'storage' => array(
    'memcache' => array(
        array(
            'host' => 'tcp://127.0.0.1',
        ),
    ),
),

To add more memcache servers, simply add another array() of memcache entries:

'memcache' => array(
    array(
        'host' => 'tcp://127.0.0.1',
    ),
    array(
        'host' => 'tcp://127.0.0.2',
    ),
),

You can also set ‘port’ and ‘persistent’ for the memcache connections.

Please leave a comment here or create an issue ticket on the google code page if you need any help or have any suggestions. All patches are welcome.