A Simple Smarty Modifier to Generate a Chart Through Google Chart API

January 29th, 2010

After the longest title of my blog so far follows one of the shortest posts.

The function has two required parameters. The first one is provided automagically by Smarty: it’s the value of the variable you’re applying the modifier to, which should be an array of objects containing the values you want to graph. The only argument you have to provide to the modifier yourself is the name of the method to call on each object to fetch its value.

Usage:
{$objects|googlechart:"getValue"}

This will dynamically load your plugin from the file modifier.googlechart.php in your Smarty plugins directory, or you can register the plugin manually by calling register_modifier on the template object after you’ve created it.

    function smarty_modifier_googlechart($points, $method, $size = "600x200", $low = 0, $high = 0)
    {
        $pointStr = '';
        $maxValue = 0;
        $minValue = PHP_INT_MAX;

        // find the largest and smallest value in the data set
        foreach($points as $point)
        {
            if ($point->$method() > $maxValue)
            {
                $maxValue = $point->$method();
            }

            if ($point->$method() < $minValue)
            {
                $minValue = $point->$method();
            }
        }

        // an explicit upper bound overrides the one found in the data
        if (!empty($high))
        {
            $maxValue = $high;
        }

        // text encoding expects values from 0 to 100; avoid dividing by zero
        $scale = $maxValue ? 100 / $maxValue : 0;

        foreach($points as $point)
        {
            $pointStr .= (int) ($point->$method() * $scale) . ',';
        }

        $pointStr = substr($pointStr, 0, -1);

        // labels (5)
        $labels = array();

        $steps = 4;
        $interval = $maxValue / $steps;

        for($i = 0; $i < $steps; $i++)
        {
            $labels[] = (int) ($i * $interval);
        }

        $labels[] = (int) $maxValue;

        return 'http://chart.apis.google.com/chart?cht=lc&amp;chd=t:' . $pointStr . '&amp;chs=' . $size . '&amp;chxt=y&amp;chxl=0:|' . join('|', $labels);
    }

The function does not support the short version of the Google Chart API Just Yet ™, as it is a simple proof of concept hack made a few months ago.
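The scaling step is the part that is easiest to get wrong, so here is a minimal sketch of it in isolation. The helper name scale_chart_points is made up for this example, and it takes plain numbers instead of objects with a getter method.

```php
<?php
// Hypothetical helper: scale raw values to the 0-100 range used by chd=t:
function scale_chart_points(array $values, $high = 0)
{
    // use the explicit ceiling when given, otherwise the largest value (at least 0)
    $maxValue = !empty($high) ? $high : ($values ? max(0, max($values)) : 0);
    $scale = $maxValue ? 100 / $maxValue : 0;

    $scaled = array();

    foreach ($values as $value)
    {
        $scaled[] = (int) ($value * $scale);
    }

    return join(',', $scaled);
}

echo scale_chart_points(array(5, 10, 20)), "\n";     // 25,50,100
echo scale_chart_points(array(5, 10, 20), 40), "\n"; // 12,25,50
```

With a ceiling of 40 the largest point only reaches the middle of the chart, which is what the $high argument of the modifier is for.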

How To Dismantle An Atomic HTTP Query .. String.

January 28th, 2010

Following up on yesterday’s gripe about PHP’s (old and now useless) automagic translation of dots in GET and POST parameters to underscores, today’s edition manipulates the query string in place instead of returning it as an array.

This is useful if you have a query string you want to pass on to another service, and for some reason the default behaviour in PHP will barf, barf and barf. That might happen because of the dot translation issue, or because some services (such as Solr) rely on a parameter name being repeatable (in PHP the second parameter value will overwrite the first).
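To make the two failure modes concrete, this small sketch feeds a query string with a dotted name and a repeated parameter through PHP’s own parse_str(). The parameter names are just examples.

```php
<?php
// A query string with a dotted name and a repeated parameter.
$queryString = 'foo.bar=1&fq=type%3Apost&fq=year%3A2010';

parse_str($queryString, $parsed);

// The dot became an underscore, and only the last "fq" survived:
// array('foo_bar' => '1', 'fq' => 'year:2010')
var_export($parsed);
```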

    function http_dismantle_query($queryString, $remove)
    {
        // build a lookup table of the keys to remove
        $removeKeys = array();

        if (is_array($remove))
        {
            foreach($remove as $removeKey)
            {
                $removeKeys[$removeKey] = true;
            }
        }
        else
        {
            $removeKeys[$remove] = true;
        }

        $resultEntries = array();
        $segments = explode("&", $queryString);

        foreach($segments as $segment)
        {
            $parts = explode('=', $segment);

            $key = urldecode(array_shift($parts));

            // keep the raw segment untouched unless its key should go
            if (!isset($removeKeys[$key]))
            {
                $resultEntries[] = $segment;
            }
        }

        return join('&', $resultEntries);
    }
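A short usage sketch follows. To keep it runnable on its own it carries a condensed copy of the function under a made-up name (dismantle_query_sketch); the query string is invented for the example.

```php
<?php
// Condensed variant of the function above, repeated so the example runs on its own.
function dismantle_query_sketch($queryString, $remove)
{
    $removeKeys = array_flip((array) $remove);
    $kept = array();

    foreach (explode('&', $queryString) as $segment)
    {
        $parts = explode('=', $segment, 2);

        if (!isset($removeKeys[urldecode($parts[0])]))
        {
            $kept[] = $segment;
        }
    }

    return join('&', $kept);
}

$query = 'q=solr&fq=type%3Apost&fq=year%3A2010&foo.bar=1&debug=on';

// Repeated keys, dots and the original encoding all survive untouched.
echo dismantle_query_sketch($query, array('debug', 'q')), "\n";
// fq=type%3Apost&fq=year%3A2010&foo.bar=1
```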

I’m not really sure what I’ll call the next function in this series, but there sure are loads of candidates out there.

Getting Dots to Work in PHP and GET / POST / COOKIE Variable Names

January 27th, 2010

One of the oldest and ugliest relics of the register_globals era of PHP is the fact that all dots in request variable names get replaced with “_”. If your variable was named “foo.bar”, PHP will serve it to you as “foo_bar”. You cannot turn this off, you cannot use extract() or parse_str() to avoid it and you’re mostly left out in the dark. Luckily the QUERY_STRING environment variable (in $_SERVER if you’re running mod_php, etc) contains the raw string, and this string contains the dots.

The following “parser” is a work in progress and does not currently support the array syntax for keys that PHP allows, but it solves the issue for regular vars. I will try to extend this later on to actually replicate the functionality of the regular parser.

Here’s the code. No warranties. Ugly hack. You’re warned. Leave a comment if you have any good suggestions regarding this (.. or know of an existing library doing the same..).

    function http_demolish_query($queryString)
    {
        $result = array();
        $segments = explode("&", $queryString);

        foreach($segments as $segment)
        {
            // skip empty segments, such as the one an empty query string gives
            if ($segment === '')
            {
                continue;
            }

            $parts = explode('=', $segment);

            $key = urldecode(array_shift($parts));
            $value = null;

            // anything after the first "=" belongs to the value
            if ($parts)
            {
                $value = urldecode(join('=', $parts));
            }

            $result[$key] = $value;
        }

        return $result;
    }

(OK, that’s not the real function name, but it’s aptly named to be the nemesis of http_build_query)
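Here is a small usage sketch, again with a condensed copy of the parser under a made-up name (demolish_query_sketch) so it can run by itself. In a real request the input would come from $_SERVER['QUERY_STRING']; the string used here is invented.

```php
<?php
// Condensed copy of the parser above so the example runs on its own.
function demolish_query_sketch($queryString)
{
    $result = array();

    foreach (explode('&', $queryString) as $segment)
    {
        if ($segment === '')
        {
            continue;
        }

        $parts = explode('=', $segment, 2);
        $result[urldecode($parts[0])] = isset($parts[1]) ? urldecode($parts[1]) : null;
    }

    return $result;
}

// In a web request this would be $_SERVER['QUERY_STRING'].
$raw = 'foo.bar=1&q=a%20b&flag';

// The dotted key is kept as-is, and a key without "=" gets a null value.
var_export(demolish_query_sketch($raw));
```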

Retrieving URLs in Parallel With CURL and PHP

January 24th, 2010

As we’ve recently added support for querying Solr servers in parallel, one of the things we added was a simple class to allow us to query several servers at the same time. The CURL library (which has a PHP extension) even provides an abstraction layer for doing the nitty gritty work for you, as long as you keep track of the resources. The code beneath is based on examples in the documentation and a few tweaks of my own.

The code beneath is licensed under an MIT license. You can also download the file (gzipped).

    class Footo_Content_Retrieve_HTTP_CURLParallel
    {
        /**
         * Fetch a collection of URLs in parallel using cURL. The results are
         * returned as an associative array, with the URLs as the key and the
         * content of the URLs as the value.
         *
         * @param array<string> $addresses An array of URLs to fetch.
         * @return array<string> The content of each URL that we've been asked to fetch.
         **/
        public function retrieve($addresses)
        {
            $multiHandle = curl_multi_init();
            $handles = array();
            $results = array();

            foreach($addresses as $url)
            {
                $handle = curl_init($url);
                $handles[$url] = $handle;

                curl_setopt_array($handle, array(
                    CURLOPT_HEADER => false,
                    CURLOPT_RETURNTRANSFER => true,
                ));

                curl_multi_add_handle($multiHandle, $handle);
            }

            // execute the handles
            $result = CURLM_CALL_MULTI_PERFORM;
            $running = false;

            // set up and make any requests..
            while ($result == CURLM_CALL_MULTI_PERFORM)
            {
                $result = curl_multi_exec($multiHandle, $running);
            }

            // wait until data arrives on all sockets
            while($running && ($result == CURLM_OK))
            {
                // curl_multi_select() can return -1 when there is nothing to
                // wait on yet; back off briefly instead of spinning on the CPU
                if (curl_multi_select($multiHandle) == -1)
                {
                    usleep(100);
                }

                $result = CURLM_CALL_MULTI_PERFORM;

                // while we need to process sockets
                while ($result == CURLM_CALL_MULTI_PERFORM)
                {
                    $result = curl_multi_exec($multiHandle, $running);
                }
            }

            // collect the content and clean up
            foreach($handles as $url => $handle)
            {
                $results[$url] = curl_multi_getcontent($handle);

                curl_multi_remove_handle($multiHandle, $handle);
                curl_close($handle);
            }

            curl_multi_close($multiHandle);

            return $results;
        }
    }


Escaping Characters in a Solr Query / Solr URL

January 20th, 2010

We’re using our own Solr library at Derdubor at the moment, but we’ve only been using it for indexing content. The query part was never standardized in our common library as we usually used an alternative output format, but over the last few days that has changed. We now have a parser for the default XML outputter and we’re also supporting facets and field queries (or constraints, as they’re abstracted in our library).

This means that we’re feeding content into the query that may contain foreign characters, in particular those that have a special meaning in a Solr query. You can find the complete list of characters that need to be escaped in a Solr or Lucene query in the Lucene manual.

To escape the characters we use this very simple and stupid PHP method:

        static public function escapeSolrValue($string)
        {
            $match = array('\\', '+', '-', '&', '|', '!', '(', ')', '{', '}', '[', ']', '^', '~', '*', '?', ':', '"', ';', ' ');
            $replace = array('\\\\', '\\+', '\\-', '\\&', '\\|', '\\!', '\\(', '\\)', '\\{', '\\}', '\\[', '\\]', '\\^', '\\~', '\\*', '\\?', '\\:', '\\"', '\\;', '\\ ');
            $string = str_replace($match, $replace, $string);

            return $string;
        }

We used a regular expression first, but the sheer amount of backslashes made it a regular .. hell … to read. So to make it easier for the people maintaining this in the future, we went down the easy to read / easy to maintain road for this one.
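One detail of the str_replace() approach is worth spelling out: with array arguments the replacements are applied one after the other, so the backslash has to be first in the list. The sketch below is a stand-alone copy of the method under a made-up function name, building the replacement list in a loop, with a sample value to show the result.

```php
<?php
// Stand-alone copy of the escaping approach above (hypothetical function name).
function escape_solr_value_sketch($string)
{
    // The backslash must come first so that the backslashes added for the
    // other characters are not escaped a second time.
    $match = array('\\', '+', '-', '&', '|', '!', '(', ')', '{', '}', '[', ']', '^', '~', '*', '?', ':', '"', ';', ' ');
    $replace = array();

    foreach ($match as $char)
    {
        $replace[] = '\\' . $char;
    }

    return str_replace($match, $replace, $string);
}

echo escape_solr_value_sketch('title:(foo bar)'), "\n";
// title\:\(foo\ bar\)
```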

PHP: Fatal error: Can’t use method return value in write context

January 18th, 2010

Just a quick post to help anyone struggling with this error message, as this issue gets raised from time to time on support forums.

The reason for the error is usually that you’re attempting to use empty or isset on the return value of a function instead of on a variable. While it may be obvious that this doesn’t make sense for isset(), the same cannot be said for empty(). You simply meant to check if the value returned from the function was an empty value; why shouldn’t you be able to do just that?

The reason is that empty($foo) is more or less syntactic sugar for !isset($foo) || !$foo. When written this way you can see that the isset() part of the statement doesn’t make sense for functions. This leaves us with simply the !$foo part. The solution is to drop the empty() part and negate the return value instead:

Instead of:

    if (empty($obj->method()))
    {
    }

Simply drop the empty construct and negate the call:

    if (!$obj->method())
    {
    }
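Both workarounds side by side, as a runnable sketch. The Basket class is made up for the example. As far as I know, PHP 5.5 and later accept arbitrary expressions inside empty(), so the error mostly shows up on older versions.

```php
<?php
// Hypothetical class, only here to have a method to call.
class Basket
{
    public function getItems()
    {
        return array();
    }
}

$basket = new Basket();

// Option 1: negate the return value directly.
if (!$basket->getItems())
{
    echo "basket is empty\n";
}

// Option 2: store the value first; empty() is happy with a variable.
$items = $basket->getItems();

if (empty($items))
{
    echo "still empty\n";
}
```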

Missed Schedule for Posts in WordPress

January 12th, 2010

As I started queuing the posts for the previous run of “Ready for 2010” articles, I came across a problem with my WordPress installation. The scheduled articles didn’t show up when they were scheduled, and the only thing shown in the WordPress administration interface was a message about “Missed Schedule”. No shit, Sherlock.

The reason behind the message is that the wp-cron.php file didn’t run as it should. WordPress usually tries to run this every now and then by inserting a reference to the file through the web site. Apparently this behavior was borked on my blog. I have a perfectly working cron implementation on my server, so instead of relying on WordPress to do some kind of magic to insert a reference to the file and kick off the processing with a web request, I added a reference to wp-cron.php in my usual crontab.

I have no idea how often wp-cron really should be run, but decided that a five minute resolution was enough for my use. The crontab entry is included here:

*/5 * * * * cd <directory of blog> && php wp-cron.php

This runs the cron script from the proper directory, and seems to work fine.
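If the real crontab takes over, the built-in trigger can be switched off so the two don’t compete. This is a sketch that assumes your WordPress version supports the DISABLE_WP_CRON constant; it goes in wp-config.php.

```php
<?php
// In wp-config.php, before WordPress itself is loaded.
// Tells WordPress not to trigger wp-cron.php from regular page views;
// the crontab entry is then the only thing running scheduled tasks.
define('DISABLE_WP_CRON', true);
```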

Writing a Munin Plugin

January 9th, 2010

I have to admit something. I’ve become addicted.

One of the things I finally got around to doing while living the quiet life over the Christmas holiday was to dive a bit further into Munin – a simple framework for collecting information from your computers and servers and making nice graphs that you can watch while you’re bored.

I’m not going to write a lot about how you can create your own Munin plugin to create your own graphs, as the Munin project has a very simple tutorial covering all the basics of writing plugins. The only things you need to remember are these two tidbits:

  1. When Munin first registers your plugin, it runs your script with config as the only argument. This provides Munin with the name of the graph, the labels and names (keys) of the graphs you’re providing values for, information about the axis, etc.
  2. When Munin runs your script without the config argument, it expects you to give it values for the keys you provided it in the configuration.

You enable and disable plugins by creating symlinks in /etc/munin/plugins (at least under Debian / Ubuntu), and plugins are usually stored in /usr/share/munin/plugins.

I keep my plugins archived together with the rest of the repository for my web projects, and then either symlink the content into the plugins-directory or create a simple wrapper script that changes the current directory to the location of the script and then invokes it (to make the current working directory be correct).

A very simple bash script that does this – and passes through any parameters given to the script:

    #!/bin/bash
    cd <absolute path> && php ./<script name> "$@"

An example of a simple PHP script to provide information to Munin:

    <?php
    // Munin asks for the graph definition by passing "config"
    if ((count($argv) > 1) && ($argv[1] == 'config'))
    {
        print("graph_title THE TITLE OF YOUR GRAPH
    graph_category THE CATEGORY / GROUP OF YOUR GRAPH
    graph_vlabel Count
    total.label Total
    other.label Other
    ");
        exit();
    }

    // get_total_value() and get_other_value() are placeholders for your own code
    print('total.value ' . get_total_value() . "\n");
    print('other.value ' . get_other_value() . "\n");

Symlink everything, check that it runs properly when you execute the script from the plugins directory:

mats@xx:/usr/share/munin/plugins$ ./scriptname
total.value 37
other.value 13
mats@xx:/usr/share/munin/plugins$

Symlink it into the /etc/munin/plugins directory and reload or restart Munin.

To check that Munin runs your script properly, telnet into the Munin server from an approved host and type “fetch <name of your plugin>”. You should now see the same output as you got when you simply typed ./scriptname in the plugins directory.

If stuff doesn’t work and you’re having a hard time finding out why, be sure to check out the munin-node logfile: /var/log/munin/munin-node.log.

As soon as you have the basics down, you’re free to start graphing whatever numeric value you can think of. The most interesting uses are probably something that integrates with your web applications, such as the number of searches, the number of signed up users, the language selection of users, the popularity of certain categories, etc. The possibilities are endless, use your imagination!
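As a sketch of how such an integration could look, the helper below builds both the config output and the value output from a single field definition, so the keys can’t drift apart. The function name munin_output, the field names and the callbacks returning fixed numbers are all made up for the example; a real plugin would query the application instead.

```php
<?php
// Hypothetical helper: builds the plugin output for both modes from one definition.
function munin_output(array $argv, $title, $category, array $fields)
{
    $lines = array();

    if ((count($argv) > 1) && ($argv[1] == 'config'))
    {
        $lines[] = 'graph_title ' . $title;
        $lines[] = 'graph_category ' . $category;
        $lines[] = 'graph_vlabel Count';

        foreach ($fields as $key => $field)
        {
            $lines[] = $key . '.label ' . $field['label'];
        }
    }
    else
    {
        foreach ($fields as $key => $field)
        {
            // the callback is only invoked when Munin asks for values
            $lines[] = $key . '.value ' . call_user_func($field['value']);
        }
    }

    return join("\n", $lines) . "\n";
}

// Stand-in callbacks; a real plugin would query the application here.
$fields = array(
    'searches' => array('label' => 'Searches', 'value' => function () { return 37; }),
    'signups'  => array('label' => 'Signups', 'value' => function () { return 13; }),
);

$args = isset($argv) ? $argv : array('plugin');

print(munin_output($args, 'Site activity', 'webapp', $fields));
```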

And about the addiction: NEED MORE GRAPHS.

Ready for 2010: HTTP Headers and Client Side Caching

January 6th, 2010

There are a few easy changes you can make to your website setup to speed up content delivery and eat up less bandwidth: configure proper expire values and, if possible, keep your static resources on a separate domain.

The HTTP Expires Header

Expires tells the client how long it can keep the current version of a resource as the most recent one. If you set the Expires header a while into the future, the browser will not make a new request for the file until the resource, well, expires. (The cache settings of the browser, or a forced reload such as shift-reloading, can expire the resource earlier.) The potential problem is the case where a resource actually changes, such as when you deploy a change to your stylesheet or external JavaScript files.
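If the resource is served through PHP, building the header could look like the sketch below. The helper name expires_header is made up; the second argument only exists to make the function easy to check with a fixed timestamp.

```php
<?php
// Hypothetical helper: build an Expires header a given number of seconds ahead.
function expires_header($secondsFromNow, $now = null)
{
    $now = ($now === null) ? time() : $now;

    // HTTP dates are always expressed in GMT
    return 'Expires: ' . gmdate('D, d M Y H:i:s', $now + $secondsFromNow) . ' GMT';
}

// One year ahead; in a real response this string would be passed to header().
echo expires_header(365 * 24 * 60 * 60), "\n";
```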

The fix for this is to include something in the URL which changes when the file is physically updated on the disk. This can be the last modified time (please keep this cached in your web application, you do not want to hit the disk to retrieve the value for each page view), the current revision number from your revision control system (such as SVN – you can get the current revision of a file by using svn info, and please, cache that value too. You do not want to call svn for each page view :-)) or something else, such as the md5 or crc32 hash of the file. The important part is that you include this value as part of the request, making the URL to the resource unique depending on the version of the resource. You can safely ignore this part of the URL in your rewrite / controller routing magic / handling application, as its only function is to tell the browser that it has to request a new file and not use the old one anymore.

Examples of URL-schemes To Get Around Expires:-headers

  1. Flickr uses a simple .v in their URLs to indicate the version of the file: http://l.yimg.com/g/css/c_sets.css.v74709.14
  2. On Gamer.no we use the current SVN revision: /css/main.css?v=1120M
  3. vg.no uses the current date, followed by an identifier that probably indicates the current revision for that day: css/frontpage.css?20091203-1

It’s important to remember that the identifier is not used to deliver an older version of the file depending on the parameter, just to make the browser see the new resource. The old URL can still serve the new resource – and if you need to keep old versions around, you’ve probably solved this issue already.
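The last modified time variant of the versioning idea could be sketched like this. The helper name versioned_url and the demo file are made up, and the static array only caches for the current request; a real application would keep the value in a longer-lived cache, as mentioned earlier.

```php
<?php
// Hypothetical helper: append a version marker derived from the file's
// last modified time to the URL path.
function versioned_url($urlPath, $documentRoot)
{
    static $versions = array();

    if (!isset($versions[$urlPath]))
    {
        $file = $documentRoot . $urlPath;
        $versions[$urlPath] = is_file($file) ? filemtime($file) : 0;
    }

    return $urlPath . '?v=' . $versions[$urlPath];
}

// Demo with a throwaway file standing in for a stylesheet.
$root = sys_get_temp_dir();
$name = '/versioned_url_demo.css';

file_put_contents($root . $name, 'body { margin: 0; }');
touch($root . $name, 1262304000);

// The number after ?v= changes whenever the file is modified.
echo versioned_url($name, $root), "\n";

unlink($root . $name);
```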

Use a Separate Domain for Static Resources

By using another, separate domain for your static resources, you’re letting browsers fetch the static resources while they’re still busy downloading other items from your main site. The HTTP/1.1 specification says that a client should not keep more than two connections open to the same server at the same time. When you host your static resources on another domain, you tell the browser that it can go ahead and fetch those resources in parallel.

After you’ve moved your static resources to a separate domain, you’ll usually also end up using less bandwidth. Since you’re now delivering the most requested content from another host, cookies will not be included in the request from the browser. When a browser makes a request for a resource on a certain host, it includes all the cookies that have been set for that domain. This happens independent of which files it’s requesting, and if you have a large number of separate files (which you probably could include into one larger file – resulting in fewer HTTP requests), these Cookie-headers can add up to a significant amount of bandwidth. The HTTP server will also have less work to do, making everyone happier!

If you use www. as a prefix for all your regular HTTP requests and take care of setting your cookies in the www.example.com domain, you should be able to simply use something like static.example.com for your static content and avoid leaking cookies into the other subdomain. If you have loads of static content, you can also use several separate subdomains for your files, but be sure to let the request for a certain file point to the same subdomain each time. Otherwise you’ll end up with the browser requesting four copies of the same, identical file and actually breaking the regular cache in the browser. (The regular cache uses If-Modified-Since to tell the server when it last downloaded the file; what we want is to avoid the browser making the request again at all.) At pwned.no I calculate the crc32 of the filename and use that value to determine which static host the request should use. We also redirect any requests made directly to pwned.no to www.pwned.no to keep the cookie structure consistent. We do not set the Expires header yet, however, but that might be a part of the next update to the site.
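Picking a host from the crc32 of the filename could look roughly like this. It is a sketch of the idea rather than the code running on pwned.no; the helper name and the host names are made up.

```php
<?php
// Hypothetical helper: pick a static host for a file based on the crc32 of
// its name, so the same file always maps to the same host.
function static_host_for($filename, array $hosts)
{
    // mask off the sign bit so the index is never negative on 32-bit builds
    $index = (crc32($filename) & 0x7fffffff) % count($hosts);

    return $hosts[$index];
}

$hosts = array('static1.example.com', 'static2.example.com', 'static3.example.com');

echo 'http://' . static_host_for('/css/main.css', $hosts) . '/css/main.css', "\n";
```

Because the host only depends on the filename, the browser sees one URL per file no matter which page links to it.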

Do you have a particular caching strategy you use for client side content? What kind of URL format works best for you? Leave a comment!

Read all the articles in the Ready for 2010-series

Ready for 2010: Check Your Indexes

January 5th, 2010

One of the many things you should try to keep a continuous watch on during the life of any of your applications is the performance of your SQL queries. You might be caching the hell out of your database layer, but at some point you’ll have to hit the database server to retrieve data. And if that starts to happen often enough while you’re growing, you will see your SQL daemon taking up the largest part of your disk I/O and your CPU time. This might not be a problem for the load you’re seeing now, but could you handle a tenfold increase in traffic? .. or how about 100x? (which, if I remember correctly, is what Google uses as the scale factor when developing applications)

Indexes Are Your Friend

During the Christmas holiday I got around to taking a look at some of the queries running at one of my longest living, most active sites: pwned.no. Pwned is a tournament engine running on top of PHP and MySQL, containing about 40,000 tournaments, 450,000 matches and several other database structures. The site has performed well over the years and there haven’t been any performance issues other than a few attempts at DoS-ing the site with TCP open requests (the last one during the holiday, actually).

Two weeks ago the server suddenly showed loads well above 30 – while it usually hovers around 0.3 – 0.4 at the busiest hours of the day. The reason? One of the previously less used functions of the tournament engine, using a group stage in your tournament, had suddenly become popular in at least one high traffic tournament. This part of the code had never been used much before, but when the traffic spike happened everything went bananas (B-A-N-A-N-A-S. Now that’s stuck in your head. No problem.) The reason: the query used a couple of columns in a WHERE clause that weren’t indexed, and the query ran against the table containing the matches for the tournament. This meant that over 400,000 rows were scanned each time the query ran, so mysqld started hogging every resource it could. The Apache children then had to wait, making the load a bit too much for my liking. Two CREATE INDEX calls later the load went back down and everything chugged along nicely again.

My strategy for discovering queries that might need a better index scheme (or if “impossible”, a proper caching layer in front of it):

  1. Run your development server with slow-query-log=1, log-queries-not-using-indexes=1 and long-query-time=<an appropriately low value, such as 0.05 – depends on your setup>. You can also provide a log file name with log-slow-queries=/var/log/mysql/… in your my.cnf file for MySQL. This logs candidate queries for optimization to the log file. It will not necessarily give you a complete list of queries worth optimizing, but it might provide a few good hints. Be sure to use actual data from your site when working on your development version, as you might only start seeing issues when the data set reaches a certain size, such as the 400,000 rows in the example mentioned above.
  2. Connect to your MySQL server and issue
    SHOW PROCESSLIST

    and

    SHOW FULL PROCESSLIST

    statements every now and then. This will let you see any queries that run often and way too long (but they’ll have to be running when you issue the command). You might not catch the real culprit, but if you’re seeing MySQL chugging along at 100% CPU and are wondering what’s happening, try to check out what the threads are doing. You’ll hopefully see just which query is wreaking havoc with your server.

  3. Add a statistics layer in front of your MySQL calls in your application. If you’re using PDO you can subclass it to keep a bit of statistics about your queries around. The number of times each query is run, the time it took in total running the query and other interesting values. We’re using a version of this in the development version of Gamer.no and I’ll probably upload the class to my github repository as soon as I get a bit of free time in the new year.
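The statistics layer from the third point could be sketched like this. To keep the example runnable without a database it is written as a small stand-alone collector rather than the PDO subclass we use; a subclass could call measure() around its own query methods. All names here are made up, and the dummy callback stands in for the real database call.

```php
<?php
// Hypothetical statistics collector; none of these names come from PDO.
class QueryStats
{
    private $stats = array();

    // Run $callback and record how often and for how long the query ran.
    public function measure($sql, $callback)
    {
        $start = microtime(true);
        $result = call_user_func($callback);
        $elapsed = microtime(true) - $start;

        if (!isset($this->stats[$sql]))
        {
            $this->stats[$sql] = array('count' => 0, 'time' => 0.0);
        }

        $this->stats[$sql]['count']++;
        $this->stats[$sql]['time'] += $elapsed;

        return $result;
    }

    // Queries sorted by total time spent, slowest first.
    public function report()
    {
        $stats = $this->stats;

        uasort($stats, function ($a, $b) {
            if ($a['time'] == $b['time'])
            {
                return 0;
            }

            return ($a['time'] < $b['time']) ? 1 : -1;
        });

        return $stats;
    }
}

$stats = new QueryStats();
$sql = 'SELECT * FROM matches WHERE tournament_id = ?';

// Stand-in for the real call, such as a closure that executes the statement.
for ($i = 0; $i < 3; $i++)
{
    $stats->measure($sql, function () { return array(); });
}

print_r($stats->report());
```

Keying the statistics on the SQL text with placeholders means the same prepared statement is counted as one query regardless of the values bound to it.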

Not sure what I’ll take a closer look at tomorrow, but hopefully I’ll decide before everything collapses!

What is your strategy for indexes? What methods do you use for finding queries that need a bit more love? Leave a comment below!

Read all the articles in the Ready for 2010-series