January 11th, 2010
We’re currently expanding our munin reporting cluster at Derdubor, but after installing munin-node on one of our servers we never got any graphs. The only section available on the munin server was “Other”, and that didn’t contain any information at all (which indicates that you’re not getting any response from the server).
The first step I take when debugging a munin connection is to telnet to the munin port, as this confirms that the two servers are able to talk to each other and that the munin daemon listens on the correct interface and port.
# telnet localhost 4949
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.
#
The connection was established, but munin closed it as soon as it was created. This usually means one of two things: the host you’re connecting from is missing from the allow and cidr_allow lists, or it is on the denied hosts list. This time it was neither: the host had been added and we didn’t have a denied hosts list.
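A quick way to rule the access lists in or out is to print the access directives from the node configuration. This is only a minimal sketch: it assumes the Debian default path /etc/munin/munin-node.conf and the directive names allow, deny, cidr_allow and cidr_deny, and the function name show_munin_access is made up for illustration.

```shell
# List the access-control directives from a munin-node configuration file,
# ignoring commented-out lines.
# $1 is the config file; the default is the Debian location (an assumption).
show_munin_access() {
    conf=${1:-/etc/munin/munin-node.conf}
    grep -E '^[[:space:]]*(allow|deny|cidr_allow|cidr_deny)[[:space:]]' "$conf"
}
```

If the address you connect from matches none of the printed allow rules, or matches a deny rule, the immediate disconnect is explained and there is no need to dig further.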
The next step was to take a look at munin-node.log in /var/log/munin (at least under Debian).
The last message was:
User "ejabberd" in configuration file "/etc/munin/plugin-conf.d/munin-node" nonexistant. Skipping plugin. at /usr/sbin/munin-node line 615, line 83.
Something wicked happened while reading "/etc/munin/plugins/munin-node". Check the previous log lines for spesifics. at /usr/sbin/munin-node line 261, line 83.
We don’t have ejabberd installed, but a reference to the ejabberd user had apparently been added to the configuration file /etc/munin/plugin-conf.d/munin-node. This made our version of munin-node barf, as the user it referenced didn’t exist.
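To catch this kind of problem before restarting the node, you can check that every user named in a plugin configuration file exists on the machine. A minimal sketch, assuming the usual "user NAME" lines inside the plugin sections; the function name missing_plugin_users is hypothetical.

```shell
# Print every user named in a "user <name>" line of a plugin configuration
# file that does not exist on this system. Section headers such as
# [ejabberd*] and all other directives are ignored.
missing_plugin_users() {
    awk '$1 == "user" { print $2 }' "$1" | sort -u | while read -r name; do
        id -u "$name" >/dev/null 2>&1 || printf '%s\n' "$name"
    done
}
```

Running missing_plugin_users /etc/munin/plugin-conf.d/munin-node on the server described here would be expected to print ejabberd. Empty output means every referenced user exists.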
The next step was to remove the section from the file and restart munin-node:
/etc/init.d/munin-node restart
After restarting munin, I did the telnet check again:
# telnet localhost 4949
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
# munin node at example.com
.
fetch load
load.value 0.02
.
quit
Connection closed by foreign host.
#
Wait 10 to 15 minutes and you should start seeing graphs again, if this actually was your problem. It probably wasn’t (in which case you should read Debugging Munin Plugins and the other documentation on the Wiki). But if it was, you’ll be happy happy joy joy now.
Tags: administration, ejabberd, munin, munin-node, server administration
Posted in Programming, Scalability
January 9th, 2010
I have to admit something. I’ve become addicted.
One of the things I finally got around to doing while living the quiet life over the Christmas holiday was to dive a bit further into Munin, a simple framework for collecting information from your computers and servers and making nice graphs that you can watch while you’re bored.
I’m not going to write a lot about how to create your own Munin plugin, as the Munin project itself has a very simple tutorial covering the basics of writing plugins. The only things you need to remember are these two tidbits:
- When Munin first registers your plugin, it runs your script with config as the only argument. This provides Munin with the name of the graph, the labels and names (keys) of the graphs you’re providing values for, information about the axis, etc.
- When Munin runs your script without the config argument, it expects you to give it values for the keys you provided it in the configuration.
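The two calls described above can be sketched as a tiny shell plugin. This is an illustration only: the graph title, the category and the two fixed numbers are placeholders, and a real plugin would compute its values instead of echoing constants.

```shell
# Minimal Munin plugin contract: print the graph definition when called
# with "config", print the current values otherwise.
# Title, category and the numbers below are placeholders.
munin_plugin() {
    if [ "${1:-}" = "config" ]; then
        echo "graph_title Example graph"
        echo "graph_category example"
        echo "graph_vlabel Count"
        echo "total.label Total"
        echo "other.label Other"
        return 0
    fi
    echo "total.value 37"
    echo "other.value 13"
}
# In a real plugin file the last line would be: munin_plugin "$@"
```

The keys before .label in the config output (total, other) have to match the keys before .value in the normal output. That is how Munin ties each number to a line in the graph.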
You enable and disable plugins by creating symlinks in /etc/munin/plugins (at least under debian / ubuntu), and plugins are usually stored in /usr/share/munin/plugins.
I keep my plugins archived together with the rest of the repository for my web projects, and then either symlink the content into the plugins-directory or create a simple wrapper script that changes the current directory to the location of the script and then invokes it (to make the current working directory be correct).
A very simple bash script that does this, and passes through any parameters given to the script:
#!/bin/bash
cd <absolute path> && php ./<script name> "$@"
An example of a simple PHP script to provide information to Munin:
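<?php
// Munin calls the plugin with "config" to get the graph definition.
if ((count($argv) > 1) && ($argv[1] == 'config'))
{
    print("graph_title THE TITLE OF YOUR GRAPH
graph_category THE CATEGORY / GROUP OF YOUR GRAPH
graph_vlabel Count
total.label Total
other.label Other
");
    exit();
}

// Without arguments, print one value per key declared above.
// get_total_value() and get_other_value() are placeholders for your own code.
print('total.value ' . get_total_value() . "\n");
print('other.value ' . get_other_value() . "\n");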
Symlink everything, check that it runs properly when you execute the script from the plugins directory:
mats@xx:/usr/share/munin/plugins$ ./scriptname
total.value 37
other.value 13
mats@xx:/usr/share/munin/plugins$
Symlink it into the /etc/munin/plugins directory and reload or restart Munin.
To check that Munin runs your script properly, telnet to the Munin node from an approved host and type “fetch scriptname” (using the name of your symlink in /etc/munin/plugins). You should now see the same output as you got when you simply typed ./scriptname in the plugins directory.
If stuff doesn’t work and you’re having a hard time finding out why, be sure to check out the munin-node logfile: /var/log/munin/munin-node.log.
As soon as you have the basics down, you’re free to start graphing whatever numeric value you can think of. The most interesting uses are probably something that integrates with your web applications, such as the number of searches, the number of signed up users, the language selection of users, the popularity of certain categories, etc. The possibilities are endless, use your imagination!
And about the addiction: NEED MORE GRAPHS.
Tags: bash, creating, graphs, munin, PHP, plugins, rrdtool
Posted in Hacks, PHP, Programming, Scalability, pwned.no
January 8th, 2010
At some time during Friday one of my web servers started to behave rather strangely. When attempting to connect to the web site, the requests would time out almost randomly. About half of them got through, while the other half seemed to time out or be left for dead. Restarting the web server helped, but the problem crept back in within 10 to 15 seconds. This seemed very strange, and digging through the logs of the server and checking the load of the database server didn’t show any apparent problems.
After heading over to check the syslog (/var/log/syslog) I found that the TCP stack was trying to tell me something:
TCP: drop open request from u.x.y.z/vvvv
printk: 228 messages suppressed.
Apparently this is one of the signs of an attempted (D)DoS-attack, when a computer on the other end sends as many TCP open requests as possible to a port on the computer, making the daemon busy with just handling idling connections that never go anywhere.
I realized that this fit the pattern I was seeing quite well: the web server accepted requests as normal after a restart, before being hit with loads of bogus open requests right after. The requests were never proper HTTP requests, so they were not logged to the normal error or access logs.
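To see whether the dropped requests come from a handful of addresses or from all over the place, you can count the log lines per source. A minimal sketch, assuming the messages end up in /var/log/syslog in the format shown above; the function name count_dropped_opens is made up for this example.

```shell
# Count "drop open request" kernel messages per source address.
# Reads a syslog-style file ($1, default /var/log/syslog) and prints
# "<count> <address>" lines with the busiest sources first.
count_dropped_opens() {
    log=${1:-/var/log/syslog}
    sed -n 's|.*TCP: drop open request from \([^/ ]*\)/.*|\1|p' "$log" |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Since the kernel suppresses repeated messages (the "messages suppressed" lines), the counts are a lower bound. They are still enough to tell a single noisy host from a distributed attack.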
There are at least two ways of handling this on the server itself (there are probably a couple of hundred more, but the first one worked for me): simply drop the traffic, or turn on TCP SYN Cookies.
If the attack is from a particular host or subnet, dropping the traffic works fine:
iptables -I INPUT -s u.x.y.z -j DROP
If the attack originates at several different locations, turning on TCP SYN Cookies while the attack is taking place is probably the best idea (as enabling TCP SYN Cookies will disable most high performance TCP options, you’ll want to disable it after the attack has subsided again).
You enable TCP SYN Cookies with:
echo 1 > /proc/sys/net/ipv4/tcp_syncookies
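Since the setting is meant to be switched back after the attack, it helps to remember the old value when enabling it. The functions below are a sketch under assumptions: their names are made up, and the path is taken from a variable so they can be tried against an ordinary file. On a real Linux host the file is the /proc entry above and writing to it requires root.

```shell
# Read and change the SYN cookie switch. SYNCOOKIES_FILE defaults to the
# real /proc entry; point it at a scratch file to experiment safely.
SYNCOOKIES_FILE=${SYNCOOKIES_FILE:-/proc/sys/net/ipv4/tcp_syncookies}

syncookies_get() { cat "$SYNCOOKIES_FILE"; }
syncookies_set() { echo "$1" > "$SYNCOOKIES_FILE"; }

# Enable SYN cookies and print the previous value, so the caller can
# restore it with syncookies_set once the attack has subsided.
syncookies_enable() {
    previous=$(syncookies_get)
    syncookies_set 1
    echo "$previous"
}
```

A typical session would be old=$(syncookies_enable) when the attack starts and syncookies_set "$old" when it is over.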
You can read a bit more about how the tcp_syncookies setting works at securityfocus.
If you’re seeing these problems often I strongly recommend you talk with your hosting provider and ISP to get the problem fixed by Someone Who Knows What They’re Doing. Getting rid of the troublesome requests before they even arrive at your server is also a good idea.
Tags: Apache, attack, ddos, dos, perodic problems, tcp_syncookies, timing out, works again
Posted in Scalability
January 6th, 2010
There are a few easy changes you can make to your website setup to speed up content delivery and use less bandwidth: configure proper expire values and, if possible, keep your static resources on a separate domain.
The HTTP Expires Header
Expires tells the client how long it can keep the current version of a resource as the most recent one. If you set the Expires header a while into the future, the browser will not make a new request for the file until the resource, well, expires. (The browser’s cache settings or an explicit reload, such as shift-reloading, can expire the resource earlier.) The potential problem is the case where a resource actually changes, such as when you deploy a change to your stylesheet or external javascript files.
The fix for this is to include something about the file which changes when the file is physically updated on the disk. This can be the last modified time (please keep this cached in your web application, you do not want to hit the disk to retrieve the value for each page view), the current revision number from your revision control system (such as SVN; you can get the current revision of a file by using svn info, and please, cache that value too. You do not want to call svn for each page view :-)) or something else, such as the md5 or crc32 hash of the file. The important part is that you include this value as part of the request, making the URL to the resource unique depending on the version of the resource. You can safely ignore this part of the URL in your rewrite / controller routing magic / handling application, as its only function is to tell the browser that it has to request a new file and not use the old one anymore.
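As a rough illustration of the idea, the function below appends a checksum of the file contents to the URL. It is a sketch, not a recommendation for a particular scheme: the function name and the ?v= parameter are assumptions, and cksum is used only because it is available on any POSIX system (an md5 or a revision number works the same way).

```shell
# Build a cache-busting URL by appending a checksum of the file contents.
# $1 is the file on disk, $2 is the public URL path of that file.
versioned_url() {
    sum=$(cksum < "$1" | cut -d ' ' -f 1)
    printf '%s?v=%s\n' "$2" "$sum"
}
```

The URL stays the same for as long as the file is unchanged and changes as soon as its contents do. In line with the advice above, compute it once at deploy time and store the result rather than running it for each page view.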
Examples of URL Schemes To Get Around Expires Headers
- flickr uses a simple .v suffix in their URLs to indicate the version of the file: http://l.yimg.com/g/css/c_sets.css.v74709.14
- On Gamer.no we use the current SVN revision: /css/main.css?v=1120M
- vg.no uses the current date, followed by an identifier that probably indicates the current revision for that day: css/frontpage.css?20091203-1
It’s important to remember that the identifier is not used to deliver an older version of the file depending on the parameter, just to make the browser see the new resource. The old URL can still serve the new resource – and if you need to keep old versions around, you’ve probably solved this issue already.
Use a Separate Domain for Static Resources
By using another, separate domain for your static resources, you’re letting browsers fetch the static resources while they’re still processing your HTML. The HTTP/1.1 specification (RFC 2616) recommends that a client keep no more than two simultaneous connections to the same server. When you host your static resources on another domain, you tell the browser that it can go ahead and fetch those resources while it is busy downloading other items from your main site.
After you’ve moved your static resources to a separate domain, you’ll usually also end up using less bandwidth. When a browser makes a request for a resource on a certain host, it includes all the cookies that have been set for that domain, independently of which file it’s requesting. Since you’re now delivering the most requested content from another host, cookies will not be included in those requests. If you have a large number of separate files (which you could probably combine into one larger file, resulting in fewer HTTP requests), these Cookie headers can add up to a significant amount of bandwidth. The HTTP server will also have less work to do, making everyone happier!
If you use www. as a prefix for all your regular HTTP requests and take care to set your cookies for the www.example.com domain, you should be able to simply use something like static.example.com for your static content and avoid leaking cookies into the other subdomain. If you have loads of static content, you can also use several separate subdomains for your files, but be sure to let the request for a certain file point to the same subdomain each time. Otherwise the browser will end up requesting one copy of the same, identical file per subdomain, which defeats the regular browser cache. (That cache uses If-Modified-Since to tell the server when it last downloaded the file, and we want to avoid the browser making the request at all.) At pwned.no I calculate the crc32 of the filename and use that value to determine which static host the request should use. We also redirect any requests made directly to pwned.no to www.pwned.no to keep the cookie structure consistent. We do not set the Expires header yet, but that might be part of the next update to the site.
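The host selection can be sketched in a few lines. This is not the pwned.no code: the host pattern static0.example.com to static3.example.com and the default of four hosts are assumptions, and cksum stands in for PHP’s crc32() (the two produce different numbers, but either gives a stable value per file name).

```shell
# Pick a static host for a file name so that the same file always maps to
# the same subdomain. $1 is the file name, $2 the number of hosts
# (default 4, an assumption for this sketch).
static_host() {
    hosts=${2:-4}
    sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo "static$((sum % hosts)).example.com"
}
```

Because the host depends only on the file name, every page that links to /css/main.css points the browser at the same subdomain and the cached copy gets reused.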
Do you have a particular caching strategy you use for client side content? What kind of URL format works best for you? Leave a comment!
Read all the articles in the Ready for 2010-series
Tags: client side caching, digital chores, expires, headers, http, pwned, Ready for 2010
Posted in Articles, PHP, Programming, Ready for 2010, Scalability, html
October 26th, 2009
If you’re doing several encodes of a single input file in parallel with x264 (to produce several different size / bitrate combinations), you’re going to have a problem. The first pass will create three files with information about the input for the second pass, and you’re unable to change these file names into something better. This seems to be a problem for quite a lot of people according to a Google search for the issue, and nobody seems to have a proper solution.
I have one. Well, probably not a proper solution, but at least it works! The trick is to realize that ffmpeg/x264 creates these files in the current working directory. To run several encodings in parallel, you simply have to give each encoding process its own directory, and then use absolute paths to the source and destination file (and any other paths). Let it create the files there, then clean up and delete the directories afterwards.
I’ve included some example PHP code showing how you could solve something like this. I simply use the output file name as the directory name here, and create the directory in the system temp directory.
// Use the output file name as the name of a private working directory.
$tempDir = sys_get_temp_dir() . '/' . $outputFilename;
mkdir($tempDir, 0700, true);
chdir($tempDir);
After doing the encode, we’ll have to clean up. The three files that ffmpeg/x264 creates are ffmpeg2pass-0.log, x264_2pass.log and x264_2pass.log.mbtree.
unlink($tempDir . '/ffmpeg2pass-0.log');
unlink($tempDir . '/x264_2pass.log');
unlink($tempDir . '/x264_2pass.log.mbtree');
rmdir($tempDir);
And that should hopefully solve it!
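The same trick works from a shell script. The function below is a minimal sketch with a made-up name: it runs any command inside a fresh temporary directory and removes the directory afterwards, taking whatever stats files the command left there with it.

```shell
# Run a command inside its own temporary working directory and remove the
# directory afterwards. The function body is a subshell, so the caller's
# working directory is left untouched.
# Pass absolute paths for input and output files.
run_in_private_dir() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    "$@"
    status=$?
    cd / && rm -rf "$dir"
    exit "$status"
)
```

Both passes of one encode have to run inside the same call, since the second pass reads the stats files written by the first. Wrap them in a small script or an sh -c '...' command and hand that to run_in_private_dir. The exit status of the wrapped command is passed through.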
Tags: 2pass, encoding, ffmpeg, PHP, x264
Posted in PHP, Scalability
July 4th, 2009
I spent the evening yesterday playing around a bit more with Gearman, a system for farming out tasks to workers across several servers. As my workstation at home still runs Windows, the only PHP library available is Net_Gearman in PEAR. Net_Gearman supports tasks (something to do), sets (a collection of tasks), workers (the processes that perform the tasks) and clients (which request tasks to be performed). The gearman protocol supports retrieving the current status of a task from the gearman server (which contains information about how the worker is progressing, reported by the worker itself), but Net_Gearman did not.
The reason for ‘did not’ is that I’ve created a small patchset to add the functionality to Net_Gearman. All internal methods and properties are still used as they were before, but I’ve added two helper methods for retrieving the socket connection for a particular gearman server (Net_Gearman usually just picks a random server, but we need to contact the server that’s responsible for the task) and a getStatus(server, handle) method to the Gearman client. I’ve also added a property to the Task class that keeps the address of the server which was assigned the task.
After submitting a task to be performed in the background (you do not need this to get the status for foreground tasks, as you can provide a callback to handle that), your Task object will have its handle and server properties set. These can be used to retrieve status information about the task later. You’ll still need to provide the possible servers to the Gearman client when creating the client (through the constructor).
Example of creating a task and retrieving the server / handle pair after starting the task:
require_once 'Net/Gearman/Client.php';

$client = new Net_Gearman_Client(array('host:4730'));

// Queue a background task for the "Reverse" worker function.
$task = new Net_Gearman_Task('Reverse', range(1,5));
$task->type = Net_Gearman_Task::JOB_BACKGROUND;

$set = new Net_Gearman_Set();
$set->addTask($task);

$client->runSet($set);

// With the patch applied, the task now knows its handle and server.
print("Status information: \n");
print($task->handle . "\n");
print($task->server . "\n");
Retrieving the status:
require_once 'Net/Gearman/Client.php';

$client = new Net_Gearman_Client(array('host:4730'));
$status = $client->getStatus('host:4730', 'H:mats-ubuntu:1');
The array returned from the getStatus() method is the same array as returned from the gearman server and contains information about the current status (numerator, denominator, finished, etc, var_dump it to get the current structure). I’ve also added the patchset to the Issue tracker for Net_Gearman at github.
The patchset (created from the current master branch at github) can be downloaded here: GearmanGetStatusSupport.tar.gz.
UPDATE: I’ve finally gotten around to creating my own fork of Net_Gearman on github too. This fork features the patch mentioned above.
Tags: Gearman, gearmand, net_german, patch, patchset, pear
Posted in Gearman, PHP, Programming, Scalability
May 29th, 2009
After a while you realize that the best way to serve almost-never-changing content is to give the content an expire date way ahead in the future. This allows your server and your network pipes to do more sensible stuff than delivering the same old versions of files again and again and again and again.
A problem does however surface when you want to update the files and make the visiting user request the new version instead of the old. The trick here is to change the URL for the resource, so that the browser requests the new file. You can do this by appending a version number to the file and either rewriting it behind the scenes to the original file, or by appending a timestamp (or some other item) to the URL as a GET value. The web server ignores this for regular files, but as it identifies a new unique resource, the web browser has to request it again and use the new and improved ™ file.
Using the timestamp of the file is a bit cumbersome and requires you to hit the disk one additional time each time you’re going to show a URL to one of the almost-static resources, but luckily we already have an identifier describing which version the file is in: the SVN revision number (.. if you use subversion, that is). You could use the SVN revision for each file by itself, but we usually decide that the global version number for SVN is good enough. This means that the URLs of all your static resources change each time you update the live code base through svn up or something like that. (Remember to block .svn directories and their files if you run your production directory from a SVN branch. This can be discussed over and over, but I’m growing more and more fond of actually doing just that..) To avoid having to call svnversion each time, it’s useful to be able to insert the current revision number into the configuration file for the application (or a header file / bootstrap file).
Here’s an example of how you can insert the current SVN revision into a config file for a PHP application.
- Create a backup of the current configuration file.
- Update the current revision through svn up.
- Retrieve the current revision number from svnversion.
- Insert the revision number using sed into a temporary copy of the configuration file.
- Move the new configuration file into place as the current configuration file.
- Party like it’s 1999!
This assumes that you use an array named $config in your configuration file. I suggest that you name it something else, but for simplicity I’m going with that here. First, create a $config['svn'] entry in your config file. If you have some other naming scheme, you’re going to have to change the relevant parts below.
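#!/bin/bash
# Back up the current configuration before touching it.
cp ./config/config.php ./config/config.backup.php
# Update the working copy and read the resulting revision (e.g. 1120 or 1120M).
svn up
VERSION=$(svnversion .)
echo "$VERSION"
# Write the revision into $config['svn'] in a temporary copy, then swap it in.
# The pattern also accepts mixed or switched revisions such as 1119:1120MS.
sed "s/config\['svn'\] = '[0-9:MSP]*';/config['svn'] = '$VERSION';/" < ./config/config.php > ./config/config.fixed.php
mv ./config/config.fixed.php ./config/config.php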
Save this into a file named upgrade.sh, make it executable by doing chmod u+x upgrade.sh and run it by typing ./upgrade.sh.
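If you want to try the substitution on its own before pointing it at a live configuration, the sed step can be isolated in a function. This is a sketch with a hypothetical function name. It replaces whatever value is currently stored, and it assumes the revision string contains no slash or ampersand, which holds for the svnversion output described here.

```shell
# Replace the value of $config['svn'] in a PHP config file with the given
# revision string, going through a temporary copy of the file.
# $1 is the config file, $2 the revision (e.g. 1120 or 1120M).
set_config_revision() {
    file=$1
    revision=$2
    sed "s/config\['svn'\] = '[^']*';/config['svn'] = '$revision';/" \
        < "$file" > "$file.fixed" && mv "$file.fixed" "$file"
}
```

For example, set_config_revision ./config/config.php "$(svnversion .)" does the same job as the last two lines of upgrade.sh. Other entries in the file are left untouched.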
And this is where you put your hands above your head and wave them about. When you’re done with that, you can refer to your current SVN revision using $config['svn'] in your PHP application (preferably in your template or wherever you build the URLs to your static resources). Simply append ?v= followed by the value of $config['svn'] to your current filenames. When you have a new version available, run ./upgrade.sh (or whatever name you gave the script) again and let your users enjoy the new experience.
Tags: configuration, PHP, revision, sed, subversion, svn, version number
Posted in Hacks, PHP, Programming, Scalability, javascript
May 30th, 2008
InfoQ has an article about scaling eBay, written by Randy Shoup, covering some of the lessons they’ve learned at eBay over the years when it comes to scaling one of the largest internet sites in the world. The article is a very interesting read, and I’ll sum up the seven main points:
- Partition by Function
- Split Horizontally
- Avoid Distributed Transactions
- Decouple Functions Asynchronously
- Move Processing To Asynchronous Flows
- Virtualize At All Levels
- Cache Appropriately
Tags: ebay, partitioning, Scalability, soa
Posted in Scalability