Ready for 2010 – Mats Lindh

Ready for 2010: HTTP Headers and Client Side Caching

There’s a few easy changes you can do to your website setup to speed up content delivery and eat up less bandwidth: configure proper expire values and if possible, keep your static resources on a separate domain.

The HTTP Expires Header

Expires tells the client how long it can keep the current version of a resource as the most recent one. If you set the Expires-header a while into the future, the browser will not make a new request for the file until the resource, well, expires (depending on the cache settings for the browser, requesting a reload (such as shift-reloading in a browser), etc. which can expire the resource earlier). The potential problem is the case where a resource actually changes, such as deploying a change to your stylesheet or external javascript files.

The fix for this is to include something about the file which changes when the file is physically updated on the disk. This can be the last modified time (please keep this cached in your web application, you do not want to hit the disk to retrieve the value for each page view), the current revision number from your revision control system (such as SVN – you can get the current revision of a file by using svn info, and please, cache that value to. You do not want to call svn for each page view :-)) or something else, such as the md5 or crc32 hash of the file. The important part is that you include this value as part of the request, making the URL to the resource unique depending on the version of the resource. You can safely ignore this part of the URL in your rewrite / controller routing magic / handling application, as the only function it has is to tell the browser that it has to request a new file and not use the old one anymore.

Examples of URL-schemes To Get Around Expires:-headers

flickr uses as simple .v in their URLs to indicate the version of the file: http://l.yimg.com/g/css/c_sets.css.v74709.14
On Gamer.no we use the current SVN revision: /css/main.css?v=1120M
vg.no uses the current date, followed with an identifier that probably indicates the current revision for that day: css/frontpage.css?20091203-1

It’s important to remember that the identifier is not used to deliver an older version of the file depending on the parameter, just to make the browser see the new resource. The old URL can still serve the new resource – and if you need to keep old versions around, you’ve probably solved this issue already.

Use a Separate Domain for Static Resources

By using another, separate domain for your static resources, you’re letting browsers fetch the static resources while they’re still processing your HTML. The HTTP/1.1 specification says that browsers never should request more than two files at the same time from the same domain. When you host your static resources on another domain, you tell the browser that it can go ahead and fetch those resources while being busy with downloading other items from your main site.

After you’ve moved your static resources to a separate domain, you’ll usually also end up using less bandwidth. Since you’re now delivering the most requested content from another host, cookies will not be included in the request from the browser. When a browser makes a request for a resource on a certain host, it includes all the cookies that have been set for that domain. This happens independent of which files it’s requesting, and if you have a large number of separate files (which you probably could include into one larger file – resulting in fewer HTTP requests), these Cookie-headers can add up to a significant amount of bandwidth. The HTTP server will also have less work to do, making everyone happier!

If you use www. as a prefix for all your regular HTTP requests and take care of setting your cookies in the www.example.com domain, you should be able to simply use something like static.example.com for your static content and avoid leaking cookies into the other subdomain. If you have loads of static content, you can also use several separate subdomains for your files, but be sure to let the request for a certain file point to the same subdomain each time – otherwise you’ll end up with the browser requesting four copies of the same, identical file and actually breaking the regular cache in the browser (which uses If-Modified-Since to tell the server when it last downloaded the file. We want to avoid the browser making the request again at all). At pwned.no I calculate the crc32 of the filename and use that value to determine which static host the request should use. We also redirect any requests directly to pwned.no to www.pwned.no to make the cookie structure consistent. We do however not set the Expires-header yet, but that might be a part of the next update to the site.

Do you have a particular caching strategy you use for client side content? What kind of URL format works best for you? Leave a comment!

Read all the articles in the Ready for 2010-series

Ready for 2010: Check Your Indexes

One of the many things you should try to keep a continuous watch for during the life of any of your applications are the performance of your SQL queries. You might be caching the hell out of your database layer, but some time you’ll have to hit the database server to retrieve data. And if that starts to happen often enough while you’re growing, you will see your SQL daemon taking up the largest part of your disk io and your CPU time. This might not be a problem for the load you’re seeing now, but could you handle a 10 fold increase in traffic? .. or how about 100x? (which, if I remember correctly, is what Google uses as the scale factor when developing applications)

Indexes Are Your Friend

During the christmas holiday I got around to taking a look at some of the queries running at one of my longest living, most active sites: pwned.no. Pwned is a tournament engine running on top of PHP and MySQL, containing about 40.000 tournaments, 450.000 matches and several other database structures. The site has performed well over the years and there hasn’t been any performance issues other than a few attempts at DoS-ing the site with TCP open requests (the last one during the holiday, actually).

Two weeks ago the server suddenly showed loads well above 30 – while it usually hovers around 0.3 – 0.4 at the busiest hours of the day. The reason? One of the previously less used functions of the tournament engine, using a group stage in your tournament, had suddenly become popular in at least one high traffic tournament. This part of the code had never been used much before, but when the traffic spike happened everything went bananas (B-A-N-A-N-A-S. Now that’s stuck in your head. No problem.) The reason: the query used a couple of columns in a WHERE-statement that wasn’t indexed, and the query ran against the table containing the matches for the tournament. This meant that over 400.000 rows were scanned each time the query ran, meaning that mysqld started hogging every resource it could. The Apache childs then had to wait, making the load a bit too much for my liking. Two CREATE INDEX-calls later the load went back down and everything chugged along nicely again.

My strategy for discovering queries that might need a better index scheme (or if “impossible”, a proper caching layer in front of it):

Run your development server with slow-query-log=1, log-queries-not-using-indexes=1 and long-query-time=<an appropriately low value, such as 0.05 – depends on your setup>. You can also provide a log file name with the log-slow-queries=/var/log/mysql/… in your my.cnf-file for MySQL. This will log all potential queries for optimizing to the log file (this will not necessarily provide you with a complete list of good queries to optimize, but it might provide a few good hints). Be sure to use actual data from your site when working on your development version, as you might start seeing issues when the size of the data set reaches a certain size – such as 400.000 rows in the example mentioned above)
Connect to your MySQL server and issue
```
SHOW PROCESSLIST
```
and
```
SHOW FULL PROCESSLIST
```
statements every now and then. This will let you see any queries that run often and way too long (but they’ll have to run when you issue the command). You might not catch the real culprit, but if you’re seing MySQL chugging along with 100% CPU and are wondering what’s happening, try to check out what the threads are doing. You’ll hopefully see just which query is wreaking havoc with your server.
Add a statistics layer in front of your MySQL calls in your application. If you’re using PDO you can subclass it to keep a bit of statistics about your queries around. The number of times each query is run, the time it took in total running the query and other interesting values. We’re using a version of this in the development version of Gamer.no and I’ll probably upload the class to my github repository as soon as I get a bit of free time in the new year.

Not sure what I’ll take a closer look at tomorrow, but hopefully I’ll decide before everything collapses!

What are your strategy for indexes? What methods do you use for finding queries that need a bit more love? Leave a comment below!

Read all the articles in the Ready for 2010-series

Ready for 2010: Upgrade Critical Software

You might remember than WordPress installation you did a couple of years ago and that you’ve been ignoring the “upgrade now” message for almost as long. This “Ready for 2010” message is sponsored by the “Get Your Software Up To Date” foundation.

Before starting down this road, it could be a good idea making sure your backups work. :-)

Yes. You should do these things all year around, but at least this should be a perfect occasion to take the time to check out that old server you installed just to test stuff at home, the server where you’re hosting your private blog, your mail server, etc. We geeks have some sort of weird ability to contract a couple of servers in strange places, forgetting them – but still using them for something on a daily basis.

Update and Upgrade Your Distribution

Spend a couple of hours getting the distribution up to date. For Linux-based servers this usually involves using the package systems update manager, such as apt-get or aptitude on debian or Ubuntu-based servers, up2date on Red Hat, the SUSE update manager etc. On Windows-based servers you’ll be running the update manager, getting all the recent fixes and patches into the core library of tools.

If you’re feeling a bit adventurous you should also consider upgrading your distribution to a newer version if one is available. This will make newer versions of the software you’re running available, get new features into your applications and other Good Things. It might however break a few existing features, such as the layout of certain configuration files, new default values for some settings and other, small stuff. Be sure to set aside a couple of hours for this, so don’t do this just as you’re leaving the country for a couple of months.

Find – and Upgrade – Installed Web Applications

It’s very important to keep Web applications you’ve installed, such as WordPress, up to date. As their nature makes them available through the internet, they’re often a preferred vector for automatic attacks against your server. As soon as a remote exploit has been found, you’ll start noticing attempts to break your server in your web server’s access logs. Usually the attacks require some sort of mitigating factor that requires a particular configuration, but there’s always someone who get in trouble because of just that factor. That might be you!

WordPress (and several other web applications) also contain an automagical update mode, where you can update the software simply by clicking a link in the admin interface. Be sure to spend half an hour getting the automagical update to work where it’s available, and do it now!

Upgrade Embedded Software

Our lives are filled with devices that run any sort of embedded software. Your mobile phone, your digital camera, your wireless router, your TV, your media player, your game consoles, your network attached storage disk (NAS), etc.

Check out the manufacturer’s website for the different devices (or if you’re a nerd like .. well, me, you might have exchanged the firmware with alternative firmwares) and check if there are any updates available. Several devices are also able to update themselves, so be sure to just log in to the device (and discover that you don’t remember the password you set a couple of years ago) and check if you can just click a button to make everything go Happy Happy Joy Joy.

You might also have several applications installed on your mobile phone – check for updates and any critical fixes. You do not want your mobile phone to leak private details out onto the world wide web or through Bluetooth.

Any other issues one should be sure to cover when doing this? Leave a comment below!

Read all the articles in the Ready for 2010-series

Ready for 2010: Check Your Backups

As usual, this is something you should to at least once each month or each week – go into your backups and check that they’re actually running and doing what they should be doing. If you think you’re doing backups – but you’re unable to restore from the backup, you’re not doing backups!

As Jeff Atwood can attest: DO NOT TRUST THAT OTHER PEOPLE ARE DOING YOUR BACKUPS – not even your hosting provider even if you pay them to do this for you. At least something good came out of that story – I think quite a few folks checked their own backups in the days after the disaster happened. The only person who really has any great interest in your backup working is YOU. Sure, people may lose your business because of ruining your backups (and they should have a great interest in making their backups work), but the person who’ll be greatly affect the loss of your backup will be you (and other people caught in the same shitstorm).

So here’s a couple of quick things you could do to maybe do things a bit better than last year:

Revisit all backup-scripts and run them manually to check that they actually work. Maybe your SSH-keys have expired (debian blacklisted SSH-keys generated with a bogus key generation function) (been there, done that), your SSH-daemon has switched to another port and you’ve not updated your scripts (been there, done that) or several other causes. Run them manully first, so that you’re catching any errors. It is not enough to think that cron (or Scheduled Tasks) would tell you if anything weird happened.
Make your backups travel out from the physical location of the server. This might be your house, a co-location facility, your own dedicated server room, etc., but it will never be guarded against an attack by the natural forces such as a large fire, a flood, a train or an airplane hitting the building. Get it out of the building, and get it out of the city. Purchase online backup storage if needed, it’s dirt cheap at the moment and will still be in the future. $5-$20 a month for keeping all your data safe? It’s not even worth to spend any time thinking about it. Get it running, NOW!
Check that there’s enough free disk space on the server you’re making your backups to. Doesn’t help if everything works today if you’re going to run out of space in 6 days. Make the server send you a notice through E-mail and SMS if it’s getting anyewhere close to 80% of total capacity. Install munin, nagios and/or cacti to keep track of your servers. I’ve become a fan of the easiness of munin, so if you haven’t taken a closer look at it, do it now.
Check that your warning systems when a backup fails actually work. You could create a small file in the location you’re backing up with random content and then checking the md5sums of the files after the backup job has run. Make the alarm bells go off when things doesn’t match.
Make a static backup on an USB drive. USB-drives have also become dirt cheap, get a 1TB one and make a backup right now. Store the drive somewhere else than your regular backups, such as your home. Run the backup each month. This will at least make you do the backup manually and might save your ass when you discover that your regular backup job didn’t backup what you thought it backed up.
Backup jobs only back up what you tell them to back up. If you’re not backing up a complete system (and be sure to check that the backup includes files that might be kept open by the system, such as the hard drive image files of virtualized servers), you might have installed new applications that contain user generated files, new database storage locations etc, during the year. Be sure to check that you actually back up the things that are worth something to you. Maybe your new image application stores it metadata in another location than your previous one (Lightroom vs Aperture perhaps). Check it.
Do NOT write an article about backup routines and backup suggestions. This will result in catastrophic hardrive failure and backups that don’t work. You WILL jinx yourself. Oh sod.

If you need a good suggestion for backup software, take a look at rdiff-backup.

I have probably forgotten something very important here, so I’m trusting you guys to be my backup (yeah, right). Leave a comment with any good advice!

Now go out and create a random file somewhere on the locations you’re backing up.

Read all the articles in the Ready for 2010-series.