Uncategorized – Page 2

Merry Christmas and Happy Holidays!

Tomorrow is Christmas Eve, the day we usually celebrate Christmas in Norway. I’ll spend the first four hours of the day at work selling Hi-Fi at my brother’s store (as I’ve done every single day for the last ten days), before visiting my family for Christmas dinner and the exchange of gifts.

I plan to spend the rest of the time, until I return to work on the 5th of January, hacking on some old code for pwned.no and a few other relevant projects.

Anyways, just wanted to wish all readers stumbling across this page all the best for the year to come, and please, stop the holocaust! (On a side note, our Christmas tree even got a name this year: Svein. Might not be a good idea when we’re supposed to throw it out, but it could turn out to be a new tradition.)

The First Rule of Scalability

Don’t do slow shit often.

Following in the path of the previous Golden Rule of Frameworks, I give you the first rule of scalability.

What. Four. The number.

Kristian obviously has way too little to do while in Newcastle, UK, so he got challenged by his girlfriend to write a list of four items on several key subjects. And he challenged me to do the same. Bastard.

Name 4 jobs I have had:
- CTO at Derdubor
- Owner of Lindh Utvikling
- Research Assistant at HIOF
- Orakel at NTNU
4 movies I could watch again:
- Sen to Chihiro no kamikakushi (possibly one of the most beautiful movies ever made)
- Memento
- The Dark Knight
- Toy Story 2
Name 4 places I have lived:
- Trondheim
- Fredrikstad
- .. and that’s that.
Name four tv shows I like:
- West Wing
- South Park
- Numb3rs
- Bones
Four places I have been on vacation:
- Hartford, CT, USA
- Orlando, FL, USA
- Jylland, Denmark
- London, UK
Four web sites I visit every day:
- Google Reader
- VG
- Dagbladet
- This blog

As I’m a cheerful fellow, I’m not going to challenge anyone. But I’ll leave a brief hint that it’s about time that Ole posts something in his blog again.

(Bonus point: The first time the category “Uncategorized” has been used with intention on this blog. This really is uncategorized.)

String Metrics

«Estimate» stumbled across this awesome page with different string metric algorithms earlier today. Here you’ll find descriptions and implementations of Hamming distance, Levenshtein distance, Needleman-Wunsch distance, Smith-Waterman distance and dozens of others. Invaluable if you ever need to compare strings against each other and need some way to measure their similarity.
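To give an idea of what two of these metrics look like in practice, here is a minimal sketch of Hamming and Levenshtein distance in Java. It is not taken from the linked page; the class and method names are made up for illustration, and the Levenshtein version is the textbook dynamic programming approach with two rolling rows.

```java
final class StringMetricsSketch {
    // Hamming distance: number of positions where two equal-length strings differ.
    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Hamming distance needs strings of equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Hamming distance only makes sense for strings of the same length, while Levenshtein handles any pair, which is why it is usually the one to reach for when comparing names or search terms.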

New Times Ahead, Baby!

As the most observant people out there have probably noticed, I’ve given the site a little face lift to bring it into the next century (so bring it on, 2100!!!11). The illustration was done by the very talented Anette Heiberg, children’s book illustrator, who is also the one person who manages to live together with me. A neat little coincidence there!

Anyways, the new design is dark, but I’ve decided to use the inverse header for each post as it makes scanning the page to find the items _very_ effective. I like it, so it stays.

Happy Happy Joy Joy!

Christer and His Quest For More Zend_Form-age

I finally found out why Christer had been so quiet all day: he’s obviously been writing the largest post seen in the history of blogs. His introduction to Translating Zend Form Error Messages is a giant of a beast, and gives a thorough introduction to using Zend_Translate together with Zend_Form, with resource files, to present a user interface in several localized versions.

Solr: Using the dismax Query Handler and Still Limit a Specific Field

While working with the facets for our search results earlier today, I came across the need to limit the search against Solr on one specific field in addition to our regular search string (which we run against several fields with different weights). The situation was something like this:

Lastname
AggregateSearchField
AggregatePhoneticSearchField

We do the searches against the AggregateSearchField and the AggregatePhoneticSearchField, where we weight the exact match higher than the phonetic matches. This ensures that the more specific matches are ranked higher than those that are merely similar. We do this for several different field groupings, but that’s not relevant for this post, so let’s just assume that these are the three relevant fields. We search against two of them, and use Lastname as a facet / navigator field to allow users to get more specific with their search.

However, while users should be allowed to get more specific with their search when selecting one of the facets, it should not change their regular search. And since the dismax handler will search through all the allowed fields for a given value, you cannot just append Lastname:facetValue to the search string and be done with it (dismax does not support fielded searches through the regular query). After a bit of searching through our friends over at Google, I finally stumbled across the solution (which I of course should have seen on the Solr wiki): use the fq parameter. This allows you to submit a “Filter Query” in your request, which is used to further filter the results of your existing query through another set of queries. This fits very neatly with keeping your original query and then appending a filter query for each facet limitation that gets set.
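To make the split between the regular query and the filter query concrete, here is a rough sketch of the request parameters involved, built as a plain URL query string in Java. The q, qf and fq parameters are standard Solr parameters; the class name, the weights in qf and the use of defType=dismax to select the handler are assumptions for illustration (depending on your setup you might select the handler with qt instead).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class DismaxRequestSketch {
    // Builds the query string for a dismax search with a single facet limitation.
    static String buildQueryString(String userQuery, String facetField, String facetValue) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "dismax"); // assumption: handler selected via defType
        params.put("q", userQuery);      // the user's search string, left untouched
        // assumption: example weights, exact matches boosted over phonetic ones
        params.put("qf", "AggregateSearchField^2.0 AggregatePhoneticSearchField^1.0");
        // the facet limitation goes in fq, not in q
        params.put("fq", facetField + ":\"" + facetValue + "\"");

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> param : params.entrySet()) {
            joined.add(param.getKey() + "=" + URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQueryString("john smith", "Lastname", "Smith"));
    }
}
```

The point is simply that q stays exactly what the user typed, while each selected facet becomes its own fq value.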

A small code example for Solrj. filterQueries is a HashMap<String, String> which contains the facets; filterQueries.put("Lastname", "Smith") will add a limitation on the field "Lastname" being "Smith". The facet values are escaped so that a " in a value does not break the quoted phrase:

if (filterQueries != null)
{
    // needs java.util.Map for Map.Entry
    for (Map.Entry<String, String> facet : filterQueries.entrySet())
    {
        // escape backslashes and double quotes inside the facet value
        String value = facet.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
        query.addFilterQuery(facet.getKey() + ":\"" + value + "\""); // this adds Lastname:"Smith" as a filter query
    }
}

So now we can just parse the query string for valid facet limitations, and set the fields in the filterQueries HashMap accordingly. As we already have a list of facet fields to include, this is as simple as iterating that list and checking for the parameters in the request variables.
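The parsing step could look something like the sketch below. It is not our actual code: the class and method names are hypothetical, and the request variables are represented as a plain Map<String, String> rather than a real servlet request. Only parameters whose names appear in the list of facet fields end up as filters, so arbitrary request parameters cannot sneak in as fielded queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FacetFilterSketch {
    // Picks out the request parameters that match a known facet field.
    static Map<String, String> extractFacetFilters(List<String> facetFields,
                                                   Map<String, String> requestParams) {
        Map<String, String> filters = new LinkedHashMap<>();
        for (String field : facetFields) {
            String value = requestParams.get(field);
            if (value != null && !value.isEmpty()) {
                filters.put(field, value);
            }
        }
        return filters;
    }

    // Turns each facet limitation into a quoted filter query string such as Lastname:"Smith".
    static List<String> toFilterQueries(Map<String, String> filters) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            // escape backslashes and double quotes so the value stays inside the phrase
            String escaped = filter.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
            result.add(filter.getKey() + ":\"" + escaped + "\"");
        }
        return result;
    }
}
```

Each string returned by toFilterQueries can then be handed to query.addFilterQuery(), as in the Solrj loop above.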

A big thank you to Mike Klaas in the dismax and Plone thread indexed by nabble.com, who sent me in the right direction.

Handling Large Datasets at Google

High Scalability has a neat post today highlighting a recent presentation given by Jeff Dean from Google at this year’s Data-Intensive Computing Symposium. The presentation, named “Handling Large Datasets at Google: Current Systems and Future Directions” (video (hosted by Yahoo!)) (slides), touches on quite a number of issues and thoughts about what it takes to run something handling petabytes of data.

I’ll leave you with a quite interesting list shown on slide 8 (of 58) under the title “Typical first year for a new cluster”:

~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packet loss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for dns
~1000 individual machine failures
~thousands of hard drive failures

The Importance of The Double-Click Time

Raymond Chen’s blog “The Old New Thing” is an invaluable source of interesting theories and histories about the inner workings of Windows. If you haven’t read his book “The Old New Thing: Practical Development Throughout the Evolution of Windows” yet, add it to your wishlist now. Although some parts of it can be a bit too much code and internals, the stories and the appendices are simply awesome. Well worth it.

But this post wasn’t supposed to be about that, so I’ll leave you with what I intended to write about instead: the recently posted entry about how several different values are derived from the double-click time setting.

Typesetting on the World Wide Web

The awesome people over at Smashing Magazine have a neat article up today about 5 principles and ideas for setting type on the web. While I do not agree with the usability of some of the examples (in particular, the first and last examples in section 4 are painful to look at), the article is informative and presents quite a few issues and good tips about typography and the web. Keep a bookmark available for the next time you’re sketching up a new site (.. which I’ll have to do with the design around here soon ..).