We’ve extended our previous single Solr node to a small cluster of nodes. This lets us run queries against one node while updating or configuring another, distribute the load across several servers (although load-wise we’re not there yet) and survive out-of-memory conditions or other critical errors on a single node.
While Solr supports querying several cores and distributing queries internally, we decided to move load balancing and handling of failed nodes higher up in the stack. We now do simple load balancing and failover with mod_jk in our existing Apache-based environment; mod_jk also handles failed servers without any administrator interaction. We were already using mod_jk for our main web frontend, and since we use Tomcat as the application container for Solr, things should be a breeze!
Well, no. After copying our existing mod_jk setup, configuring the new workers and restarting Apache, all I got was the well-known 500 INTERNAL SERVER ERROR. Here’s the worker configuration file:
worker.list=loadbalancer,status
worker.solr1.port=8009
worker.solr1.host=10.0.0.4
worker.solr1.type=ajp13
worker.solr1.lbfactor=1
worker.solr1.cachesize=10
worker.solr2.port=8009
worker.solr2.host=10.0.0.5
worker.solr2.type=ajp13
worker.solr2.lbfactor=4
worker.solr2.cachesize=10
worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=solr1,solr2
worker.loadbalancer.sticky_session=0
worker.status.type=status
This gives us two Solr workers and one status worker (the status worker provides a simple web interface for enabling, disabling and inspecting the other workers), with a 1:4 load-balancing ratio (the second server has quite a bit more memory available for Solr).
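For this to work, each Tomcat instance also has to accept AJP connections on the port listed for its worker. A minimal sketch of the relevant connector in Tomcat’s conf/server.xml, assuming an otherwise stock install (not copied from our actual setup):
<!-- AJP connector; port 8009 matches worker.solrN.port in workers.properties -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />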
I pointed mod_jk at the workers configuration through the JkWorkersFile directive (in a VirtualHost block… don’t do that):
JkWorkersFile conf/workers.properties
I also enabled debug logging to try to pin down the problem (still in the VirtualHost block):
JkLogFile logs/mod_jk.log
JkLogLevel debug
JkLogStampFormat "[%a %b %d %H:%M:%S %Y]"
Other mod_jk settings (in the VirtualHost block) were:
JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories
JkRequestLogFormat "%w %V %T"
JkShmFile logs/jk.shm
JkMount /* loadbalancer
<Location /jkstatus>
JkMount status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
Still no solution. Peeking at the log files mod_jk provided, I was able to deduce the following:
[debug] map_uri_to_worker::jk_uri_worker_map.c (525): Attempting to map context URI '/jkstatus'
[debug] map_uri_to_worker::jk_uri_worker_map.c (550): Found an exact match status -> /jkstatus
[debug] jk_handler::mod_jk.c (1920): Into handler jakarta-servlet worker=status r->proxyreq=0
[debug] wc_get_worker_for_name::jk_worker.c (111): did not find a worker status
[info] jk_handler::mod_jk.c (2071): Could not find a worker for worker name=status
This indicates that mod_jk was unable to find a worker matching the name I had provided in the JkMount statement above: status. Weird. I added some garbage characters to the JkWorkersFile setting, and Apache complained that it was unable to find the workers file. I changed it back, reloaded, and still nothing. It was apparently unable to find the worker. The URI mapping did work, however, since the handler was invoked with worker=status before failing to find it.
Looking back at the startup sequence of mod_jk, I found the following in the log:
[debug] build_worker_map::jk_worker.c (236): creating worker ajp13
[debug] wc_create_worker::jk_worker.c (141): about to create instance ajp13 of ajp13
[debug] wc_create_worker::jk_worker.c (154): about to validate and init ajp13
[debug] ajp_validate::jk_ajp_common.c (1922): worker ajp13 contact is 'localhost:8009'
[debug] ajp_init::jk_ajp_common.c (2047): setting endpoint options:
[debug] ajp_init::jk_ajp_common.c (2050): keepalive: 0
[debug] ajp_init::jk_ajp_common.c (2054): timeout: -1
[debug] ajp_init::jk_ajp_common.c (2058): buffer size: 0
[debug] ajp_init::jk_ajp_common.c (2062): pool timeout: 0
[debug] ajp_init::jk_ajp_common.c (2066): connect timeout: 0
[debug] ajp_init::jk_ajp_common.c (2070): reply timeout: 0
[debug] ajp_init::jk_ajp_common.c (2074): prepost timeout: 0
[debug] ajp_init::jk_ajp_common.c (2078): recovery options: 0
[debug] ajp_init::jk_ajp_common.c (2082): retries: 2
[debug] ajp_init::jk_ajp_common.c (2086): max packet size: 8192
[debug] ajp_create_endpoint_cache::jk_ajp_common.c (1959): setting connection pool size to 1 with min 0
It took a bit of time, but I eventually realized that this shows mod_jk creating _a default_ worker named ajp13. Apparently it was not reading my workers file at all, yet it still complained if I changed the file name. You’d think that a setting which complains when the file is missing would also load the file when it exists. But… well. After an hour of trying to figure out why the workers didn’t load, stripping the workers file down to a minimal example and testing with just a single status worker, I concluded that my workers file was correct, and that mod_jk evidently had no trouble finding it.
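For reference, the minimal workers file I tested with was along these lines (just the status worker, nothing else):
worker.list=status
worker.status.type=status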
Then I suddenly discovered the small notice in the mod_jk configuration manual:
JkWorkersFile: This directive is only allowed once. It must be put into the global part of the configuration.
JkWorkersFile cannot be defined in a <VirtualHost> section. It will NOT complain if you do it; it will simply never define any workers. It will, however, complain if the file doesn’t exist, even though it never actually tries to load it.
Confusing.
Moving the JkWorkersFile statement out of the <VirtualHost> block and placing it next to the LoadModule statement instead solved the issue. The same applies to JkWorkerProperty.
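A sketch of the layout that ended up working (the LoadModule path and the VirtualHost details are illustrative, not copied verbatim from our configuration):
# Global part of httpd.conf – JkWorkersFile must live here, not in a <VirtualHost>
LoadModule jk_module modules/mod_jk.so
JkWorkersFile conf/workers.properties

<VirtualHost *:80>
    # Mounts (and the other per-host Jk* settings shown earlier) stay here
    JkMount /* loadbalancer
    <Location /jkstatus>
        JkMount status
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Location>
</VirtualHost>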