How To Make Solr Go 45% Faster

If you’re still looking for a good reason to spend a few minutes tuning your SOLR caches (documentCache, filterCache and queryResultCache), I’ll give you two numbers:

avgTimePerRequest : 126.148822
avgTimePerRequest : 70.026436 

The first is with the default cache settings, the latter is with a very small change. Yep. That’s a 45% speed increase. So, the interesting question is what Iactually changed in the cache configuration – although I should warn you, the answer is very, very, very complicated:

The cache size. The default size (at least for our current 1.3 installation) is to keep 512 elements in the cache. When someone on the solr-user list asked for an introduction to what the different cache statistics meant, I remembered that I hadn’t actually tweaked the settings at all. The SOLR server has been running for a year now, so we now have a quite good idea of how it will perform and what kind of queries we are seeing. The stats indicated that a lot more cached entries got evicted than what I were hoping to see, and this gave us a lower cache hit rate (about 50%).

The simple change was to increase the size of the cache (from 512 to 16384), so that we’re able to keep more documents in memory before evicting them. After running 24 hours with the new setup we’re now seeing cache hits as 99%, 68% and 67%. The relevant sections of the solrconfig.xml file are:

The document cache fills about 4 times as fast as the filter cache, so we might have to tweak the settings further by suiting it even better to our load pattern.

So what now?

The next step would be to try to change to the FastLRUCache which is included with Solr 1.4 (currently in SVN and nightlies). If my memory serves me right the changes are mostly related to locking, so I’m not sure if we’ll see any significant better performance.

We’ll also make further adjustments to the size of each of the caches to better match our usage.

Solr Becoming Slow After a While

This is perhaps the most obvious and “not very helpful” post for quite a few people, but for those who experience this issue, it’ll save the day. While doing a test index routine of around 6 million documents, things would get really slow at the moment I passed 1 million documents in the index. Weird. Optimizing didn’t help, as it died with an exception after a while.

The reason?

Not enough free disk space. Solr was indexing to a different partition than I thought.

Solved everything.

Shell Script For Submitting Documents to Solr

Here’s a small shell script I’m using to submit pre-made XML documents to Solr. The documents are usually produce by some other program, before being submitted to the Solr server. This way we submit all the files in an active directory to the server (here all the files in the documents directory (relative to the location of the script) will be submitted) .

You’ll have to update the URL and the directory (documents) below. We usually group together 1.000 documents in a single file, so the commit happens for every thousand documents. If you use autocommit in Solr, you can remove that line. This script requires CURL to talk to the Solr server.

cd documents || exit

for i in $( ls ); do
    cat $i | curl -X POST -H 'Content-Type: text/xml' -d @- $URL
    curl $URL -H "Content-Type: text/xml" --data-binary ''
    echo item: $i