One of the finals steps in my current Solr adventure was to make it possible to remove a large number of documents form the index at the same time. As we’re currently using Solr to store phone information, we may have to remove several thousand records in one large update. The examples on the Solr Wiki shows how to remove one single document by posting a simple XML-document, or remove something by query. I would rather avoid beating our solr server with 300k of single delete requests, so I tried the obvious tactics with submitting several id’s in one document, making several <delete>-elements in one document etc, but nothing worked as I wanted it to.
After a bit of searching and stumbling around with Google, I finally found this very useful tip from Erik Hatcher. The clue is to simply rewrite the delete request as a delete by query, and then submit all the id’s to be removed as a simple OR query. On our development machine, Solr removed 1000 documents in somewhere around 900ms. Needless to say, that’s more than fast enough and solved our problem.
To sum it up; write a delete-by-query-statement as:
id:(123123 OR 13371337 OR 42424242 .. )
Thanks intarwebs!