Making Solr Requests with urllib2 in Python

When making XML requests to Solr (a full-text document search engine) to index, commit, update or delete documents, the request is submitted to the server as an HTTP POST containing an XML document. urllib2 supports submitting POST data through the second parameter of the urlopen() call:

f = urllib2.urlopen("http://example.com/", "key=value")
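As a small sketch of what the second parameter changes, the snippet below builds two Request objects without sending anything over the network. The `urlrequest` alias is only there so the same lines run on Python 3, where urllib2 became urllib.request; that fallback is an assumption about your environment, not part of the original code.

```python
try:
    import urllib2 as urlrequest          # Python 2, as used in this post
except ImportError:
    import urllib.request as urlrequest   # Python 3 equivalent (assumed fallback)

# No data: the request would be sent as a GET.
get_req = urlrequest.Request("http://example.com/")

# With data: the request would be sent as a POST.
# Python 3 requires the body to be bytes, hence the encode().
post_req = urlrequest.Request("http://example.com/", "key=value".encode("utf-8"))

print(get_req.get_method())
print(post_req.get_method())
```

Nothing is transmitted here; the Request objects are only inspected, so this is safe to run without a server.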

My first attempt was to simply pass the XML data as the second parameter, but that made the Solr webapp return a “400 – Bad Request” error. The reason for Solr barfing is that urllib2 sets the Content-Type to application/x-www-form-urlencoded when POST data is supplied and no Content-Type has been set. We can solve this by building a Request object and setting the Content-Type header ourselves:

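import urllib2

# updateURL: URL of Solr's update handler; xmlBody: the XML message to POST, e.g. '<commit/>'
solrReq = urllib2.Request(updateURL, xmlBody)
# Set the Content-Type explicitly so urllib2 does not fall back to the form-encoded default
solrReq.add_header("Content-Type", "text/xml")
solrPoster = urllib2.urlopen(solrReq)
response = solrPoster.read()
solrPoster.close()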

Other XML-based Solr requests, such as adding and removing documents from the index, will also work by changing the Content-Type header.
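To make that concrete, here is a minimal sketch that wraps the header change in a helper and builds requests for three common update messages. The function names, the localhost URL and the document id are hypothetical and only serve the illustration; the import fallback is an assumption so the code also runs on Python 3.

```python
try:
    import urllib2 as urlrequest          # Python 2, as used in this post
except ImportError:
    import urllib.request as urlrequest   # Python 3 equivalent (assumed fallback)

# Hypothetical URL of the update handler; adjust to your own Solr install.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_solr_request(update_url, xml_body):
    """Build (but do not send) a POST request carrying an XML body."""
    req = urlrequest.Request(update_url, xml_body.encode("utf-8"))
    req.add_header("Content-Type", "text/xml")
    return req


def post_xml(update_url, xml_body):
    """Send the XML body to Solr and return the raw response."""
    poster = urlrequest.urlopen(build_solr_request(update_url, xml_body))
    try:
        return poster.read()
    finally:
        poster.close()


# Example update messages; the id "doc1" is made up for this sketch.
add_xml = '<add><doc><field name="id">doc1</field></doc></add>'
delete_xml = "<delete><id>doc1</id></delete>"
commit_xml = "<commit/>"

# Building the requests does not touch the network.
requests = [build_solr_request(UPDATE_URL, xml)
            for xml in (add_xml, delete_xml, commit_xml)]

# The header is stored under the key "Content-type".
content_types = [req.get_header("Content-type") for req in requests]
```

With a running Solr you would call post_xml(UPDATE_URL, add_xml) followed by post_xml(UPDATE_URL, commit_xml). Checking get_header() before sending is a cheap way to confirm that the header you set is the one on the request.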

The same code also lets you use urllib2 to submit SOAP or XML-RPC requests, or to speak any other protocol that requires you to control the complete POST body of the request.
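As a rough illustration of the XML-RPC case, the sketch below serializes a method call with the standard library and wraps it in a request using the same header trick. The endpoint URL, the method name "add" and the helper name are hypothetical; for real work the standard library's XML-RPC client already handles the transport for you, so this only shows that the approach carries over.

```python
try:
    import urllib2 as urlrequest              # Python 2
    import xmlrpclib as xmlrpc_client
except ImportError:
    import urllib.request as urlrequest       # Python 3 (assumed fallback)
    import xmlrpc.client as xmlrpc_client


def build_xmlrpc_request(endpoint, method_name, params):
    """Build (but do not send) a POST request carrying an XML-RPC call."""
    # dumps() turns the parameters into an XML-RPC <methodCall> document.
    body = xmlrpc_client.dumps(tuple(params), methodname=method_name)
    req = urlrequest.Request(endpoint, body.encode("utf-8"))
    req.add_header("Content-Type", "text/xml")
    return req


# Hypothetical endpoint and method, for illustration only.
rpc_req = build_xmlrpc_request("http://example.com/RPC2", "add", (2, 3))
```

Passing rpc_req to urlopen() would send it; the response body could then be decoded with the same module's loads() function.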

3 thoughts on “Making Solr Requests with urllib2 in Python”

  1. Be aware that urllib2 may automatically add another Content-Type header which will override the one you set explicitly. This wildly awful behaviour doesn’t seem to trouble the python community at all. Can has explicit? Noes? FAIL.

  2. If you put &wt=python on the end of the URL then Solr will return a Python object instead of XML.
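A note on the wt=python tip: what comes back over HTTP is text written in Python literal syntax, so it still has to be evaluated. The sketch below uses ast.literal_eval rather than eval, since it only accepts literals. The sample string is made up to resemble a small query response and is not output captured from a real server.

```python
import ast

# Illustrative response text only; a real one would come from a query URL
# ending in &wt=python.
sample_response = (
    "{'responseHeader':{'status':0,'QTime':2},"
    "'response':{'numFound':2,'start':0,'docs':["
    "{'id':'doc1'},{'id':'doc2'}]}}"
)

# literal_eval parses dicts, lists, strings, numbers, True/False/None
# without executing arbitrary code.
result = ast.literal_eval(sample_response)

doc_ids = [doc["id"] for doc in result["response"]["docs"]]
print(result["response"]["numFound"])
print(doc_ids)
```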
