When making XML requests to Solr (a full-text document search engine) for indexing, committing, updating or deleting documents, the request is submitted to the server as an HTTP POST containing an XML document. urllib2 supports submitting POST data through the second parameter of the urlopen() call:
f = urllib2.urlopen("http://example.com/", "key=value")
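As a quick check of that behaviour, the sketch below builds two requests without sending them and looks at the HTTP method each would use. The try/except import is an assumption added here so the same lines also run on Python 3, where urllib2 became urllib.request; the variable names are made up for the illustration.

```python
try:
    import urllib2 as urlrequest          # Python 2, as used in this post
except ImportError:
    import urllib.request as urlrequest   # Python 3 home of the same classes

# No body supplied: the request would go out as a GET.
plain = urlrequest.Request("http://example.com/")

# Body supplied as the second argument (bytes on Python 3): it becomes a POST.
with_body = urlrequest.Request("http://example.com/", b"key=value")

print(plain.get_method())
print(with_body.get_method())
```

Nothing is sent over the network here; the Request objects are only constructed and inspected.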
The first attempt involved simply adding the XML data as the second parameter, but that made the Solr webapp return a “400 – Bad Request” error. The reason for Solr barfing is that the urlopen() function sets the Content-Type to application/x-www-form-urlencoded. We can solve this by changing the Content-Type header:
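import urllib2

# The second argument is the POST body, i.e. the XML request document.
solrReq = urllib2.Request(updateURL, ' ')
# Declare the body as XML instead of the form-encoded default.
solrReq.add_header("Content-Type", "text/xml")
solrPoster = urllib2.urlopen(solrReq)
response = solrPoster.read()
solrPoster.close()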
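One way to package the steps above is a pair of small helpers, sketched here under a few assumptions: the names build_xml_request and post_xml are invented for this example, the update URL is a hypothetical local Solr address, and the import fallback is there only so the code also runs on Python 3. The request is built and inspected without contacting a server.

```python
try:
    import urllib2 as urlrequest          # Python 2
except ImportError:
    import urllib.request as urlrequest   # Python 3


def build_xml_request(url, xml_body):
    """Return a POST request carrying xml_body with a text/xml Content-Type."""
    if not isinstance(xml_body, bytes):
        xml_body = xml_body.encode("utf-8")  # Python 3 requires a bytes body
    req = urlrequest.Request(url, xml_body)
    req.add_header("Content-Type", "text/xml")
    return req


def post_xml(url, xml_body):
    """Send the request and return the raw response body."""
    response = urlrequest.urlopen(build_xml_request(url, xml_body))
    try:
        return response.read()
    finally:
        response.close()  # release the connection even if read() fails


# Hypothetical update URL and payload; nothing is sent at this point.
req = build_xml_request("http://localhost:8983/solr/update", "<commit/>")
print(req.get_method())
print(req.get_header("Content-type"))
```

Note that the Request object stores header names in capitalized form, which is why the lookup uses "Content-type" even though the header was added as "Content-Type".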
Other XML-based Solr requests, such as adding and removing documents from the index, will also work by changing the Content-Type header.
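For those other requests, the XML body can be assembled with the standard library rather than by string concatenation, so that special characters are escaped correctly. This is a minimal sketch: the helper names and the field names "id" and "title" are assumptions for illustration and depend on your own Solr schema.

```python
import xml.etree.ElementTree as ET


def add_document_xml(fields):
    """Build <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add)


def delete_by_id_xml(doc_id):
    """Build <delete><id>...</id></delete>."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete)


# Hypothetical document; ElementTree escapes the ampersand for us.
add_payload = add_document_xml({"id": "doc-1", "title": "Fish & chips"})
delete_payload = delete_by_id_xml("doc-1")
print(add_payload)
print(delete_payload)
```

The resulting payloads can be passed as the POST body of a request with the text/xml Content-Type header, in the same way as the commit request.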
The same code also lets you use urllib2 to submit SOAP and XML-RPC requests, or to speak any other protocol that requires you to supply the complete POST body of the request yourself.
Be aware that urllib2 may automatically add another Content-Type header which will override the one you set explicitly. This wildly awful behaviour doesn’t seem to trouble the python community at all. Can has explicit? Noes? FAIL.
If you put &wt=python on the end of the URL, Solr will return the response as a Python object literal instead of XML.
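A response in that format can be turned into a dictionary with ast.literal_eval, which only accepts literals and is therefore safer than eval. The sketch below uses a hand-written stand-in for the response body, not real Solr output, so the fields and values are assumptions.

```python
import ast

# Hand-written stand-in for a wt=python response body; real Solr output
# will contain different fields and values.
sample_body = (
    "{'responseHeader': {'status': 0, 'QTime': 3}, "
    "'response': {'numFound': 1, 'start': 0, 'docs': [{'id': 'doc-1'}]}}"
)

# literal_eval parses dicts, lists, strings, numbers, True/False/None only.
result = ast.literal_eval(sample_body)
print(result["response"]["numFound"])
```

On Python 3 the body read from the HTTP response is bytes, so it would need to be decoded to a string before being parsed this way.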
Walking in the presence of giants here. Cool thinking all around!