Sorting Strings as Numeric Values in Python3

March 31st, 2013

A small hack for sorting string values numerically in Python is to pass the int function as the key when calling sorted (here, applied to a dictionary to get it sorted by its keys):

    for position in sorted(positions.keys(), key=int):

This will call int() for each value in the list to be sorted, and use the numeric value instead of the ASCII value (if the keys / elements are strings instead of numbers).
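To make the difference concrete, here is a small sketch comparing the default string ordering with the key=int ordering. The positions dictionary and its contents are made up for illustration.

```python
# Hypothetical data: keys are numbers stored as strings.
positions = {"10": "c", "9": "b", "1": "a", "100": "d"}

# Default sort compares the strings character by character.
as_strings = sorted(positions.keys())

# key=int converts each key for comparison only; the keys stay strings.
as_numbers = sorted(positions.keys(), key=int)

print(as_strings)  # ['1', '10', '100', '9']
print(as_numbers)  # ['1', '9', '10', '100']
```

Note that int() raises ValueError for keys that are not valid integers, so this only works when every key is numeric.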

SQLAlchemy, MySQL and UTF-8

January 25th, 2013

While SQLAlchemy uses UTF-8 by default, the charset used when communicating with MySQL will affect the encoding of the returned data. To be sure that everything is handled properly as UTF-8 (the equivalent of running SET NAMES 'utf8' in the console, which you shouldn’t do here), add ?charset=utf8 to your connection URL:

    mysql://user:password@localhost/database?charset=utf8

Thanks to RustyFluff at StackOverflow.
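If the connection URL comes from a config file, the parameter can also be added programmatically. The helper below is only a sketch using the standard library; the name with_charset is made up here and is not part of SQLAlchemy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_charset(url, charset="utf8"):
    """Return url with its charset query parameter set to the given value."""
    parts = urlsplit(url)
    # Keep any existing query parameters, overriding only charset.
    query = dict(parse_qsl(parts.query))
    query["charset"] = charset
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_charset("mysql://user:password@localhost/database"))
# mysql://user:password@localhost/database?charset=utf8
```

The resulting string is what you would hand to SQLAlchemy when creating the engine.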

Debugging Python’s Memory Usage with Dowser

January 24th, 2013

As I mentioned in my previous post, I had to hunt down a leak (which was intentional considering the functionality) somewhere in a batch import task in my Pyramid app. I’ve never played around with any memory profilers in Python before, so this was a proper opportunity to see what the different options were. StackOverflow to the rescue as usual, with a handful of suggestions for Python memory profilers.

After trying a few, I ended up with Dowser. Dowser fit my use case neatly: my application was a long-running, console-based process (Dowser uses cherrypy to launch its own HTTP server, so it was a good thing that it didn’t conflict with any existing server), and I could pause it at a proper location before it consumed too much memory (a time.sleep(largevaluehere) worked nicely, thank you).

Installing Dowser was relatively pain free (a few of the other options I tried either needed custom patches, or required the process to run all the way through before giving me the information I needed).

I needed to get a few dependencies installed:

    pip install pil

.. which Dowser uses to generate sparkline diagrams, and cherrypy itself:

    easy_install cherrypy

.. and last, checking out the latest version of Dowser from SVN:

    svn co http://svn.aminus.net/misc/dowser dowser

I modified the example from the Stack Overflow question above a bit, and ended up with a small helper function in the application’s helper library:

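    def launch_memory_usage_server(port=8080):
        # Imported lazily so the dependencies are only needed while debugging.
        import cherrypy
        import dowser

        # Mount Dowser's web UI at the root of the CherryPy tree.
        cherrypy.tree.mount(dowser.Root())
        cherrypy.config.update({
            'environment': 'embedded',
            'server.socket_port': port,
        })

        # Start the HTTP server; this call returns while the server keeps running.
        cherrypy.engine.start()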

Calling launch_memory_usage_server() somewhere early in my code then launched the HTTP interface (http://localhost:8080/), where I could watch memory usage while the import process was running. This helped me narrow down where the issue showed up (we were leaking MySQLdb cursors at an alarming rate), and digging deeper into the structure hinted at the underlying cause (the debug toolbar was active for a console script).
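For a quick look at which object types are piling up without setting up CherryPy at all, the standard library can produce a rough per-type count. This is only a sketch of the general idea, not Dowser’s code, and the Cursor class is a hypothetical stand-in for the leaked MySQLdb cursors.

```python
import gc
from collections import Counter


def count_live_objects():
    """Count the objects currently tracked by the garbage collector, by type name."""
    counts = Counter()
    for obj in gc.get_objects():
        counts[type(obj).__name__] += 1
    return counts


class Cursor:
    """Hypothetical stand-in for a database cursor that is never released."""


# Simulate a leak by keeping references to a lot of cursors.
leaked = [Cursor() for _ in range(1000)]

for name, count in count_live_objects().most_common(5):
    print(name, count)
```

Calling count_live_objects() at two points in a long-running job and comparing the results gives a crude picture of what is growing.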

Leaking Memory / Cursors with SQLAlchemy and Pyramid

January 24th, 2013

After spending the better part of the day trying to find out why the fsck my console script for importing a dataset through SQLAlchemy needed just above 7 GB of memory before barfing out and swapping like a madman, I finally found the solution.

Make sure that Pyramid’s debug toolbar is disabled. It’ll keep a reference around to all queries run through SQLAlchemy (for .. well, debugging purposes, obviously). This causes an issue if you’re running a very large number of queries, and you’re not going to use the debug toolbar from the console anyway, so .. get rid of it.

I created a second version of my development.ini, a development_console.ini that doesn’t load the debug toolbar, and finally stuff Just Worked ™ again.
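Creating the console variant can be scripted. The sketch below assumes the toolbar is enabled through a pyramid.includes entry in the [app:main] section, as in the default scaffold; the sample settings and the project name myproject are made up, so adjust them to match your own development.ini.

```python
import configparser
import io

# Hypothetical excerpt of a development.ini, for illustration only.
SAMPLE_INI = """\
[app:main]
use = egg:myproject
pyramid.includes =
    pyramid_debugtoolbar
    pyramid_tm
"""


def without_debugtoolbar(ini_text, section="app:main"):
    """Return the ini text with pyramid_debugtoolbar removed from pyramid.includes."""
    # interpolation=None leaves values such as %(here)s untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of option names
    parser.read_string(ini_text)

    includes = parser.get(section, "pyramid.includes", fallback="")
    kept = [name for name in includes.split() if name != "pyramid_debugtoolbar"]
    parser.set(section, "pyramid.includes", "\n" + "\n".join(kept))

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(without_debugtoolbar(SAMPLE_INI))
```

The output can be saved as development_console.ini and passed to the console script instead of the regular file. Comments in the original file are not preserved by configparser.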

Console / shell script runs twice when using pyramid.paster and bootstrapping

January 8th, 2013

You’ve just created your new, exceptional shell script to maintain your pyramid application from the console, when you discover that everything runs twice. Usually not a very good idea, but .. why?

The issue is probably that you’re running Pyramid with the scan option, which imports every module in your application’s package. This will also import your console script, and if you haven’t placed everything into a function and added a check to see if the script has been invoked directly, you’re fscked!

The easy way out is to put your code into a function:

    def main():
        from pyramid.paster import bootstrap
        env = bootstrap('../../development.ini')

        # [snip]

    if __name__ == '__main__':
        main()

The final if-test checks whether this file is the one that was invoked directly, and if so, calls main() and launches your script. This should hopefully solve the issue!
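The effect of the guard can be shown without Pyramid. This sketch writes a tiny script to a temporary file and imports it twice, once with the guard and once without; the module name and the RUNS list are made up for the demonstration.

```python
import importlib.util
import os
import tempfile

# A miniature console script; RUNS records whether main() was executed.
SCRIPT = '''
RUNS = []

def main():
    RUNS.append("main ran")

if __name__ == "__main__":
    main()
'''


def import_from_source(source, name="console_script_demo"):
    """Write source to a temporary file and import it as a module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name + ".py")
        with open(path, "w") as handle:
            handle.write(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module


guarded = import_from_source(SCRIPT)
# Removing the guard makes main() run on every import, as a scan would trigger.
unguarded = import_from_source(SCRIPT.replace('if __name__ == "__main__":', "if True:"))

print(guarded.RUNS)    # []
print(unguarded.RUNS)  # ['main ran']
```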

Pyramid: pkg_resources.DistributionNotFound: <projectname>

December 10th, 2012

When trying to start the built-in WSGI server in Pyramid after starting a new project, pserve refused to do anything useful with my project. Turns out I had forgotten to run setup.py in my project’s virtual environment to set up all the dependencies:

From my project’s folder:

    ../bin/python setup.py develop

.. of course, you’ll remember this if you read the README.txt that the Pyramid setup creates for you in your project directory.

Python, httplib and Empty Content for 200/201 Responses

October 27th, 2011

While hacking together a client for Imbo in Python, I wasn’t able to read the response from a connection initiated with httplib. If the request errored out (HTTP response code 400/403/404) everything worked as it should, but if the response code was 200 / 201, the response read from the httplib connection was empty (read by using getresponse()).

Turns out the issue was related to calling close on the connection before reading the response. This apparently works if there’s an error (which means that the response should be rather small), but not if there’s a regular “OK” response from the server (it’s not enough just retrieving the HTTPResponse object, you have to call read() on it before closing the connection).

    connection.request(method, path, data)
    data = connection.getresponse().read()
    connection.close()

(Compared to the previous solution, which retrieved the HTTPResponse object, closed the connection and then read the response.)
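The same ordering applies in Python 3, where httplib is named http.client. The sketch below wraps read-before-close in a helper and tries it against a throwaway local server; the handler, the fetch function and the "hello" payload are all made up for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny test server that answers every GET with 200 and a short body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(host, port, method="GET", path="/"):
    """Send a request and read the whole body before closing the connection."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request(method, path)
        response = connection.getresponse()
        return response.status, response.read()
    finally:
        connection.close()


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status, body = fetch("127.0.0.1", server.server_port)
print(status, body)

server.shutdown()
server.server_close()
```

Putting close() in a finally block keeps the order right even when reading the response raises.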

Parse a DSN string in Python

January 31st, 2011

A simple hack to get the different parts of a DSN string (which are used in PDO in PHP):

    import re

    def parse_dsn(dsn):
        # Split "driver:options" into the driver name and the option string.
        m = re.search(r"([a-zA-Z0-9]+):(.*)", dsn)
        values = {}

        if m and m.group(1) and m.group(2):
            values['driver'] = m.group(1)
            m_options = re.findall(r"([a-zA-Z0-9]+)=([a-zA-Z0-9]+)", m.group(2))

            for key, value in m_options:
                values[key] = value

        return values

The returned dictionary contains one entry for each of the entries in the DSN.

Update: helge also submitted a simplified version of the above:

    driver, rest = dsn.split(':', 1)
    values = dict(re.findall(r'(\w+)=(\w+)', rest), driver=driver)
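As a quick check of what the simplified version returns, here it is wrapped in a function and run on a sample DSN. The function name and the DSN values are made up for illustration.

```python
import re


def parse_dsn_short(dsn):
    """Parse a PDO-style DSN such as 'mysql:host=localhost;dbname=testdb'."""
    driver, rest = dsn.split(':', 1)
    return dict(re.findall(r'(\w+)=(\w+)', rest), driver=driver)


parsed = parse_dsn_short("mysql:host=localhost;dbname=testdb")
print(parsed)
# {'host': 'localhost', 'dbname': 'testdb', 'driver': 'mysql'}
```

One caveat with both versions: the value pattern only matches word characters, so a value like 127.0.0.1 is cut off at the first dot. Widening the second group to something like [^;]+ would be one way around that.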

Fixing dpkg / apt-get Problem With Python2.6

February 7th, 2010

While trying to upgrade to Python 2.6 on one of my development machines tonight I was faced with an error message after running apt-get install python2.6:

After unpacking 0B of additional disk space will be used.
Setting up python2.6-minimal (2.6.4-4) ...
Linking and byte-compiling packages for runtime python2.6...
pycentral: pycentral rtinstall: installed runtime python2.6 not found
pycentral rtinstall: installed runtime python2.6 not found
dpkg: error processing python2.6-minimal (--configure):
 subprocess post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of python2.6:
 python2.6 depends on python2.6-minimal (= 2.6.4-4); however:
  Package python2.6-minimal is not configured yet.
dpkg: error processing python2.6 (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 python2.6-minimal
 python2.6
E: Sub-process /usr/bin/dpkg returned an error code (1)

Attempting to install python2.6-minimal wouldn’t work, and attempting to install python2.6 had the same problem.

Luckily the Launchpad thread for python-central provided the answer: Upgrade python-central first!

:~# apt-get install python-central
[snip]
Setting up python2.6 (2.6.4-4) ...
Setting up python-central (0.6.14+nmu2) ...
:~#

Making Solr Requests with urllib2 in Python

May 30th, 2009

When making XML requests to Solr (a full-text document search engine) for indexing, committing, updating or deleting documents, the request is submitted as an HTTP POST containing an XML document to the server. urllib2 supports submitting POST data by using the second parameter to the urlopen() call:

    f = urllib2.urlopen("http://example.com/", "key=value")

The first attempt involved simply adding the XML data as the second parameter, but that made the Solr Webapp return a “400 – Bad Request” error. The reason for Solr barfing is that the urlopen() function sets the Content-Type to application/x-www-form-urlencoded. We can solve this by changing the Content-Type header:

    import urllib2

    # updateURL is the URL of your Solr update handler.
    solrReq = urllib2.Request(updateURL, '<commit waitFlush="false" waitSearcher="false"/>')
    solrReq.add_header("Content-Type", "text/xml")
    solrPoster = urllib2.urlopen(solrReq)
    response = solrPoster.read()
    solrPoster.close()

Other XML-based Solr requests, such as adding and removing documents from the index, will also work by changing the Content-Type header.

The same code will also allow you to use urllib2 to submit SOAP or XML-RPC requests, or to speak any other protocol that requires you to control the complete POST body of the request.
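On Python 3, urllib2 has been folded into urllib.request, and the same approach looks roughly like the sketch below. The helper name build_xml_post and the Solr URL are assumptions for illustration; the block only builds the request and does not send it.

```python
import urllib.request


def build_xml_post(url, xml_body):
    """Build a POST request that carries an XML document as its body."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),  # the body must be bytes on Python 3
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical Solr update URL; replace it with your own.
request = build_xml_post(
    "http://localhost:8983/solr/update",
    '<commit waitFlush="false" waitSearcher="false"/>',
)

print(request.get_method())
print(request.header_items())
```

Passing the request to urllib.request.urlopen() and calling read() on the result then mirrors the urllib2 code above.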