SQLAlchemy, MySQL and UTF-8

SQLAlchemy uses UTF-8 by default, but the charset used for the connection to MySQL still affects the encoding of the returned data. To be sure that everything is handled properly as UTF-8 (in the console you might run SET NAMES 'utf8' for this, but don’t do that here), add ?charset=utf8 to your connection URL:

mysql://user:password@localhost/database?charset=utf8
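If the connection URL comes from a config file you'd rather not edit by hand, a small helper can add the parameter for you. This is only a sketch using the standard library; the name with_charset is made up for this example, and the commented-out create_engine call assumes SQLAlchemy and a MySQL driver are installed.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_charset(url, charset="utf8"):
    """Return url with its charset query parameter set to charset.

    Hypothetical helper: any charset already present is overwritten,
    all other query parameters are kept.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["charset"] = charset
    return urlunsplit(parts._replace(query=urlencode(query)))


url = with_charset("mysql://user:password@localhost/database")
print(url)

# With SQLAlchemy and a MySQL driver installed, the URL is used as usual:
# from sqlalchemy import create_engine
# engine = create_engine(url)
```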

Thanks to RustyFluff at StackOverflow.

Debugging Python’s Memory Usage with Dowser

As I mentioned in my previous post, I had to hunt down a leak (which was intentional considering the functionality) somewhere in a batch import task in my Pyramid app. I’ve never played around with any memory profilers in Python before, so this was a proper opportunity to see what the different options were. Stack Overflow to the rescue as usual, with a handful of suggestions for Python memory profilers.

After trying a few, I ended up with Dowser. Dowser fit my use case neatly: my application was a long-running, console-based process (Dowser uses cherrypy to launch its own HTTP server, so it was a good thing that there was no existing server for it to conflict with), and I could pause it at a proper location before it consumed too much memory (a time.sleep(largevaluehere) worked nicely, thank you).

Installing Dowser was relatively pain free (a few of the other options I tried either needed custom patches, or required the process to run all the way through before giving me the information I needed).

I needed to get a few dependencies installed:

pip install pil

.. which Dowser uses to generate sparkline diagrams, and cherrypy itself:

easy_install cherrypy

.. and last, checking out the latest version of Dowser from SVN:

svn co http://svn.aminus.net/misc/dowser dowser

I modified the example from the Stack Overflow question above a bit, and ended up with a small helper function in the application’s helper library:

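def launch_memory_usage_server(port=8080):
    # Imported here so the dependencies are only needed when debugging.
    import cherrypy
    import dowser

    # Serve Dowser's web interface from the root of the HTTP server.
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        # 'embedded' keeps cherrypy quiet inside a host application
        # (no autoreloader, no logging to the screen).
        'environment': 'embedded',
        'server.socket_port': port
    })

    # Starts the server in background threads and returns immediately.
    cherrypy.engine.start()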

Calling launch_memory_usage_server() somewhere early in my code then launched the HTTP interface (http://localhost:8080/), where I could watch memory usage while the import process was running. This helped me narrow down where the issue showed up (we were leaking MySQLdb cursors at an alarming rate), and digging deeper into the structure hinted at the underlying cause (the debug toolbar was active for a console script).
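If you can't get Dowser installed, the basic idea of counting live objects per type can be roughly approximated with the standard library. To be clear, this is not Dowser and not how I found the leak; it's a minimal stand-in built on gc.get_objects(), and the function names are made up for this example.

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Count the objects currently tracked by the garbage collector, by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=5):
    """Return the types whose counts grew the most between two snapshots."""
    # Counter subtraction only keeps positive differences.
    return (after - before).most_common(limit)


before = count_objects_by_type()
# ... run a chunk of the import here ...
after = count_objects_by_type()
print(growth(before, after))
```

A type that keeps showing up at the top of the list between snapshots (cursors, in my case) is a good place to start digging.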

Leaking Memory / Cursors with SQLAlchemy and Pyramid

After spending the better part of the day trying to find out why the fsck my console script for importing a dataset through SQLAlchemy needed just above 7 GB of memory before barfing out and swapping like a madman, I finally found the solution.

Make sure that Pyramid’s debug toolbar is disabled. It’ll keep a reference around to every query run through SQLAlchemy (for .. well, debugging purposes, obviously). This causes an issue if you’re running a very large number of queries, and you’re not going to use the debug toolbar from the console anyway, so .. get rid of it.

I created a second version of my development.ini, a development_console.ini that doesn’t load the debug toolbar, and finally stuff Just Worked ™ again.
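I made the copy by hand, but it can be scripted. The sketch below assumes the toolbar is enabled the way the standard scaffold does it, with pyramid_debugtoolbar listed under pyramid.includes in the [app:main] section; check your own ini file first. The function name is made up, and note that configparser drops comments when it writes the file back out.

```python
import configparser
import io


def strip_debug_toolbar(ini_text, section="app:main"):
    """Return ini_text with pyramid_debugtoolbar removed from pyramid.includes.

    Assumes the given section exists (the scaffold default is [app:main]).
    """
    # interpolation=None leaves values such as %(here)s untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)

    includes = parser.get(section, "pyramid.includes", fallback="")
    kept = [name for name in includes.split() if name != "pyramid_debugtoolbar"]
    parser.set(section, "pyramid.includes", "\n".join(kept))

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Usage (file names as in this post):
# with open("development.ini") as source:
#     console_ini = strip_debug_toolbar(source.read())
# with open("development_console.ini", "w") as target:
#     target.write(console_ini)
```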

Console / shell script runs twice when using pyramid.paster and bootstrapping

You’ve just created your new, exceptional shell script to maintain your Pyramid application from the console, when you discover that everything runs twice. Usually not a very good idea, but .. why?

The issue is probably that you’re running Pyramid with the scan option (config.scan()), which imports every module in your application’s package. This will also import your console script, and if you haven’t placed everything into a function and added a check to see if the script has been invoked directly, you’re fscked!

The easy way out is to put your code into a function:

def main():
    from pyramid.paster import bootstrap
    env = bootstrap('../../development.ini')

[snip]

if __name__ == '__main__':  
    main()

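The last if-test checks whether this is the file that was invoked directly, and if so, calls main and launches your script. When the file is only imported (by the scan, for example), __name__ is the module's name instead of '__main__', so nothing runs. This should hopefully solve the issue!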
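To see the difference for yourself without involving Pyramid, here's a small sketch that writes a throwaway module to a temporary directory and imports it, roughly the way a package scan would. The module name console_script and the helper import_like_scan are made up for this example.

```python
import importlib.util
import os
import tempfile

SCRIPT = '''
events = []
events.append("top-level code ran")

def main():
    events.append("main() ran")

if __name__ == "__main__":
    main()
'''


def import_like_scan(source):
    """Import source as a module named console_script and return its events list."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "console_script.py")
        with open(path, "w") as handle:
            handle.write(source)
        spec = importlib.util.spec_from_file_location("console_script", path)
        module = importlib.util.module_from_spec(spec)
        # Runs the module's top-level code with __name__ == "console_script".
        spec.loader.exec_module(module)
    return module.events


print(import_like_scan(SCRIPT))
```

Only the unguarded top-level line runs on import; main() stays untouched until the file is invoked directly.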