A question that came up on #gearman on freenode today was how to make sure that a task is only performed by one worker at a time. (Remember from our previous introduction to Gearman that a worker is the actual piece of code performing a task that has been submitted to gearmand.)
I had a few naive suggestions:
Use memcache as a lock: when the task arrives, add a key with a low timeout value; if the add fails, simply return, as someone else is probably already doing the task.
Register a separate function for each unique identifier a task can have, and only register one worker for each function (I like the memcache solution way better, but it'd work, at least for a while).
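To make the memcache idea a bit more concrete, here is a minimal sketch in Python. It does not talk to a real memcached server: FakeCache is a hypothetical in-memory stand-in that only models the one property the trick relies on, namely that an add fails while the key exists and has not yet expired. The names FakeCache, handle_task and the "lock:" key prefix are assumptions made up for this illustration.

```python
import time


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client.

    It only models add() with an expiry time, which is all the
    locking trick needs.
    """

    def __init__(self, clock=time.monotonic):
        self._expiry = {}    # key -> time at which the key expires
        self._clock = clock  # injectable so expiry can be simulated

    def add(self, key, ttl_seconds):
        """Store the key only if it is absent or expired; report success."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # someone else added the key and it is still live
        self._expiry[key] = now + ttl_seconds
        return True


def handle_task(cache, task_id, do_work, ttl_seconds=30):
    """Run do_work(task_id) unless another worker already claimed the task."""
    if not cache.add("lock:" + task_id, ttl_seconds):
        return None  # add failed: probably being handled elsewhere
    return do_work(task_id)
```

Note that the lock is never released explicitly; it just expires. A task that runs longer than the timeout could therefore still be picked up twice, which is one reason this is only a naive suggestion.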
But neither of these is a good solution to the problem. Luckily, Brian Moon also saw the question and was quick to point out that Gearman actually has a built-in mechanism for handling de-duplication of tasks. I've never used it myself (only read about it a couple of times), so it's a good thing that Brian paid attention :-)
The solution: use gearman_job_unique. In the PHP extension this value (named $unique in the documentation) can be tacked on to the end of most methods that add tasks or perform tasks directly, such as the do* methods. If Gearman sees a unique value that a worker is already active for, it won't submit the task again; the second caller simply gets the same result when the first worker returns. The exception is a background task, where the second call just returns immediately: if you're counting on Gearman to de-duplicate your tasks, there's no difference between a task being newly submitted and one that is already running.
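The behaviour described above can be sketched as a toy model in Python. This is not Gearman's implementation and it does not use any Gearman library; ToyJobServer, submit and complete are hypothetical names invented for this illustration. It only shows the idea: a second submission with the same function name and unique value attaches to the job already in progress instead of creating a new one, and both callers receive the same result.

```python
class ToyJobServer:
    """Toy model of de-duplication by unique value (not real Gearman)."""

    def __init__(self):
        # (function name, unique value) -> callbacks of waiting clients
        self._in_progress = {}

    def submit(self, function, unique, on_result):
        """Submit a job; return True if it was new, False if coalesced."""
        key = (function, unique)
        if key in self._in_progress:
            # Same unique value already in progress: just wait for it.
            self._in_progress[key].append(on_result)
            return False
        self._in_progress[key] = [on_result]
        return True

    def complete(self, function, unique, result):
        """Called when the worker finishes; hand the result to every waiter."""
        for callback in self._in_progress.pop((function, unique)):
            callback(result)


# Usage: two clients ask for the same thumbnail, the work is done once.
server = ToyJobServer()
received = []
server.submit("resize", "image-7", received.append)  # new job
server.submit("resize", "image-7", received.append)  # attaches to the first
server.complete("resize", "image-7", "thumb.png")    # both get the result
```

Once the job has completed, the unique value is free again, so a later submission with the same value starts a fresh job in this model.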
So if you need to lock and exit, remember that Gearman has de-duplication of identical tasks built in. I tend to forget.
One thought on “Gearman and Locking for Identical Jobs / Tasks”
Wow, this just came up in my Google search. I asked the question in IRC yesterday :)
(Today I've set up a fresh Debian system, compiled php-cli with pcntl, installed gearman, pecl/gearman and Brian's GearmanManager, and I'm about to get my hands really dirty!)