June 9, 2010

django-inotifier development story

The other day I started a new Django project which required me to know when files were received by dcmrcv -- a file transfer agent in the DICOM Toolkit (dcm4che.org).  One way to accomplish this would be to monitor and parse the output of dcmrcv, perhaps with a hairy regex.  I decided to avoid that punishment and instead settled on using inotify, or more specifically, pyinotify, to monitor the directory and notify me when a new file appeared.

Background from Wikipedia:
inotify is a Linux kernel subsystem that acts to extend filesystems to notice changes to the filesystem, and report those changes to applications.

For this project I just needed to set up pyinotify to watch a single directory -- incoming/ -- and handle the events that it reports when something happens to a file there.  There are a number of different events that pyinotify can send but I only needed two -- IN_CREATE and IN_MOVED_TO.  I started out watching only for IN_CREATE, but dcmrcv did things a little differently and required a little more work.
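To make those two events concrete, here is a rough sketch that watches a directory for only IN_CREATE and IN_MOVED_TO.  It is not the code from django-inotifier and it does not use pyinotify: it talks to the Linux inotify interface directly through ctypes so that it runs with nothing but the standard library (Linux only).  The helper names (watch, read_events, demo) and the file name image.dcm are made up for illustration.

```python
import ctypes
import os
import struct
import tempfile

# Event bits as defined in <sys/inotify.h>.
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
WATCHED = IN_CREATE | IN_MOVED_TO
NAMES = {IN_CREATE: "IN_CREATE", IN_MOVED_TO: "IN_MOVED_TO"}

# Fixed-size header of struct inotify_event: wd, mask, cookie, name length.
HEADER = struct.Struct("iIII")

libc = ctypes.CDLL(None, use_errno=True)


def watch(path, mask):
    """Return a non-blocking inotify file descriptor watching `path`."""
    fd = libc.inotify_init1(os.O_NONBLOCK)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    if libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def read_events(fd):
    """Return the queued events as (event name, file name) pairs."""
    try:
        data = os.read(fd, 4096)
    except BlockingIOError:
        return []  # nothing queued yet
    events, offset = [], 0
    while offset < len(data):
        _wd, mask, _cookie, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        # The name is NUL-padded to `length` bytes.
        name = data[offset:offset + length].rstrip(b"\0").decode()
        offset += length
        events.append((NAMES.get(mask & WATCHED, hex(mask)), name))
    return events


def demo():
    """Write a file under a temporary name, rename it, and report the events."""
    with tempfile.TemporaryDirectory() as incoming:
        fd = watch(incoming, WATCHED)
        try:
            part = os.path.join(incoming, "image.dcm.part")
            with open(part, "wb") as handle:
                handle.write(b"chunk")
            os.rename(part, os.path.join(incoming, "image.dcm"))
            return read_events(fd)
        finally:
            os.close(fd)


if __name__ == "__main__":
    for name, filename in demo():
        print(name, filename)
```

In a real project pyinotify wraps all of this up for you; the point here is only which events show up for a create followed by a rename.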

When receiving a file, dcmrcv creates a new temp file ending with a .part extension and writes chunks to this file as it receives them from the network.  When it has received and written everything, it moves the .part file to the final filename.  This move does not generate an IN_CREATE event from pyinotify, even if the final file did not exist beforehand.  It does, however, generate an IN_MOVED_TO event, so I combined the monitoring of both to track a file from .part creation to the move to the final filename.  This is the CreateViaChunksSignaler() class in inotifier/event_processors.py.
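The create-then-move bookkeeping can be sketched in a few lines of plain Python.  This is only an illustration of the idea, not the actual CreateViaChunksSignaler() code: the class name ChunkedCreateTracker, the handle() method and the on_complete callback are hypothetical, and it assumes the temp file is simply the final filename with .part appended.

```python
class ChunkedCreateTracker:
    """Pair an IN_CREATE on NAME.part with a later IN_MOVED_TO on NAME."""

    def __init__(self, on_complete, suffix=".part"):
        self.on_complete = on_complete  # called with the final pathname
        self.suffix = suffix
        self.pending = set()            # .part files seen but not yet moved

    def handle(self, event_name, pathname):
        if event_name == "IN_CREATE" and pathname.endswith(self.suffix):
            # A transfer has started; remember the temp file.
            self.pending.add(pathname)
        elif event_name == "IN_MOVED_TO":
            # Assumption: temp name == final name + suffix.
            part = pathname + self.suffix
            if part in self.pending:
                self.pending.discard(part)
                self.on_complete(pathname)


received = []
tracker = ChunkedCreateTracker(received.append)
tracker.handle("IN_CREATE", "incoming/a.dcm.part")   # transfer starts
tracker.handle("IN_MOVED_TO", "incoming/a.dcm")      # transfer finishes
tracker.handle("IN_MOVED_TO", "incoming/other.dcm")  # unrelated move, ignored
print(received)
```

Files that are moved into the directory without a matching .part creation are ignored, which keeps unrelated moves from being reported as received files.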

I now had my event processor class, the events I needed to watch for, and the path to watch.  Now I just needed to hook it all into my Django project in a convenient way.  The end goal was to create a model instance for each file received by dcmrcv, so I considered just sticking all of this pyinotify stuff into the app where that model lived.  I didn't really like that: the functionality was necessary to accomplish the task, but it wasn't specifically relevant to that app.  At this point, I realized I could spin out the filesystem monitoring stuff into its own more generic application and connect to it via signals.  The perfect recipe for a pluggable app!

I wrote a couple of fairly simple management commands to start and stop the pyinotify daemon, added a Django signal for each pyinotify event, and settled on a single setting needed to configure it for each project.

The end result is a pluggable app which allows you to monitor any number of paths on your filesystem.  A different set of events can be watched for on each path, and they can be handled in different ways by using different event processor classes.  For a quick start, you can use the included AllEventsSignaler() class and just connect to the signals you need.  Enjoy!
