Wednesday, November 5, 2008

Amazon S3 file transfer daemon

I have just uploaded a project I have been working on to Launchpad. It is not directly related to my work on the BLOB streaming engine for MySQL, but it does contain code and ideas that I hope to work into that engine. The project name on Launchpad is 's3daemon'.

The Amazon S3 file transfer daemon is a background process that monitors a set of folders. When a file matching a specific pattern appears in one of those folders, the daemon transfers it to S3 storage and then removes the local copy. The monitored folders and the patterns to look for are set in a configuration file.
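To make the behaviour concrete, here is a minimal Python sketch of one pass of such a monitor. It is not the s3daemon code: the function name `sweep`, the dictionary layout standing in for the configuration file, and the `upload` callable are all assumptions made for illustration, and matching is assumed to be by file name using shell-style wildcards.

```python
import fnmatch
import os
from typing import Callable, Dict, List


def sweep(watch: Dict[str, List[str]], upload: Callable[[str], None]) -> List[str]:
    """Run one pass over the watched folders and return the paths transferred.

    `watch` maps a folder to the list of patterns to look for in it
    (a hypothetical stand-in for the daemon's configuration file).
    `upload` is a hypothetical callable that sends one file to S3 and
    raises an exception if the transfer fails.
    """
    transferred = []
    for folder, patterns in watch.items():
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # skip subfolders and other non-regular entries
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                upload(path)     # an exception here leaves the local copy in place
                os.remove(path)  # only reached after a successful upload
                transferred.append(path)
    return transferred
```

A real daemon would call something like this in a loop with a sleep between passes, and would also need to avoid picking up files that are still being written.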

Included with the daemon is an Apache module that handles requests for files in the folders monitored by the daemon. If a request comes in for a file and the file cannot be found locally, the caller is sent a redirect to get the file directly from the S3 server. The redirect contains a signature, created by the module from the S3 account's access key ID and secret key, that tells the S3 server the caller is authorized to get the requested file. The authorization includes an expiry time that limits how long the URL/signature combination is good for.
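As an illustration of what such a redirect URL looks like, the Python sketch below builds one using S3's original query-string authentication scheme (HMAC-SHA1 over a short string that includes the expiry time). This is not the module's code, which is written for Apache; the function name `signed_redirect_url`, its parameters, and the path-style host are assumptions. Newer S3 regions require the later Signature Version 4 scheme instead, so treat this as a picture of the idea rather than something to deploy.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote, urlencode


def signed_redirect_url(bucket, key, access_key_id, secret_key,
                        ttl_seconds=300, now=None):
    """Build a time-limited GET URL for an S3 object (legacy query-string auth).

    All parameter names are hypothetical. `now` exists only so the
    result can be made deterministic; it defaults to the current time.
    """
    # Absolute expiry as seconds since the epoch; S3 rejects the URL after this.
    expires = int((time.time() if now is None else now) + ttl_seconds)
    resource = "/%s/%s" % (bucket, quote(key))
    # Verb, empty Content-MD5, empty Content-Type, expiry, resource.
    string_to_sign = "GET\n\n\n%d\n%s" % (expires, resource)
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    query = urlencode({
        "AWSAccessKeyId": access_key_id,
        "Expires": expires,
        "Signature": signature,  # urlencode escapes '+', '/' and '='
    })
    return "https://s3.amazonaws.com%s?%s" % (resource, query)
```

The secret key never leaves the web server. The caller only ever sees the access key ID, the expiry time, and the signature, and changing any part of the URL invalidates the signature.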

I am thinking that a BLOB streaming engine that handled BLOBs in a similar manner may be of interest to people storing a large amount of image data. Such an engine would allow the creation and deletion of the data to be controlled by the database while the actual BLOBs are stored in S3 storage. It would also decrease the bandwidth requirements of the server machine, because the actual data would be served by the S3 server.