Barry's PrimeBase Development: Ideas for BLOB streaming

Tuesday, October 14, 2008

Ideas for BLOB streaming

Hi,

Now that I have the latest release of PBMS out the door, I thought I would post some of the ideas I have had for ways in which the BLOB streaming engine could be expanded.

What if the BLOB streaming engine supported its own BLOB storage engines that could be plugged into it, the same way that storage engines are plugged into MySQL? The API for these BLOB storage engines would be dead simple: all they would need to support would be 'get', 'put', and 'delete' methods. The BLOB streaming engine would still handle the reference counting and provide the simple HTTP server for direct access to the BLOBs, but the BLOB storage engines would handle how and where the actual BLOB data is stored.
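To make the split of responsibilities concrete, here is a minimal sketch in Python, not PBMS's actual (C++) code. It assumes a pluggable engine exposes only `get`, `put`, and `delete`, while the streaming engine keeps the reference counts and asks the storage engine to delete the data once the last reference is released. The class names, the `blob-N` key format, and the choice to start a new BLOB with one reference are all hypothetical.

```python
from abc import ABC, abstractmethod


class BlobStorageEngine(ABC):
    """Hypothetical pluggable back end: it only decides where BLOB bytes live."""

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class MemoryStorageEngine(BlobStorageEngine):
    """Trivial engine that keeps BLOBs in a dict (illustration only)."""

    def __init__(self) -> None:
        self.blobs = {}

    def get(self, key: str) -> bytes:
        return self.blobs[key]

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def delete(self, key: str) -> None:
        del self.blobs[key]


class BlobStreamingEngine:
    """Owns reference counting; delegates all storage to a plugged-in engine."""

    def __init__(self, storage: BlobStorageEngine) -> None:
        self.storage = storage
        self.refcounts = {}
        self.next_id = 0

    def create_blob(self, data: bytes) -> str:
        # Hypothetical key scheme; the creator holds the first reference.
        key = f"blob-{self.next_id}"
        self.next_id += 1
        self.storage.put(key, data)
        self.refcounts[key] = 1
        return key

    def add_ref(self, key: str) -> None:
        # Another row now references the same BLOB.
        self.refcounts[key] += 1

    def release(self, key: str) -> None:
        # A row dropped its reference; delete the data when none remain.
        self.refcounts[key] -= 1
        if self.refcounts[key] == 0:
            del self.refcounts[key]
            self.storage.delete(key)

    def refcount(self, key: str) -> int:
        return self.refcounts.get(key, 0)

    def read(self, key: str) -> bytes:
        return self.storage.get(key)
```

Usage: `engine = BlobStreamingEngine(MemoryStorageEngine())`, then `ref = engine.create_blob(b"hello")`, `engine.add_ref(ref)`, and two calls to `engine.release(ref)`; only the second release reaches the storage engine's `delete`. Swapping in a different back end means passing a different `BlobStorageEngine` subclass, with no change to the reference-counting code.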

Currently the BLOB data is stored locally in BLOB repository files, which is very efficient but may not be ideal for some applications. Here are a few ideas I have had for possible BLOB storage engines:

File Storage: This storage engine would simply store each BLOB in an individual file. It may be of interest to applications that want direct access to the data, such as a web application where the files are read directly by the web server.
Amazon S3 Storage: This storage engine would store the BLOBs in Amazon S3 buckets. This BLOB storage engine would be of interest to applications that deal with massive amounts of data and need a highly scalable storage solution.
Mirrored Storage: This storage engine would replicate the BLOBs to multiple geographical locations so that requests for the data could be directed to the closest mirrored site.
Load Balancing Storage: This storage engine would be able to move BLOB data from one storage location to another to optimize access time or to distribute the storage load more evenly across servers.
Wally Storage: This engine wouldn't actually store anything; it would reply to any request for data by promising to have it by Wednesday. This would be of interest to applications that are forced to store data they know nobody will ever actually want.
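Two of the engines above can be sketched against the same hypothetical `get`/`put`/`delete` interface. This is an illustration in Python, not PBMS code: `FileStorageEngine` keeps one file per BLOB under a root directory, and `MirroredStorageEngine` writes to every mirror and reads from the first one that answers, assuming the mirror list is ordered nearest-first. Key names are assumed to be engine-generated and safe to use as file names.

```python
import os
import tempfile
from pathlib import Path


class FileStorageEngine:
    """Hypothetical engine: one ordinary file per BLOB under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Keys are assumed to be engine-generated, filesystem-safe names.
        return self.root / key

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def put(self, key: str, data: bytes) -> None:
        # Write to a temporary file, then rename, so a web server reading the
        # directory never sees a half-written BLOB.
        tmp = self._path(key + ".tmp")
        tmp.write_bytes(data)
        os.replace(tmp, self._path(key))

    def delete(self, key: str) -> None:
        # Idempotent: deleting an already-missing BLOB is not an error.
        self._path(key).unlink(missing_ok=True)


class MirroredStorageEngine:
    """Hypothetical engine: replicate writes to all mirrors, read from the nearest."""

    def __init__(self, mirrors: list) -> None:
        # Assumption: mirrors are ordered nearest-first.
        self.mirrors = mirrors

    def get(self, key: str) -> bytes:
        # Try the closest mirror first and fall back to the others on failure.
        for mirror in self.mirrors:
            try:
                return mirror.get(key)
            except OSError:
                continue
        raise KeyError(key)

    def put(self, key: str, data: bytes) -> None:
        for mirror in self.mirrors:
            mirror.put(key, data)

    def delete(self, key: str) -> None:
        for mirror in self.mirrors:
            mirror.delete(key)
```

Because the mirrored engine only relies on the same three methods, its mirrors could just as well be other back ends (an S3-backed engine, for instance) rather than local directories; the fallback in `get` is what lets a request be served by a farther copy when the nearest one is unavailable.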

To allow for more flexibility in the BLOB storage engines, the BLOB streaming engine would have to be able to handle a redirect reply from the BLOB storage engines. This redirect could be handled inside the PBMS client API so that an application requesting a BLOB via the API would never need to know anything about the redirect that took place.
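The client-side half of the redirect idea can be sketched as a loop that keeps fetching until it gets data. This is a hypothetical Python illustration, not the real PBMS client API: the `Redirect` reply type, the `fetch` callable standing in for the transport (for example an HTTP GET against the BLOB streaming server), and the redirect limit are all assumptions. The caller of `client_get_blob` only ever receives the BLOB bytes.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Redirect:
    """Hypothetical reply meaning 'the BLOB lives at this other location'."""
    location: str


Reply = Union[bytes, Redirect]


class TooManyRedirects(Exception):
    pass


def client_get_blob(fetch: Callable[[str], Reply], location: str,
                    max_redirects: int = 5) -> bytes:
    """Fetch a BLOB, following redirects so the caller only sees the data."""
    # One initial request plus at most `max_redirects` followed redirects.
    for _ in range(max_redirects + 1):
        reply = fetch(location)
        if isinstance(reply, Redirect):
            location = reply.location  # follow the redirect and try again
            continue
        return reply
    raise TooManyRedirects(location)
```

Usage, with made-up locations: if `replies = {"pbms://db/t/blob-7": Redirect("s3://bucket/blob-7"), "s3://bucket/blob-7": b"image bytes"}`, then `client_get_blob(lambda loc: replies[loc], "pbms://db/t/blob-7")` returns `b"image bytes"` without the application ever seeing the S3 location. The redirect limit guards against a misconfigured engine that redirects in a loop.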

This concept would allow the manner and location in which BLOBs are stored to be completely decoupled from the database and database server.

Barry