Wednesday, November 5, 2008

Amazon S3 file transfer daemon

This is a project I have been working on and have just uploaded to Launchpad. It is not directly related to my work on the BLOB streaming engine for MySQL, but it does contain code and ideas that I hope to work into it. The project name on Launchpad is 's3daemon'.

The Amazon S3 file transfer daemon is a daemon process that runs in the background and monitors folders. When it finds a file in one of the folders that matches a specific pattern it transfers that file to S3 storage and then removes the local copy. The folders monitored and the patterns searched for are controlled by a configuration file.
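
The matching step can be sketched roughly as follows. This is only an illustration, not the s3daemon code: the 'watch_rule' structure and 'should_transfer()' are hypothetical names, and I am assuming shell-style patterns matched with the POSIX fnmatch() call, which may differ from what the configuration file really supports.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical configuration entry: one monitored folder and one pattern. */
struct watch_rule {
    const char *folder;   /* directory the daemon scans */
    const char *pattern;  /* shell-style pattern, e.g. "*.jpg" (an assumption) */
};

/* Returns 1 if 'filename' found in 'folder' matches one of the rules and
 * should therefore be transferred to S3, otherwise 0. */
static int should_transfer(const struct watch_rule *rules, size_t count,
                           const char *folder, const char *filename)
{
    size_t i;

    assert(rules != NULL || count == 0);
    for (i = 0; i < count; i++) {
        /* fnmatch() returns 0 when the name matches the pattern. */
        if (strcmp(rules[i].folder, folder) == 0 &&
            fnmatch(rules[i].pattern, filename, 0) == 0)
            return 1;
    }
    return 0;
}
```

A daemon built this way would call should_transfer() for each file it finds during a scan and only upload and remove the files for which it returns 1.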

Included with the daemon is an Apache module that handles requests for files in the folders monitored by the daemon. If a request comes in for a file that cannot be found locally, the caller is sent a redirect to get the file directly from the S3 server. The redirect contains a signature, created by the module using a public/private key combination, that tells the S3 server that the caller is authorized to get the requested file. The authorization includes a timestamp that limits how long the URL/signature combination remains valid.
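
To give an idea of what goes into such a time-limited link, here is a minimal sketch of the text that gets signed. It assumes the module uses S3's query string authentication for a plain GET with no extra headers; the function names are hypothetical and this is not the module's code. The HMAC-SHA1 and Base64 steps that turn this text into the signature need a crypto library and are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Absolute expiry time (seconds since the epoch) for a link that should
 * stay valid for 'lifetime_secs' seconds counted from 'now'. */
static long expiry_time(time_t now, long lifetime_secs)
{
    assert(lifetime_secs > 0);
    return (long)now + lifetime_secs;
}

/* Builds the text to be signed for a time-limited GET of /bucket/key:
 *   "GET\n\n\n<expires>\n/<bucket>/<key>"
 * (the two empty lines stand for the unused Content-MD5 and Content-Type).
 * Returns 0 on success, -1 on empty input or if 'out' is too small. */
static int build_string_to_sign(char *out, size_t out_size,
                                const char *bucket, const char *key,
                                long expires)
{
    int n;

    if (strlen(bucket) == 0 || strlen(key) == 0)
        return -1;
    n = snprintf(out, out_size, "GET\n\n\n%ld\n/%s/%s", expires, bucket, key);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

In that scheme the signature and the same expiry value travel in the redirect URL as the 'Signature' and 'Expires' query parameters next to 'AWSAccessKeyId', so S3 can recompute the signature and refuse the request once the expiry time has passed.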

I am thinking that a BLOB streaming engine that handled BLOBs in a similar manner may be of interest to people storing a large amount of image data. Such an engine would allow the creation and deletion of the data to be controlled by the database while the actual BLOBs are stored in S3 storage. This also decreases the bandwidth requirements of the server machine because the actual data will be served up by the S3 server.



Monday, October 20, 2008

Using config.status to build outside the tree

I just thought I would share some of my discoveries. You may already have known this but I didn't.

In recent releases both the PBMS and PBXT engines have been using a configuration flag "--with-mysql" to tell the build where to find the MySQL tree. We then looked inside the 'config.status' file generated when MySQL was configured to get the build options, most importantly the compiler options used. We had been doing this using 'grep' and 'sed'. The problem we soon discovered was that the format of the 'config.status' file changes quite frequently with different versions of 'autoconf'.

With the help of the good people on the 'autoconf' forum I discovered that you can ask 'config.status' to give you the value of a single substitution. So to get the value for CFLAGS you enter:
echo '@CFLAGS@' | config.status --file=-
and it will print out a line with the CFLAGS.

I understand that this feature, which has always existed, is now being added to the 'autoconf' documentation.

Using this when configuring your build guarantees that your compiler options match those of MySQL and avoids those nasty bugs where structure alignments do not match because of different compiler settings.

I hope this helps someone out there.

Barry

Tuesday, October 14, 2008

Ideas for BLOB streaming

Hi,

Now that I have the latest release of PBMS out the door, I thought I would post some of the ideas I have had for possible ways in which the BLOB streaming engine could be expanded upon.

What if the BLOB streaming engine supported its own BLOB storage engines that could be plugged into it the same way that storage engines are plugged into MySQL? The API for these BLOB storage engines would be dead simple: all they would need to support would be a 'get', 'put', and 'delete' method. The BLOB streaming engine would handle the reference counting and still provide the simple HTTP server for direct access to the BLOBs, but the BLOB engines would handle how and where the actual BLOB data would be stored.
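
To make the idea a bit more concrete, here is a rough sketch of what such a plug-in interface could look like. Nothing like this exists in PBMS yet: 'blob_engine', 'mem_engine' and all the function names are hypothetical, and the in-memory engine is only a toy stand-in for a real storage back end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plug-in interface: just the three calls named in the text.
 * Every call returns 0 on success and -1 on failure. */
typedef struct blob_engine {
    int (*put)(struct blob_engine *self, const char *id,
               const void *data, size_t size);
    int (*get)(struct blob_engine *self, const char *id,
               void *buf, size_t buf_size, size_t *size);
    int (*del)(struct blob_engine *self, const char *id);
} blob_engine;

#define MEM_SLOTS  4
#define MEM_ID_MAX 32

/* Toy engine that keeps a handful of BLOBs in memory. */
typedef struct mem_engine {
    blob_engine base;   /* first member, so a blob_engine* can be cast back */
    char   ids[MEM_SLOTS][MEM_ID_MAX];
    void  *data[MEM_SLOTS];
    size_t sizes[MEM_SLOTS];
} mem_engine;

/* Returns the slot holding 'id', or -1 if it is not stored. */
static int mem_find(mem_engine *m, const char *id)
{
    int i;
    for (i = 0; i < MEM_SLOTS; i++)
        if (m->data[i] != NULL && strcmp(m->ids[i], id) == 0)
            return i;
    return -1;
}

static int mem_del(blob_engine *self, const char *id)
{
    mem_engine *m = (mem_engine *)self;
    int i = mem_find(m, id);
    if (i < 0)
        return -1;
    free(m->data[i]);
    m->data[i] = NULL;
    return 0;
}

static int mem_put(blob_engine *self, const char *id,
                   const void *data, size_t size)
{
    mem_engine *m = (mem_engine *)self;
    void *copy;
    int i;

    assert(data != NULL);
    if (strlen(id) >= MEM_ID_MAX)
        return -1;
    mem_del(self, id);                 /* replace an existing BLOB, if any */
    for (i = 0; i < MEM_SLOTS && m->data[i] != NULL; i++)
        ;                              /* look for a free slot */
    if (i == MEM_SLOTS)
        return -1;                     /* engine is full */
    copy = malloc(size ? size : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, data, size);
    strcpy(m->ids[i], id);
    m->data[i] = copy;
    m->sizes[i] = size;
    return 0;
}

static int mem_get(blob_engine *self, const char *id,
                   void *buf, size_t buf_size, size_t *size)
{
    mem_engine *m = (mem_engine *)self;
    int i = mem_find(m, id);
    if (i < 0 || m->sizes[i] > buf_size)
        return -1;
    memcpy(buf, m->data[i], m->sizes[i]);
    *size = m->sizes[i];
    return 0;
}

/* Marks every slot as free and fills in the function table. */
static void mem_engine_init(mem_engine *m)
{
    int i;
    for (i = 0; i < MEM_SLOTS; i++)
        m->data[i] = NULL;
    m->base.put = mem_put;
    m->base.get = mem_get;
    m->base.del = mem_del;
}
```

The streaming engine would only ever see the 'blob_engine' pointer, so swapping the toy engine for a file, S3, or mirrored engine would not change the calling code.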

Currently the blob data is stored locally in blob repository files, which is very efficient but may not be ideal for some applications. Here are a few ideas I have had for possible BLOB storage engines:
  • File Storage: This storage engine would just store the blobs in individual files. It may be of interest to applications that want to be able to directly grab the data, such as a web application where the data is directly read by the web server.
  • Amazon S3 Storage: This storage engine would store the BLOBs in Amazon S3 buckets. This BLOB storage engine would be of interest to applications that deal with massive amounts of data and need a highly scalable storage solution.
  • Mirrored Storage: This storage engine would replicate the BLOBs to multiple geographical locations so that requests for the data could be directed to the closest mirrored site.
  • Load Balancing Storage: This storage engine would have the ability to move the BLOB data from one storage location to another to try to optimize its access time or more evenly distribute the storage load on different servers.
  • Wally Storage: This engine wouldn't actually store anything and would reply to any request for data that it will have it for them by Wednesday. This would be of interest to applications that are forced to store data that they know nobody will ever actually want.
To allow for more flexibility in the BLOB storage engines the BLOB streaming engine would have to be able to handle a redirect reply from the BLOB storage engines. This redirect could be handled inside of the PBMS client API so that an application requesting a BLOB via the API would never need to know anything about the redirect that took place.

This concept would allow the manner and location in which BLOBs are stored to be completely decoupled from the database and database server.
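
The redirect handling described above could look something like the following inside the client API. This is a sketch only: the 'reply' structure, 'client_get_blob()' and the 'demo_fetch()' stand-in transport are all hypothetical names, and the limit of five redirects is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum reply_kind { REPLY_DATA, REPLY_REDIRECT, REPLY_ERROR };

/* Hypothetical reply from a BLOB storage engine. */
struct reply {
    enum reply_kind kind;
    const char *payload;  /* BLOB data for REPLY_DATA, new location for REPLY_REDIRECT */
};

typedef struct reply (*fetch_fn)(const char *location);

#define MAX_REDIRECTS 5

/* Follows redirects on behalf of the application. Returns the BLOB data,
 * or NULL on error or when the redirect limit is exceeded. */
static const char *client_get_blob(fetch_fn fetch, const char *location)
{
    int hops;

    assert(fetch != NULL && location != NULL);
    for (hops = 0; hops <= MAX_REDIRECTS; hops++) {
        struct reply r = fetch(location);
        if (r.kind == REPLY_DATA)
            return r.payload;
        if (r.kind != REPLY_REDIRECT)
            return NULL;
        location = r.payload;   /* try again at the new location */
    }
    return NULL;                /* too many redirects */
}

/* Stand-in transport for illustration: the local server redirects to S3. */
static struct reply demo_fetch(const char *location)
{
    struct reply r;

    if (strcmp(location, "local/blob1") == 0) {
        r.kind = REPLY_REDIRECT;
        r.payload = "s3/blob1";
    } else if (strcmp(location, "s3/blob1") == 0) {
        r.kind = REPLY_DATA;
        r.payload = "blob-bytes";
    } else if (strcmp(location, "loop") == 0) {
        r.kind = REPLY_REDIRECT;
        r.payload = "loop";
    } else {
        r.kind = REPLY_ERROR;
        r.payload = NULL;
    }
    return r;
}
```

An application calling client_get_blob(demo_fetch, "local/blob1") gets the BLOB data back without ever seeing that the request was redirected.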


Barry

Alpha release v05.06 of the BLOB streaming engine


Alpha version 5.06 of the BLOB streaming engine for MySQL has been released. You can download the source code from www.blobstreaming.org/download. The documentation has also been updated.

What's new in 5.06:
  • The BLOB streaming engine can now be used with MyISAM tables as well as tables created by any other MySQL storage engine. 
  • The name of the PrimeBase BLOB streaming engine has been changed from MyBS to PBMS which stands for "PrimeBase Media Streaming".
This version introduces a couple of new terms:
  • streaming-enabled table: This is any table created by a streaming-enabled engine such as PBXT, or any table that has triggers defined on its BLOB reference columns that notify the PBMS engine when BLOB references are inserted, updated, or deleted.
  • BLOB reference column: This is a column in a streaming-enabled table that contains PBMS BLOB references and notifies the PBMS engine when data is inserted into, updated in, or deleted from the column.
The big news though is that as of this version you no longer need to have PBXT installed in order to use the PBMS engine. The PBMS engine provides a set of UDFs and client API functions that enable it to be used with tables created by any storage engine. The following is an example of how to create a streaming enabled MyISAM table:

Create table x.foo(c1 integer, c2 longblob) ENGINE = MYISAM;

create trigger x.foo_insert_trig BEFORE INSERT on x.foo for each row BEGIN set NEW.c2 = pbms_insert_blob_trig("x", "foo", 2, NEW.c2); END

create trigger x.foo_update_trig BEFORE UPDATE on x.foo for each row BEGIN set NEW.c2 = pbms_update_blob_trig("x", "foo", 2, OLD.c2, NEW.c2); END

create trigger x.foo_delete_trig BEFORE DELETE on x.foo for each row BEGIN declare dummy integer; set dummy = pbms_delete_blob_trig("x", "foo", 2, OLD.c2); END


As a result of the name change you will see that any use of the letters 'mybs' has been changed to 'pbms' throughout the code and documentation.

Of lesser importance but also of note: I have become the official contact person for the BLOB streaming engine. Paul is still very much involved with it but is currently concentrating his efforts on the PBXT engine.

Barry

Friday, October 10, 2008

PrimeBase PBMS does MYISAM

Hi,

I just pushed a version of PBMS up onto Launchpad that works with any storage engine.

The new version provides a set of UDFs and API functions that allow you to set triggers on longblob columns so that the table can be used with the PBMS engine the same way a PBXT table would.

The following has been added to the API:


/* Functions used for non PBMS enabled tables. */

/*
 * pbms_init_udfs()
 * Declares the PBMS UDFs on the server if they do not already exist.
 *
 * Equivalent UDF: NONE
 */
pbms_bool pbms_init_udfs(MYSQL *mysql);


/*
 * pbms_enabled()
 * Returns 'true' if the 'engine' is PBMS enabled, i.e. provides direct support for PBMS.
 * So far the PBXT engine is the only PBMS enabled engine out there. Hopefully that will change.
 *
 * Equivalent UDF: pbms_enabled_engine(engine)
 */
pbms_bool pbms_enabled(MYSQL *mysql, const char *engine, pbms_bool *enabled);


/*
 * pbms_init_table()
 * Creates triggers on any 'longblob' columns in the table that will notify the PBMS engine when PBMS blob
 * references are inserted, deleted, or updated. If 'no_blobs_ok' is 'false' then an error is reported if
 * no 'longblob' columns are found. If 'no_blobs_ok' is 'true' then no error is reported if no 'longblob'
 * columns are found and no action is taken.
 *
 * Equivalent UDF: NONE
 */
pbms_bool pbms_init_table(struct st_mysql *mysql, const char *database, const char *table, pbms_bool no_blobs_ok);


/*
 * pbms_reset_table_blobs()
 * This is similar to pbms_init_table() except that it is meant to be called if an alter table has been done on
 * a table that contains blobs being managed by PBMS and either a 'longblob' column has been added or dropped, or
 * the ordinal position of a 'longblob' column has changed.
 *
 * Equivalent UDF: NONE
 */
pbms_bool pbms_reset_table_blobs(struct st_mysql *mysql, const char *database, const char *table);


/*
 * pbms_drop_table_blobs()
 * This function is called after a table has been dropped to notify PBMS to remove all blob references from that
 * table.
 *
 * Equivalent UDF: pbms_delete_all_blobs_in_table(database, table);
 */
pbms_bool pbms_drop_table_blobs(struct st_mysql *mysql, const char *database, const char *table);


/*
 * pbms_table_renamed()
 * This function is called after a table containing blobs has been renamed.
 *
 * Equivalent UDF: pbms_rename_table_with_blobs(database, old_table, new_table);
 */
pbms_bool pbms_table_renamed(struct st_mysql *mysql, const char *database, const char *old_table, const char *new_table);


/*
 * pbms_dropping_database()
 * Call this function before dropping any database that contained tables containing blobs. This gives the PBMS
 * engine a chance to remove its files and sub directories from the database directory so that it can be
 * deleted.
 *
 * Equivalent UDF: pbms_dropping_database(database);
 */
pbms_bool pbms_dropping_database(struct st_mysql *mysql, const char *database);


/*
User defined functions provided with the PBMS engine:

// UDFs used in triggers:
// 'col_position' is the ordinal of the longblob column starting at position 1.
pbms_insert_blob_trig(database, table, col_position, blob_url);
pbms_update_blob_trig(database, table, col_position, old_blob_url, new_blob_url);
pbms_delete_blob_trig(database, table, col_position, blob_url);

// Example:
create table x.foo(c1 int, c2 longblob);
create trigger x.foo_insert_trig BEFORE INSERT on x.foo for each row BEGIN set NEW.c2 = pbms_insert_blob_trig("x", "foo", 2, NEW.c2); END
create trigger x.foo_update_trig BEFORE UPDATE on x.foo for each row BEGIN set NEW.c2 = pbms_update_blob_trig("x", "foo", 2, OLD.c2, NEW.c2); END
create trigger x.foo_delete_trig BEFORE DELETE on x.foo for each row BEGIN declare dummy integer; set dummy = pbms_delete_blob_trig("x", "foo", 2, OLD.c2); END
///////////

pbms_delete_all_blobs_in_table(database, table);
pbms_rename_table_with_blobs(database, old_table, new_table);
pbms_dropping_database(database);
pbms_enabled_engine(engine);
NOTE: pbms_enabled_engine() returns -1 on error.

CREATE FUNCTION pbms_insert_blob_trig RETURNS STRING SONAME "libpbms.so";
CREATE FUNCTION pbms_update_blob_trig RETURNS STRING SONAME "libpbms.so";
CREATE FUNCTION pbms_delete_blob_trig RETURNS INTEGER SONAME "libpbms.so";
CREATE FUNCTION pbms_delete_all_blobs_in_table RETURNS INTEGER SONAME "libpbms.so";
CREATE FUNCTION pbms_rename_table_with_blobs RETURNS INTEGER SONAME "libpbms.so";
CREATE FUNCTION pbms_dropping_database RETURNS INTEGER SONAME "libpbms.so";
CREATE FUNCTION pbms_enabled_engine RETURNS INTEGER SONAME "libpbms.so";
*/


Use of PBMS with non-PBMS-enabled engines is not yet transactionally safe. This means that if you use it on an InnoDB table, do some inserts in a transaction, and then roll back the transaction, the blob references will remain. Of course PBMS-enabled engines such as PBXT do not have this problem. Making PBMS transactionally safe with non-PBMS-enabled engines is on my to-do list.

The test shell 'pbmstest' included with the PBMS source uses the new API functions and by default performs its tests using a MyISAM table.

I am currently working on getting the documentation updated with the new features and name change so that I can make an official release with this version.

Until then I invite anyone interested to grab a copy of the latest code from Launchpad and play with it.

Barry

Thursday, September 18, 2008

BLOB streaming has changed its name and home

Hi all,

So you may have been thinking that maybe calling something "MyBS" was not such a great marketing idea. Even as a developer I have to admit that it isn't the best name abbreviation to have. So we decided to change it. We considered calling it the "BLOB Streaming Engine" but that resulted in the abbreviation "BSE" which also isn't so great. What we have finally settled on is the "PrimeBase Media Stream" engine or "PBMS" which I think is fairly safe.

We have also relocated the source for PBMS from SourceForge to Launchpad, where you can find it at: https://code.launchpad.net/pbms

Wednesday, September 10, 2008

Alpha release v05.05 of the BLOB Streaming Engine


Alpha version 5.05 of the BLOB streaming engine for MySQL has been released. You can download the source code from www.blobstreaming.org/download. The documentation has also been updated.

What's new in 5.05:

  • A 'C' API has been added for client applications. It provides all the basic functions needed to connect to the BLOB streaming engine and upload and download BLOBs efficiently.
  • A test client application has been added to the project to demonstrate the use of the new API.
  • Added discover table support for the engine's system tables.
  • Simplified the configuration: To configure the engine all you have to do is provide the path to the MySQL source tree (after building MySQL). All build options are taken from the MySQL build.
  • And of course there are assorted bug fixes, details of which are listed in the Changelog.

Of interest to other engine developers may be the way that I implemented 'discover tables'. I have created a very generic function that takes a structure similar to that used by schema plug-ins and generates the required 'frm' file. Implementation is contained in the file BSDiscover.cc and you can see how it is used in ha_mybs.cc.