Thursday, November 5, 2009

PBMS Cloud storage is back!

Hi,

Support for S3 BLOB storage has now been fully integrated into the PBMS engine. It works in much the same way that I mentioned in an earlier post but with some important changes so I will explain it all again here.

When using S3 BLOB storage with PBMS the BLOB reference tracking and metadata is handled the same as before in that they are stored in the BLOB record in the repository, but the actual BLOB is stored on an S3 server.

To setup S3 storage you need to add an S3 cloud reference record to the pbms.pbms_cloud table provided by PBMS. For example:

INSERT INTO pbms.pbms_cloud(ID, Server, bucket, PublicKey, PrivateKey) VALUES(16, "S3.amazonaws.com", "PBMS-Test", "abc123", "amjr15vWq");

Then you need to tell PBMS which database should use S3 cloud storage for its BLOBs. This is done by updating a couple of records in the pbms_variable table that PBMS provides for each user database. For example to setup the database "myDB" for S3 cloud storage you would do the following:

UPDATE myDB.pbms_variable set value = "16" where name = "S3-Cloud-Ref";
UPDATE myDB.pbms_variable set value = "CLOUD" where name = "Storage-type";

The database "myDB" is now setup for cloud storage. All BLOB data will now be stored in the bucket "PBMS-Test" on the S3 server "S3.amazonaws.com".

This diagram shows the steps taken when the PBMS client library uploads a BLOB to the PBMS repository using S3 cloud storage. All of these steps are performed by one call the the PBMS client lib and the client application knows nothing about the type of BLOB storage being used:




  • Step 1: The BLOB metadata is sent to the PBMS engine.
  • Step 2: A repository record is created containing the BLOB metadata.
  • Step 3: A reply is sent back to the client containing the BLOB reference which is passed back up to the client application to be inserted into the user's table in place of the BLOB. An S3 authorization signature is also returned to the client. The authorization signature is generated by the PBMS engine using the Public/Private keys for the S3 server to sign the BLOB upload request.
  • Step 4: The PBMS client library uses the authorization signature to upload the BLOB to the S3 server.
Alternatively the BLOB can be inserted directly into the user's table in which case the PBMS engine will upload it to the S3 server. This is not the preferred way of doing things because it forces the MySQL server to handle the BLOB data, which is what we are trying to avoid by using the PBMS engine.


Here is how the BLOB is accessed. Keep in mind that all the client application does is provide the PBMS lib with a PBMS BLOB reference as it currently does and receives the BLOB in return. It knows nothing about the cloud.




  • Step 1: A BLOB request containing a PBMS BLOB reference is sent to the PBMS engine’s HTTP server.
  • Step 2: The BLOB’s repository record is read from local storage.
  • Step 3: An HTTP redirect reply is sent back to the client redirecting the request to the BLOB stored in the cloud. The metadata associated with the BLOB is returned to the client in the reply’s HTTP headers. The redirect URL is an authenticated query string that gives the client time limited access to the BLOB data. Use of an authenticated query string allows the data in the cloud to have access protection without requiring the client applications to know the private key normally required to get access.
  • Step 4: The redirect is followed to the cloud and the BLOB data is retrieved.
The beauty of this system is that the client applications need never know how or where the actually BLOB data is stored and since the BLOB transfer is all directly between the client and the S3 server, the BLOB data doesn't need to be handled at all by the machine on which the MySQL server is running.

Because the BLOB repository records indirectly refer back to the S3 server via the pbms_cloud table, the administrator is free to move the BLOB data between S3 servers and/or buckets with out having to do anything more than updating the record in the pbms_cloud table. For example given the following setup:




all that would be required to relocate the BLOB data from the Amazon S3 server to the Sun S3 server would be to:

  • Copy the BLOB data from the amazon server to the Sun server.
  • Update the pbms_cloud table as: UPDATE pbms.pbms_cloud set server = "S3.sunaws.com", Bucket = "B3" PublicKey = "zft123", PrivateKey = "abc123" where id = 17;
  • Delete the BLOBs on the amazon server.
resulting in:



I will explain how the new BLOB repository backup system works with cloud storage in another blog entry.

Barry

No comments: