Friday, November 6, 2009

PBMS backup and S3 storage

Hi,

So I had to make some changes to the way that backup worked with PBMS in order that it would behave as expected when the BLOB data is stored remotely on an S3 server.

I will go over how the PBMS repository is backed up first and then explain how it works with S3 storage.

PBMS provides a pbms_backup table in its 'pbms' database that records all the backups that have been performed on any of the PBMS repositories on the server. The create statement for this table looks like this:

CREATE TABLE pbms.pbms_backup (
id INT NOT NULL AUTO_INCREMENT,
Database_Name VARCHAR(64) NOT NULL,
Started TIMESTAMP,
Completed TIMESTAMP,
IsRunning BOOL,
IsDump BOOL,
Location VARCHAR(1024),
Cloud_Ref INT,
Cloud_backup_no INT,
PRIMARY KEY (id)
)

There are 2 ways to backup a PBMS repository:

  • MySQLDump: To backup the repository using 'mysqldump' you simply dump the database's 'pbms_dump' table. This table is a one column table of type 'long blob'. Any select from this table is assumed to be a repository backup and any insert into the table is assumed to be a repository recovery. Selecting the data from the 'pbms_dump' table results in a new record being added to the 'pbms.pbms_backup' table. To recover the repository the dumped data is inserted back into the 'pbms_dump' table. During the recovery it is important to set the pbms variable "Restoring-Dump" in the database's 'pbms_variable' table to "TRUE". This tells the PBMS engine that the database is being recovered and that insertion of BLOB references should not increment the BLOB reference count.
  • Engine level backup: To perform an engine level backup all you do is insert a record into the 'pbms.pbms_backup' table providing the name of the database to be backed up and the location into which the backup should be placed. This starts an asynchronous backup operation. To recover the repository from the backup all you do is drag and drop the backup into your recovered database.
So how does this work when the actual BLOB data is stored on an S3 server?

What happens is the backup process makes a copy of the BLOB data on the S3 server. In the case of an engine level backup the user can specify the S3 Server that the BLOBs should be backed up to which may be a different server entirely. BLOB data is stored on the S3 server using generated names matching the pattern:

<database_id>/<backup_no>/<cloud_ref>.<time_stamp>.<sequence_count>
Example: 1257393091/0/87.1257393139.626

  • database_id is the id of the database as provided by the PBMS engine.
  • backup_no is the backup number of the BLOB. For BLOBs that are not backup copies this number is always zero. The backup number is just a sequence counter that ensures that the backup BLOBs have a unique name. All backup BLOBs from the same backup will have the same backup number. This backup number is stored in the 'Cloud_backup_no' column of the 'pbms.pbms_backup' table.
  • cloud_ref is the reference into the 'pbms.pbms_cloud' table that refers to the S3 server and bucket in which the BLOB is stored.
  • time_stamp is the time in seconds at which the BLOB was created.
  • sequence_count is just a counter to ensure that no 2 BLOBs get the same name.
When a backup is started a check is done to find an unused backup number. Then as each BLOB repository record is backed up the S3 server is sent a request to copy the BLOB to its new name with the new backup number which may also be in a different bucket or on a different server.

The first step of the database recovery is to delete any BLOBs from the original database. This is done by performing the S3 equivalent of 'rm -r <database_id>/0/* ' .

Recovery from an sqldump backup is just the reverse of the backup, as each BLOB repository record is written back to the repository the BLOB is copied back to its original name and location with the backup number set to zero. The 'cloud_ref' number is used to look up the S3 server location from which the original BLOB came.

Recovery from an engine level backup is a bit different because the repository recovery is just a drag and drop operation. The first time the database is accessed the PBMS engines sees that the repository has just been recovered and starts the S3 BLOB recovery process. To recover the S3 BLOBs a list of all the BLOBs in the backup is made and then using the 'cloud_ref' number from the BLOB name the original location of the BLOB is found and the backed up BLOB is copied back to it.

The nice thing about using S3 storage for the BLOBs is that the database BLOB repository can be backed up quite nicely just using mysqldump.

When deleting old backups it is important to remember that there may be BLOBs on an S3 server that also need to be cleaned up. This is where you can make use of the 'pbms.pbms_backup' table to find the location of the backed up BLOBs and using a tool, such as the "S3 FireFox Organizer", delete them. After that the backup record can be deleted from the 'pbms.pbms_backup' table. I could have had the PBMS engine delete all the backed up BLOBs for a backup when the record was deleted from the 'pbms.pbms_backup' table but I thought that that could lead to lost data if the user did not realize the side effects of deleting a backup record.

Barry

Thursday, November 5, 2009

PBMS Cloud storage is back!

Hi,

Support for S3 BLOB storage has now been fully integrated into the PBMS engine. It works in much the same way that I mentioned in an earlier post but with some important changes so I will explain it all again here.

When using S3 BLOB storage with PBMS the BLOB reference tracking and metadata is handled the same as before in that they are stored in the BLOB record in the repository, but the actual BLOB is stored on an S3 server.

To setup S3 storage you need to add an S3 cloud reference record to the pbms.pbms_cloud table provided by PBMS. For example:

INSERT INTO pbms.pbms_cloud(ID, Server, bucket, PublicKey, PrivateKey) VALUES(16, "S3.amazonaws.com", "PBMS-Test", "abc123", "amjr15vWq");

Then you need to tell PBMS which database should use S3 cloud storage for its BLOBs. This is done by updating a couple of records in the pbms_variable table that PBMS provides for each user database. For example to setup the database "myDB" for S3 cloud storage you would do the following:

UPDATE myDB.pbms_variable set value = "16" where name = "S3-Cloud-Ref";
UPDATE myDB.pbms_variable set value = "CLOUD" where name = "Storage-type";

The database "myDB" is now setup for cloud storage. All BLOB data will now be stored in the bucket "PBMS-Test" on the S3 server "S3.amazonaws.com".

This diagram shows the steps taken when the PBMS client library uploads a BLOB to the PBMS repository using S3 cloud storage. All of these steps are performed by one call the the PBMS client lib and the client application knows nothing about the type of BLOB storage being used:




  • Step 1: The BLOB metadata is sent to the PBMS engine.
  • Step 2: A repository record is created containing the BLOB metadata.
  • Step 3: A reply is sent back to the client containing the BLOB reference which is passed back up to the client application to be inserted into the user's table in place of the BLOB. An S3 authorization signature is also returned to the client. The authorization signature is generated by the PBMS engine using the Public/Private keys for the S3 server to sign the BLOB upload request.
  • Step 4: The PBMS client library uses the authorization signature to upload the BLOB to the S3 server.
Alternatively the BLOB can be inserted directly into the user's table in which case the PBMS engine will upload it to the S3 server. This is not the preferred way of doing things because it forces the MySQL server to handle the BLOB data, which is what we are trying to avoid by using the PBMS engine.


Here is how the BLOB is accessed. Keep in mind that all the client application does is provide the PBMS lib with a PBMS BLOB reference as it currently does and receives the BLOB in return. It knows nothing about the cloud.




  • Step 1: A BLOB request containing a PBMS BLOB reference is sent to the PBMS engine’s HTTP server.
  • Step 2: The BLOB’s repository record is read from local storage.
  • Step 3: An HTTP redirect reply is sent back to the client redirecting the request to the BLOB stored in the cloud. The metadata associated with the BLOB is returned to the client in the reply’s HTTP headers. The redirect URL is an authenticated query string that gives the client time limited access to the BLOB data. Use of an authenticated query string allows the data in the cloud to have access protection without requiring the client applications to know the private key normally required to get access.
  • Step 4: The redirect is followed to the cloud and the BLOB data is retrieved.
The beauty of this system is that the client applications need never know how or where the actually BLOB data is stored and since the BLOB transfer is all directly between the client and the S3 server, the BLOB data doesn't need to be handled at all by the machine on which the MySQL server is running.

Because the BLOB repository records indirectly refer back to the S3 server via the pbms_cloud table, the administrator is free to move the BLOB data between S3 servers and/or buckets with out having to do anything more than updating the record in the pbms_cloud table. For example given the following setup:




all that would be required to relocate the BLOB data from the Amazon S3 server to the Sun S3 server would be to:

  • Copy the BLOB data from the amazon server to the Sun server.
  • Update the pbms_cloud table as: UPDATE pbms.pbms_cloud set server = "S3.sunaws.com", Bucket = "B3" PublicKey = "zft123", PrivateKey = "abc123" where id = 17;
  • Delete the BLOBs on the amazon server.
resulting in:



I will explain how the new BLOB repository backup system works with cloud storage in another blog entry.

Barry

Tuesday, September 29, 2009

PBMS will be at the OpenSQL camp in Portland Nov. 14-15

I am planning on presenting a session on PBMS at the OpenSQL camp. I am in hopes of have a chance to discuss PBMS with people and find out how they are planning on using it and what features they would like to see in it.

But even if you are not interested in PBMS you should still come if for no other reason than the free pizza!

I am proud to say the PrimeBase Technologies is one of the organizations who's sponsorship money is helping to provide the free pizza.

I will see you all there,

Barry

Saturday, August 15, 2009

PBMS version 0.5.09 has been released.

A new release of the PrimeBase Media Streaming engine is now available for download at
http://www.blobstreaming.org .

The main focus of this release was to provide the features required to make PBMS production ready and with the addition of transaction support and engine level backup I think it is almost there. The engine does not yet provide engine level replication but it will in a future release.

What's new:
  • PBMS now provides two methods to backup the BLOB repository which are documented here.
  • PBMS now supports transactions independently of the storage engine.
  • There is now a PHP extension for PBMS. Check it out!
  • The interface between PBMS and storage engines has been greatly simplified so that PBMS support can be added directly to MySQL or other engines such as InnoDB by applying a simple patch to the handler as talked about in an earlier blog here.
What's Next:

The next step is to take this version and do extensive testing to make sure it is stable. There are a few known bugs listed in the To-Do list that I need to fix and as I add more test cases and do more stress testing I am sure I will find more.

The S3 Cloud support, which was mentioned here, has been pulled out of this release and I want to get it back in as soon as possible.

I also plan to make some patches available that can be applied to MySQL or InnoDB to make them PBMS enabled.

And of course I need to get PBMS into Drizzle. That isn't really that much work, I have already had it running with Drizzle but I have been waiting to get a stable full feature version first. I think what I have now is close enough so I will be working on this shortly.

What's after that:

Once I get the current version running stably with no major bugs or limitations and have taken care of the above mentioned issues, I plan on working on the following .

  • Remote repositories: the entire repository is stored somewhere other than in the MySQL data directory, most likely on a different machine entirely. This is different from cloud storage in that the entire repository is stored remotely instead of just the BLOB data. The remote repository server would be a standalone PBMS HTTP server providing clients with access to the BLOB data stored in the repository. The standalone PBMS HTTP server could in turn be using cloud storage so that the BLOB data itself was stored in a cloud. A solution like this should be able to work with the NDBCLUSTER storage engine.
  • Engine level replication: This would be done similar to the remote repository but the data would be stored in the local repository as well as being forwarded to the slave servers.
  • Incremental backup. This allows you to basically sync the current state of the BLOB repository with that of a backup. To do this all new BLOBs and BLOB references in the current database would be added to the backup and BLOBs and BLOB references in the backup that no longer exist in the current database would be removed. In this way only the new data is copied to the backup location.


As usual if you have any questions about PBMS and BLOB storage please send them to me and I will do my best to answer them.

Barry

Wednesday, July 22, 2009

New simplified engine interface for PBMS

By making PBMS transactional I have been able to greatly simplify the engine interface making it much easier for engine builders to build in support for PBMS. How much simpler is it? From the time I decided to make InnoDB PBMS enabled to when I started the rebuild of MySQL was less than half an hour!

The same way that I added PBMS support to InnoDB it can be added directly to the MySQL server so that the PBMS engine will be used for BLOB storage for all engines regardless of if they have been enabled or not. PBMS support for drizzle will be provided via a data filter plug-in which I have yet to write but will soon.

To add PBMS support all you need to do is add the file pbms_enabled.cc to your source code and add the following to your handler code. I will use the InnoDB handler code as an example:


File ha_innodb.cc:


#ifdef USE_PRAGMA_IMPLEMENTATION
#pragma implementation // gcc: Class implementation
#endif
:
:
:
/* Include necessary InnoDB headers */
extern "C" {
#include "../storage/innobase/include/univ.i"
:
:
:
#include "../storage/innobase/include/ha_prototypes.h"
}

#define PBMS_ENABLED

#ifdef PBMS_ENABLED
#include "pbms_enabled.h"
#endif

static const long AUTOINC_OLD_STYLE_LOCKING = 0;
static const long AUTOINC_NEW_STYLE_LOCKING = 1;
:
:
:
innobase_init(
/*==========*/
/* out: 0 on success, error code on failure */
void *p) /* in: InnoDB handlerton */
{
:
:
:
err = innobase_start_or_create_for_mysql();

if (err != DB_SUCCESS) {
my_free(internal_innobase_data_file_path,
MYF(MY_ALLOW_ZERO_PTR));
goto error;
}

#ifdef PBMS_ENABLED
PBMSResultRec result;
if (!pbms_initialize("InnoDB", &result)) {
sql_print_error("pbms_initialize() Error: %s", result.mr_message);
goto error;
}
#endif
(void) hash_init(&innobase_open_tables,system_charset_info, 32, 0, 0,
(hash_get_key) innobase_get_key, 0, 0);
:
:
:
error:
DBUG_RETURN(TRUE);
}
:
:
:
innobase_end(handlerton *hton, ha_panic_function type)
/*==============*/
/* out: TRUE if error */
{
int err= 0;

DBUG_ENTER("innobase_end");

#ifdef __NETWARE /* some special cleanup for NetWare */
if (nw_panic) {
set_panic_flag_for_netware();
}
#endif
if (innodb_inited) {

:
:
:

#ifdef PBMS_ENABLED
pbms_finalize();
#endif
}

DBUG_RETURN(err);
}

:
:
:
int
ha_innobase::write_row(
/*===================*/
/* out: error code */
uchar* record) /* in: a row in MySQL format */
{
int error = 0;
ibool auto_inc_used= FALSE;
ulint sql_command;
trx_t* trx = thd_to_trx(user_thd);

DBUG_ENTER("ha_innobase::write_row");

#ifdef PBMS_ENABLED
PBMSResultRec result;
error = pbms_write_row_blobs(table, record, &result);
if (error) {
sql_print_error( "pbms_write_row_blobs() Error: %s", result.mr_message);
DBUG_RETURN(error);
}
#endif

:
:
:
#ifdef PBMS_ENABLED
pbms_completed(table, (error == 0));
#endif
DBUG_RETURN(error);
}
:
:
:
ha_innobase::update_row(
/*====================*/
/* out: error number or 0 */
const uchar* old_row, /* in: old row in MySQL format */
uchar* new_row) /* in: new row in MySQL format */
{
upd_t* uvect;
int error = 0;
trx_t* trx = thd_to_trx(user_thd);

DBUG_ENTER("ha_innobase::update_row");

#ifdef PBMS_ENABLED
PBMSResultRec result;

error = pbms_delete_row_blobs(table, old_row, &result);
if (error) {
sql_print_error( "update_row:pbms_delete_row_blobs() Error: %s",
result.mr_message);
DBUG_RETURN(error);
}
error = pbms_write_row_blobs(table, new_row, &result);
if (error) {
sql_print_error( "update_row:pbms_write_row_blobs() Error: %s",
result.mr_message);
goto pbms_done;
}
#endif

:
:
:
#ifdef PBMS_ENABLED
pbms_done:
pbms_completed(table, (error == 0));
#endif
DBUG_RETURN(error);
}
:
:
:
int
ha_innobase::delete_row(
/*====================*/
/* out: error number or 0 */
const uchar* record) /* in: a row in MySQL format */
{
int error = 0;
trx_t* trx = thd_to_trx(user_thd);

DBUG_ENTER("ha_innobase::delete_row");

#ifdef PBMS_ENABLED
PBMSResultRec result;

error = pbms_delete_row_blobs(table, record, &result);
if (error) {
sql_print_error( "pbms_delete_row_blobs() Error: %s", result.mr_message);
DBUG_RETURN(error);
}
#endif

:
:
:
#ifdef PBMS_ENABLED
pbms_completed(table, (error == 0));
#endif
DBUG_RETURN(error);
}
:
:
:
int
ha_innobase::delete_table(
/*======================*/
/* out: error number */
const char* name) /* in: table name */
{
:
:
:
#ifdef PBMS_ENABLED
/* Call pbms_delete_table_with_blobs() last because it cannot be undone. */
if (!error) {
PBMSResultRec result;

if (pbms_delete_table_with_blobs(name, &result)) {
sql_print_error( "pbms_delete_table_with_blobs() Error: %s",
result.mr_message);
}

pbms_completed(NULL, true);
}
#endif
DBUG_RETURN(error);
}

:
:
:
int
ha_innobase::rename_table(
/*======================*/
/* out: 0 or error code */
const char* from, /* in: old name of the table */
const char* to) /* in: new name of the table */
{
ulint name_len1;
ulint name_len2;
int error;
trx_t* parent_trx;
trx_t* trx;
char norm_from[1000];
char norm_to[1000];
THD* thd = ha_thd();

DBUG_ENTER("ha_innobase::rename_table");

#ifdef PBMS_ENABLED
PBMSResultRec result;

error = pbms_rename_table_with_blobs(from, to, &result);
if (error) {
sql_print_error( "pbms_rename_table_with_blobs() Error: %s", result.mr_message);
DBUG_RETURN(error);
}
#endif
:
:
:
#ifdef PBMS_ENABLED
pbms_completed(NULL, (error == 0));
#endif
DBUG_RETURN(error);
}

Now you are probably wondering how I spent half an hour just adding that.

Tuesday, July 21, 2009

PBMS is transactional!

The PBMS engine now has built in support for transaction so that if you reference or dereference BLOBs during a transaction the changes you made will be committed or rolled back with the transaction. Up until now PBMS had depended on the host engine to handle transactions and inform it what needed to be done in the event of a rollback.

I have implemented transaction support by adding a circular transaction log and transaction cache. The circular transaction log is dynamic and can grow and shrink as required. The transaction records are all small (30 byte) fixed length records so the log doesn’t need to be that large to be able to handle a lot of transactions. A head and tail pointer are maintained to indicate where the head and tail of the circular log is. If the circular log becomes full then the records just overflow and are written to the end of the log. Once the transaction reader has caught up after an overflow has occurred it simply resizes the circular log to include the overflow and everything continues as normal with a larger circular log. The circular log will be reduced to its normal size once free space becomes available at the end of the log.

If anyone has any use for something like this them selves it has been written as a standalone module so it would not be to difficult to pull it out and build it into something else.

Wednesday, May 27, 2009

Backing up BLOBs

The PBMS BLOB repositories can now be backed up with mysqldump.

A new system table ‘pbms_dump’ contains 1 row for each repository record. Each row consists of 1 column which is a BLOB. The content of this table has no purpose other than for the use in repository backup. By inserting the data back into this table the repository is recovered. Before doing the backup the pbms_dump system table needs to be discovered, to do this execute the command “select * from pbms_dump where false”. After doing this then mysqldump will include it in backups. Be sure to use the “—hex-blob” option. For example:

Mysqldump –u root –hex-blob mydatabase > mydatabase.sql

Before recovering the database you need to notify PBMS that a recovery operation is in progress so that the recovery of the tables containing the BLOB references do not increment the BLOB reference count. This notification is done by setting a bool flag in a new system table (I like system tables ☺) pbms_variable. To restore a database perform the following operations:

----------------------------
create database recovtest;
use recovtest;

update pbms_variable set Value = "true" where Name = "Restoring-Dump";
source mydatabase.sql;
update pbms_variable set Value = "false" where Name = "Restoring-Dump";
-------------------------------

I am also planning on implementing the engine level backup and recovery supported in MySQL 6.o as well as an internal backup method that will produce a backup that you can recover just by dragging and dropping it into the ‘pbms’ folder where your database folders are located.

The internal backup will run asynchronously in the background and will not block normal operations of the database including adding and deleting BLOBs. The way that it would work is that the administrator would lock the tables containing the BLOB references and back them up. Sometime during the backup, before releasing the locks, the asynchronously PBMS backup of the database would be started. Progress of the backup could be monitored via a pbms_status table.

The version supporting mysqldump is in the PBMS branch "~pbms-core/pbms/cloud" on launchpad. Be warned though that this is a development branch with everything that implies.