Thursday, May 26, 2011

What is in the PBMS patch for MySQL 5.5

I thought people might be interested to know what the PBMS patch for MySQL actually patches, in case they think it is a major hack into the MySQL source code.

Almost all of the patch consists of the PBMS daemon source code, which is added to the "storage/pbms" folder in the MySQL source code tree. Other than that, here is a list of the actual MySQL files touched and what the patch does to each:

  • sql/CMakeLists.txt:
    Added PBMS source directories to the header file search list.
    Lines added: 1.
  • sql/handler.cc:
    Added PBMS server side API calls to check for longblob columns being modified or tables containing longblob columns being dropped or renamed. This is the guts of the PBMS patch.
    Lines added: 170.
  • libmysql/CMakeLists.txt:
    Added PBMS API functions to the client API functions list and the PBMS source directories to the header file search list.  Also adds the PBMS lib source code to the MySQL client lib build.
    Lines added: 61.
  • include/mysql.h:
    Added a line to include the PBMS LIB header file "pbmslib.h".
    Lines added: 3.
  • include/mysql.h.pp:
    Added lines to reflect the changes made to the MySQL client API when adding the PBMS API to it.
    Lines added: 42.
  • include/pbmslib.h:
    Added a new file that redirects to the actual pbmslib.h which is in "storage/pbms/lib". This was added in order to simplify the build process. When installed it is the actual pbmslib.h from "storage/pbms/lib" that is installed.
  • client/CMakeLists.txt:
    Added PBMS source directories to the header file search list.
    Lines added: 1.
  • client/mysqldump.c:
    Added code to recognize PBMS BLOB URLs and fetch the BLOB data and write it out to a separate file.
    Lines added: 176.
  • client/mysql.cc:
    Added code to be able to upload the BLOB dump file from mysqldump to the PBMS daemon.
    Lines added: 92.

As you can see, the actual changes to the MySQL code itself are fairly limited and safe. The patched MySQL server and applications can still be used without PBMS without any negative effect.

Wednesday, May 25, 2011

PBMS Version 2 released

Version 2 of the PBMS daemon is now ready.

Here are the major changes introduced with this version:
  • PBMS is fully integrated with MySQL 5.5:
    PBMS is now provided as a patch for MySQL 5.5 which simplifies installation and provides numerous benefits.

    • All engines are "PBMS enabled":
      PBMS no longer requires that you have a "PBMS enabled" storage engine to be able to use PBMS.

    • The MySQL client lib provides the PBMS client API:
      You no longer need to link your application to a separate PBMS lib to use the PBMS 'C' API.

    • mysqldump understands PBMS BLOB URLs:
      When dumping tables or databases containing PBMS BLOB URLs, mysqldump will dump the referenced BLOBs as binary data to a separate file. Since the BLOBs are dumped to their own file there is no need to convert them to hex data, so they consume only half the disk space they would have otherwise. The dump process is faster and uses less memory because the BLOBs are streamed directly from the PBMS daemon into the file.

    • The mysql client handles the PBMS BLOB dump file:
      When restoring a database or table from a dump, the file into which the BLOB data was dumped can be passed as a command-line argument. The mysql client will then stream the BLOB data directly to the PBMS daemon, which is faster and requires far less memory than if it were sent back to the server via 'insert' statements.

  • A PBMS daemon ID is part of the BLOB URL:
    Each PBMS daemon has its own unique ID number. This allows the PBMS daemon to recognize and handle inserts of PBMS BLOB URLs from other PBMS daemons. A PBMS system table is provided into which daemon information can be inserted for other remote PBMS daemons.

  • BLOB replication is handled automatically:
    When a PBMS BLOB URL is inserted into a table on a slave server, the PBMS daemon recognizes that the URL comes from another PBMS daemon, so it sends a request to the original daemon and pulls the BLOB across so that it is now replicated locally.

  • New internal BLOB indexing system:
    A new PBMS BLOB indexing system improves BLOB access performance and BLOB tracking.

  • PBMS system tables are now indexed:
    The PBMS system tables that provide access to BLOB metadata are now indexed so that accessing the tables no longer automatically results in a table scan of the entire BLOB repository.
This is a major version change and as a result is not backward compatible with the earlier versions of PBMS. If you have an older installation that you need to upgrade please contact me and I will give you details on how best to do this.

The documentation and web page for PBMS have also been updated, so I invite you to check them out at: http://www.blobstreaming.org

You may notice a change in the documentation with regards to the definition of PBMS. The abbreviation 'PBMS' is being kept but what it stands for is being changed. Originally it stood for "PrimeBase Media Streaming" but it occurred to me that when DBAs are designing a database system they are not likely to ask themselves the question "How will I stream my media?" but they may well ask themselves "How will I manage my BLOBs?". So PBMS now stands for "PrimeBase BLOB Management System", which I think is a little more intuitive.

Wednesday, April 20, 2011

MySQL 5.1 Plugin Development

I just wanted to compliment Sergei Golubchik and Andrew Hutchings on their book "MySQL 5.1 Plugin Development". This is a must-have for anyone starting out or in the process of writing plugins for MySQL. It provides a lot of detail that is missing from other books and that is hard to work out just by looking at what other plugins do.

This is the first place I have ever seen the table and index flags explained along with information on what the MySQL server will do with this information.

Well done!

Wednesday, April 13, 2011

PBMS presentation at MySQL Conference

Just a reminder that I will be presenting a session on PBMS at the
MySQL Conference on Thursday April 14 at 10:50.

The title is "BLOB Data And Thinking Out Side The Box", and I will be talking about the new PBMS daemon with a focus on how it handles replication and backup.


Hope to see you there!

Tuesday, April 5, 2011

Why use PBMS?

I have talked to people about why they should use PBMS to handle BLOB data often enough, so I was surprised when someone asked me where they could find this information and I discovered I had never actually written it down anywhere.  So here it is.

If you are unfamiliar with PBMS, PBMS stands for PrimeBase Media Streaming. For details please have a look at the home page for BLOB Streaming.

 
Neither MySQL nor Drizzle is designed to handle BLOB data efficiently. This is not a storage engine problem: most storage engines can store BLOB data reasonably efficiently. The problem is in the server architecture itself, because the BLOB data is transferred to and from the server as part of the regular result set. To do this both the server and the client must allocate a buffer large enough to hold the entire BLOB. DBMSs that are designed to handle BLOBs, such as Oracle, Sybase, and PrimeBase, all pass the BLOB data outside of the regular result set. This way they avoid having to buffer the entire BLOB. APIs such as ODBC understand this and provide functions such as SQLGetData() that can be called multiple times to retrieve data in chunks, so that the client doesn’t have to buffer BLOBs if it doesn’t need to.

PBMS is designed to address this problem and provide MySQL and Drizzle with a means to efficiently handle BLOB data by allowing BLOB data to be transferred outside of the regular result set.
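
The difference between buffering a whole BLOB and passing it along in chunks can be sketched in a few lines of Python. This is only an illustration of the idea and uses neither the PBMS API nor ODBC; the function names and the 64 KB chunk size are assumptions made for the example.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not a PBMS or ODBC constant


def copy_buffered(src, dst):
    """Read the whole BLOB into memory first; the buffer is as large as the BLOB."""
    data = src.read()
    dst.write(data)
    return len(data)


def copy_streamed(src, dst, chunk_size=CHUNK_SIZE):
    """Copy in fixed-size chunks; returns (bytes copied, largest buffer held)."""
    total = 0
    peak = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        peak = max(peak, len(chunk))
    return total, peak


blob = bytes(1024 * 1024)  # a 1 MiB stand-in for a BLOB
out = io.BytesIO()
total, peak = copy_streamed(io.BytesIO(blob), out)
```

With the streamed copy the largest buffer held is one chunk, however large the BLOB is, which is the property the chunked ODBC calls and PBMS streaming both rely on.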

There are currently 2 approaches to handling BLOBs when using MySQL or Drizzle. One is to store the BLOB data in the database in a BLOB column, which I will call the “BLOBs in database” approach. The other is to store the BLOB in a file somewhere and then store the path to the file in the database, which I will call the “BLOBs in files” approach. I will compare these 2 methods of handling BLOB data to the "PBMS" approach, which is to use the PBMS daemon.


The BLOBs in database approach:

Advantages:
  • Simple to implement, BLOB data is treated no differently from any other data.
  • Flexible, DBMS independent applications can be written using ODBC or JDBC to access the data.
  • The referential integrity of the BLOB data is ensured by the database.
  • Standard database maintenance ensures the security of the data.

Disadvantages:
  • The BLOB data is buffered on both the server and client side, so a 100 MB BLOB will require a 100 MB buffer on the server and then another on the client to receive the BLOB into. If the server is busy handling 100 such requests then it will need 10 GB of buffer space.
  • Database replication becomes impractical because of the size of the logs when the BLOB data is written to them.
  • The use of mysqldump, or similar tools, to back up databases results in huge backup files because the BLOBs must be converted to hex strings in order to write them to the backup file, which doubles their size.
  • The MySQL cluster server cannot be used with databases containing BLOBs.
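
The two size figures in the list above are simple arithmetic, which the following sketch works through. The helper name is made up for the example and decimal units are assumed (1 MB = 10^6 bytes).

```python
import binascii

MB = 10**6  # decimal units assumed for this example
GB = 10**9


def server_buffer_bytes(blob_size, concurrent_requests):
    """Buffer space needed when every request holds one whole BLOB in memory."""
    return blob_size * concurrent_requests


# 100 concurrent requests for a 100 MB BLOB each
total = server_buffer_bytes(100 * MB, 100)
total_gb = total / GB

# Hex encoding writes two characters for every byte of binary data
sample = bytes(range(256)) * 4           # 1024 bytes of binary data
hex_dump = binascii.hexlify(sample)      # what a hex dump has to store
ratio = len(hex_dump) / len(sample)
```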


The BLOBs in files approach:

Advantages:
  • The BLOB data is not part of the result set so large buffers are not required.
  • The BLOB data can be stored in a location that is remote from the database server.
  • Standard replication will work (but the BLOB data will not be replicated).
  • Standard backup procedures can be used with the database (but the BLOB data will not be backed up).

Disadvantages:
  • A separate backup solution must be found for the BLOB data while keeping it consistent with the database backups.
  • A separate solution is required to replicate BLOB data.
  • Requires a custom designed system including client software.
  • The client software needs to know how the BLOBs are stored and needs to be provided with a method of accessing the BLOBs. If the BLOBs are not located locally to the client then additional software may be required.
  • The referential integrity, making sure that the BLOB files being stored on the file system are consistent with the BLOB references stored in the database, is no longer controlled by the database server.
  • Doesn’t scale well because most file systems perform poorly when the number of files starts to exceed a couple of million.
  • Installation and maintenance is more complex because specialized knowledge is required.
  • The client application is responsible for handling the effects of transaction rollbacks in ensuring referential integrity.


The PBMS approach:

Advantages:
  • Simple to implement, all data storage and access is handled by the database server and PBMS engine.
  • Flexible, DBMS independent applications can be written using JDBC to access the data.
  • The referential integrity of the BLOB data is ensured by the database.
  • Standard database maintenance ensures the security of the data. No special knowledge is required.
  • Replication of the BLOB data is possible.
  • The BLOB data can be streamed in and out of the database so that buffer sizes are independent of the size of the BLOBs.
  • Better performance, tests show that inserts and selects are significantly faster when BLOBs of 50 K or more are handled using PBMS.
  • The solution scales well, BLOBs are packed into files so the number of files in the file system is much less than you would get with a one BLOB per file system.
  • The maximum size of a BLOB is only limited by the maximum file size of the host machine.
  • The BLOB data can be stored in a location remote from the database server, such as on a different machine or in S3 cloud storage. This reduces the load on the database server host and its network bandwidth use.
  • The PBMS daemon ensures that BLOB inserts and deletes are handled properly in the event of a transaction rollback. Transaction check points are also supported.
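
To illustrate the point about packing BLOBs into files, here is a minimal sketch of a repository that appends many BLOBs to a single file and keeps an offset index. It is not the PBMS on-disk format; the class name, the in-memory index, and the chunk size are all assumptions made for the example.

```python
import io


class PackedRepository:
    """Hypothetical sketch: many BLOBs appended to one file-like object."""

    def __init__(self, storage=None):
        # Any seekable binary file object works; BytesIO keeps the example self-contained
        self.storage = storage if storage is not None else io.BytesIO()
        self.index = []  # entry i holds (offset, length) of BLOB i

    def add(self, data):
        """Append a BLOB to the end of the file and return its ID."""
        self.storage.seek(0, io.SEEK_END)
        offset = self.storage.tell()
        self.storage.write(data)
        self.index.append((offset, len(data)))
        return len(self.index) - 1

    def read(self, blob_id, chunk_size=4096):
        """Yield the BLOB in chunks so the caller never holds all of it at once."""
        offset, length = self.index[blob_id]
        end = offset + length
        while offset < end:
            self.storage.seek(offset)
            chunk = self.storage.read(min(chunk_size, end - offset))
            if not chunk:
                break
            offset += len(chunk)
            yield chunk


repo = PackedRepository()
first = repo.add(b"first blob")
second = repo.add(b"x" * 10000)
chunks = list(repo.read(second))
```

Two BLOBs end up in one file here, so the file count grows with the number of repository files rather than with the number of BLOBs.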

Disadvantages:
  • PBMS is not shipped with any MySQL distribution, but it is shipped with Drizzle and can be downloaded from http://www.blobstreaming.org.
  • The MySQL server does not directly support PBMS, but Drizzle does provide direct support.

Although MySQL doesn’t support PBMS directly it is not difficult to add support to the InnoDB engine and anyone interested can contact me and I will happily assist them with it. I use InnoDB for most of my testing.


Conclusions:

PBMS provides efficient BLOB handling that is missing from MySQL and Drizzle. This would be enhanced greatly by integrating PBMS support more directly into the MySQL server.

I currently have plans to increase the support for PBMS in Drizzle by adding a new column type for storing PBMS BLOB references. This enables the use of PBMS to be part of the database schema design and simplifies support for client libraries. The client library will then be able to recognize it is getting a BLOB reference and can then make calls to the PBMS daemon to stream the data back to the client. Ideally this is the type of support MySQL should also provide for PBMS.

Wednesday, March 23, 2011

New PBMS version

A new version of PBMS for Drizzle has been pushed up to Launchpad:

drizzle_pbmsV2

I have rewritten PBMS and changed the way that BLOBs are referenced in order to make PBMS more flexible and to fix some of its limitations. I have also removed some of the more confusing parts of the code and reorganized it in an attempt to make it easier for people to find their way around it.

So apart from some cosmetic changes what is different?

Maybe the best answer would be to say what hasn't changed: the user and engine API and the way in which the actual data is stored on the disk remain pretty much unchanged, but everything else has changed.

The best place to start is with the BLOB URL. The old URL looked like this:
"~*1261157929~5-128-6147b252-0-0-37"
the new URL looks like this:
"pbmsAdaVAQCAAAAAAAAAANaVAQCAAAAAAAAAAG30qzsGAAAAAAAAAAEAAACAAAAAAAAAAAEAAAAAAAAA"
which is obviously a lot more intuitive.  :)

OK, maybe it is bigger and uglier, but it contains a lot more information. It is actually a base64 URL encoding of a data structure containing information about the BLOB that makes it universally locatable across different PBMS daemons running locally or remotely.
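
As a rough illustration of what "a base64 URL encoding of a data structure" means, the sketch below packs three numbers into a binary structure and encodes it behind a "pbms" prefix. The three-field layout is an assumption made for the example; the real PBMS structure holds more information and is laid out differently.

```python
import base64
import struct

PREFIX = "pbms"
# Hypothetical layout: three little-endian unsigned 64-bit integers.
# This is NOT the real PBMS structure, only a stand-in for the idea.
LAYOUT = "<QQQ"


def encode_blob_url(server_id, database_id, repo_index):
    """Pack the fields into bytes and encode them with URL-safe base64."""
    raw = struct.pack(LAYOUT, server_id, database_id, repo_index)
    return PREFIX + base64.urlsafe_b64encode(raw).decode("ascii")


def decode_blob_url(url):
    """Reverse of encode_blob_url; rejects strings without the prefix."""
    if not url.startswith(PREFIX):
        raise ValueError("not a PBMS-style BLOB URL")
    raw = base64.urlsafe_b64decode(url[len(PREFIX):])
    return struct.unpack(LAYOUT, raw)


url = encode_blob_url(38358, 7, 42)
fields = decode_blob_url(url)
```

Because the encoded string carries its own identifying fields, any daemon that receives it can tell where the BLOB originally came from without a separate lookup.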

How is this done?

When a BLOB is uploaded to a PBMS daemon the URL generated for it contains, among other things, the PBMS daemon's server id as well as the database ID and the BLOB's repository index value. These 3 values remain with the BLOB for its life regardless of what server or database it may eventually end up in. This allows you to insert a BLOB URL from one database into another database, possibly on a different server, and the PBMS engine will be able to use the URL to look up the BLOB. If it cannot find it in the database's repository then PBMS will automatically fetch the BLOB from the source server or database.

When fetching the BLOB the current server id, database id and index, which are also stored in the URL, are used.

This means that the following will work:

insert into foo.blob_table1 select * from bar.ablob_table;
insert into foo.blob_table2 select * from bar.ablob_table;

The first insert will copy the BLOBs from the BLOB repository for database 'bar' into the repository for database 'foo'.

The second insert will recognize that the BLOBs already exist in foo's BLOB repository and just add references to them.

The same would hold true if database 'foo' was on a different server on the other side of the world.
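
The behaviour described above (copy on the first insert, reference only on the second, fetch from the source when the BLOB is not local) can be modelled with a small in-memory sketch. It is not PBMS code: the class, its methods, and the dictionary used in place of the daemon-to-daemon network request are all assumptions made for the example.

```python
class BlobDaemon:
    """Hypothetical in-memory model of a daemon with one repository per database."""

    def __init__(self, server_id, servers=None):
        self.server_id = server_id
        self.servers = servers or {}  # server id -> daemon, stands in for a server lookup table
        self.repos = {}               # database name -> {key: [data, reference count]}
        self.next_index = 0
        self.fetches = 0              # how many times a BLOB had to be copied in

    def upload(self, database, data):
        """Store a new BLOB and return the key that identifies it for life."""
        key = (self.server_id, database, self.next_index)
        self.next_index += 1
        self.repos.setdefault(database, {})[key] = [data, 0]
        return key

    def insert_reference(self, database, key):
        """Called when a BLOB reference is inserted into a table in `database`."""
        repo = self.repos.setdefault(database, {})
        if key not in repo:
            # Not in this database's repository: pull it from its source
            source_server, source_db, _ = key
            if source_server == self.server_id:
                source = self
            else:
                source = self.servers[source_server]
            repo[key] = [source.repos[source_db][key][0], 0]
            self.fetches += 1
        repo[key][1] += 1


master = BlobDaemon(38358)
key = master.upload("bar", b"picture bytes")
master.insert_reference("bar", key)  # the row in bar.ablob_table
master.insert_reference("foo", key)  # first insert: BLOB is copied into foo
master.insert_reference("foo", key)  # second insert: only a reference is added

slave = BlobDaemon(99999, servers={38358: master})
slave.insert_reference("foo", key)   # BLOB is pulled from the other daemon
```

In this model the master copies the BLOB into foo's repository once and then only counts references, while the second daemon fetches it from the first by looking up the server id carried in the key.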

BLOBs and replication:

A practical use for this is with replication: the replication process replicates the BLOB URLs to the slave server and PBMS pulls the BLOB across automatically. You can try this out using Drizzle replication and my drizzle_pbmsV2 branch.

The only thing you need to do is tell the slave server how to map the PBMS server ID to the actual server. You do that by inserting the information into the 'pbms_server' table in the slave machine's pbms database. The master server's PBMS server ID can be found by doing the following select on the master server:
select * from pbms.pbms_server;
Resulting in something like this:
+-------+-----------+------+--------------+
| Id    | Address   | Port | Description  |
+-------+-----------+------+--------------+
| 38358 | localhost | 8080 | This server. |
+-------+-----------+------+--------------+

Then on the slave server do the following insert:
 insert into pbms.pbms_server values(38358, "master_host", 8080, "The master replication server");
where "master_host" will be the same IP address as you have for "master-host" in the drizzle slave config file.

Note: The PBMS server ID is not the same as the drizzle server id.

What else:

With the new design of the PBMS daemon it would not be very difficult to create a stand-alone BLOB repository server that could be used as a backup for BLOBs or as a central repository for a cluster of servers.

The next step though is to update the PBMS documentation and build a version for MySQL.

Friday, March 11, 2011

PBMS Performance

I have been doing some performance testing with PBMS and found a few things that were kind of interesting. The main finding was that you start to see performance improvements when data sizes start to reach the 20K level. This was seen when replacing a 20 K varchar field with a longblob column in a PBMS enabled table.

The following graph shows the performance differences for 'select' and 'insert' statements using a PBMS enabled version of InnoDB on an 8 core machine.


The test compares the insert and select performance of LongBlob columns with PBMS support against that of varchar and longtext columns when using InnoDB.

The test shows that, depending on whether your application is more heavily weighted towards inserts or selects, it may be beneficial to replace columns containing more than 10K of data with longblob columns with PBMS support. In all cases the performance of both 'selects' and 'inserts' was improved for columns containing more than 20K when using longblob columns and PBMS. As the data size in the column increased, the performance gain from using PBMS also increased: at 10 M there was a 580% performance improvement for selects.

The testing showed some performance irregularities with PBMS which when fixed should result in even better performance.

If anyone would like to try this themselves please contact me and I can give you a copy of the test tool I used as well as a PBMS enabled version of InnoDB.

Barry