Thursday, May 26, 2011

What is in the PBMS patch for MySQL 5.5

I thought people might be interested to know what the PBMS patch for MySQL actually patches, in case they think it is a major hack into the MySQL source code.

Almost all of the patch consists of the PBMS daemon source code, which is added to the "storage/pbms" folder in the MySQL source code tree. Other than that, here is a list of the MySQL files actually touched and what each change is for:

  • sql/CMakeLists.txt:
    Added PBMS source directories to the header file search list.
    Lines added: 1.
  • sql/handler.cc:
    Added PBMS server-side API calls to check for longblob columns being modified, or tables containing longblob columns being dropped or renamed. This is the guts of the PBMS patch.
    Lines added: 170.
  • libmysql/CMakeLists.txt:
    Added PBMS API functions to the client API functions list and the PBMS source directories to the header file search list. Also added the PBMS lib source code to the MySQL client lib build.
    Lines added: 61.
  • include/mysql.h:
    Added a line to include the PBMS lib header file "pbmslib.h".
    Lines added: 3.
  • include/mysql.h.pp:
    Added lines to reflect the changes made to the MySQL client API when adding the PBMS API to it.
    Lines added: 42.
  • include/pbmslib.h:
    Added a new file that redirects to the actual pbmslib.h which is in "storage/pbms/lib". This was added in order to simplify the build process. When installed it is the actual pbmslib.h from "storage/pbms/lib" that is installed.
  • client/CMakeLists.txt:
    Added PBMS source directories to the header file search list.
    Lines added: 1.
  • client/mysqldump.c:
    Added code to recognize PBMS BLOB URLs and fetch the BLOB data and write it out to a separate file.
    Lines added: 176.
  • client/mysql.cc:
    Added code to be able to upload the BLOB dump file from mysqldump to the PBMS daemon.
    Lines added: 92.
As you can see, the actual changes to the MySQL code itself are fairly limited and safe. The patched MySQL server and applications can still be used without PBMS without any negative effect.
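
To put a number on "fairly limited", the per-file counts listed above can be added up. This is only a quick tally sketch: the figures are copied from the list, include/pbmslib.h is left out because no line count is given for it, and the grouping by top-level directory is my own choice for illustration.

```python
# Line counts copied from the list above. include/pbmslib.h is omitted
# because the post gives no line count for it.
lines_added = {
    "sql/CMakeLists.txt": 1,
    "sql/handler.cc": 170,
    "libmysql/CMakeLists.txt": 61,
    "include/mysql.h": 3,
    "include/mysql.h.pp": 42,
    "client/CMakeLists.txt": 1,
    "client/mysqldump.c": 176,
    "client/mysql.cc": 92,
}

total = sum(lines_added.values())

# Group by top-level directory (an illustrative grouping, not part of the patch).
server_side = sum(n for path, n in lines_added.items() if path.startswith("sql/"))
client_tools = sum(n for path, n in lines_added.items() if path.startswith("client/"))

print(f"total lines added to existing MySQL files: {total}")
print(f"server code (sql/): {server_side}")
print(f"client tools (client/): {client_tools}")
```

By this count the server itself gains 171 lines, almost all of them in sql/handler.cc, out of 546 lines added to existing files.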

Wednesday, May 25, 2011

PBMS Version 2 released

Version 2 of the PBMS daemon is now ready.

Here are the major changes introduced with this version:
  • PBMS is fully integrated with MySQL 5.5:
    PBMS is now provided as a patch for MySQL 5.5 which simplifies installation and provides numerous benefits.

    • All engines are "PBMS enabled":
      PBMS no longer requires that you have a "PBMS enabled" storage engine to be able to use PBMS.

    • The MySQL client lib provides the PBMS client API:
      You no longer need to link your application to a separate PBMS lib to use the PBMS 'C' API.

    • mysqldump understands PBMS BLOB URLs:
      When dumping tables or databases containing PBMS BLOB URLs, mysqldump will dump the referenced BLOBs as binary data to a separate file. Since the BLOBs are dumped to their own file, there is no need to convert them to hex data, so they consume only half the disk space they would have otherwise. The dump process is faster and uses less memory because the BLOBs are streamed directly from the PBMS daemon into the file.

    • The mysql client handles the PBMS BLOB dump file:
      When restoring a database or table from a dump, the file into which the BLOB data was dumped can be passed as a command-line argument. The mysql client will then stream the BLOB data directly to the PBMS daemon, which is faster and requires far less memory than if it were sent back to the server via 'insert' statements.

  • A PBMS daemon ID is part of the BLOB URL:
    Each PBMS daemon has its own unique ID number. This allows the PBMS daemon to recognize and handle inserts of PBMS BLOB URLs from other PBMS daemons. A PBMS system table is provided into which daemon information can be inserted for other remote PBMS daemons.

  • BLOB replication is handled automatically:
    When a PBMS BLOB URL is inserted into a table on a slave server, the PBMS daemon recognizes that the URL comes from another PBMS daemon, so it sends a request to the original daemon and pulls the BLOB across so that it is now replicated locally.

  • New internal BLOB indexing system:
    A new PBMS BLOB indexing system improves BLOB access performance and BLOB tracking.

  • PBMS system tables are now indexed:
    The PBMS system tables that provide access to BLOB metadata are now indexed so that accessing the tables no longer automatically results in a table scan of the entire BLOB repository.
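
The "half the disk space" point in the mysqldump item above follows from hex encoding writing two characters for every byte. The sketch below illustrates that, along with the chunked copy that keeps memory use flat. It is not PBMS or mysqldump code: `stream_copy`, the chunk size, and the in-memory streams standing in for the daemon and the dump file are all assumptions made for the example.

```python
import binascii
import io
import os

CHUNK = 64 * 1024  # copy buffer size; an arbitrary choice for this sketch


def stream_copy(src, dst, chunk_size=CHUNK):
    """Copy src to dst in fixed-size chunks and return the bytes written.

    Only one chunk is held in memory at a time, whatever the BLOB size.
    """
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        written += len(chunk)
    return written


# Random bytes stand in for a BLOB served by the daemon (hypothetical data).
blob = os.urandom(1_000_000)

# Binary dump: the bytes go to the output unchanged.
raw_out = io.BytesIO()
raw_size = stream_copy(io.BytesIO(blob), raw_out)

# Hex dump: every byte becomes two hex digits.
hex_size = len(binascii.hexlify(blob))

print(f"binary dump: {raw_size} bytes, hex dump: {hex_size} bytes")
```

The same chunked loop works in the other direction when the dump file is streamed back to the daemon on restore.
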
This is a major version change and, as a result, it is not backward compatible with earlier versions of PBMS. If you have an older installation that you need to upgrade, please contact me and I will give you details on how best to do this.

The documentation and web page for PBMS have also been updated, so I invite you to check them out at: http://www.blobstreaming.org

You may notice a change in the documentation with regard to the definition of PBMS. The abbreviation 'PBMS' is being kept but what it stands for is being changed. Originally it stood for "PrimeBase Media Streaming", but it occurred to me that when DBAs are designing a database system they are not likely to ask themselves the question "How will I stream my media?" but they may well ask themselves "How will I manage my BLOBs?". So PBMS now stands for "PrimeBase BLOB Management System", which I think is a little more intuitive.