Thursday, July 22, 2010

PBMS version 0.5.015 beta has been released.

A new release of the PrimeBase Media Streaming daemon is now available for download at
http://www.blobstreaming.org .

This release doesn't contain any major new features, just some bug fixes and a lot of housekeeping changes.

If you look at the download section on http://www.blobstreaming.org you will see that there are now more packages that can be downloaded. I have separated out the different client-side components from the PBMS project and created a separate Launchpad project for each one. You can see them listed in the "Related Links" side panel to the right of this post.

  • The "PBMS Client Library" facilitates communication with the PBMS daemon. This library is independent of the PBMS daemon's host server and can be used to communicate with a PBMS daemon hosted by the MySQL or Drizzle database servers.
  • The "PBMS PHP extension" is a PHP module that enables PHP to connect directly to the PBMS engine and stream BLOB data in and out of a MySQL or Drizzle database.
  • The "Streaming enabled JDBC Driver" is a streaming-enabled version of the standard MySQL Connector/J JDBC driver.

One minor new feature is that the PBMS HTTP server now understands the HTTP "Range" header, which can be used to request a section of BLOB data. The Client Library and PHP extension have both been updated with pbms_get_data_range() functions.
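
As a rough illustration of what such a request looks like on the wire, here is a minimal Python sketch that builds (but does not send) an HTTP GET carrying a byte-range header for a given offset and size. The helper names and the URL are hypothetical and are not part of the PBMS client library; a real BLOB URL is generated by PBMS.

```python
import urllib.request


def range_header(offset, size):
    """Return an HTTP Range header value covering `size` bytes from `offset`."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return "bytes={}-{}".format(offset, offset + size - 1)


def build_range_request(blob_url, offset, size):
    """Build (but do not send) a GET request for part of a BLOB."""
    request = urllib.request.Request(blob_url)
    request.add_header("Range", range_header(offset, size))
    return request


# Hypothetical URL, used only so the example is self-contained.
req = build_range_request("http://localhost:8080/example-blob-url", 1024, 512)
print(req.get_header("Range"))
```

A server that honors the header normally answers with status 206 (Partial Content) and only the requested bytes.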

Thursday, July 8, 2010

PBMS is in the Drizzle tree!

If you haven't already heard, PBMS is now part of the Drizzle tree.

Getting it there was a fair bit of work, but not as much as I had thought it would be. The process of getting it to work with Drizzle and running it through Hudson has improved the code a lot. It is amazing what some compilers will catch that others will let through. I am now a firm believer in treating all compiler warnings as errors.

I am just in the process of updating the PBMS plugin so that it will build and install the PBMS client library (libpbmscl.so) as well as the plugin. The PBMS client library is a standalone library that can be used to access the PBMS daemon whether it is running as part of MySQL or Drizzle. So a PBMS client library built with Drizzle can be used to access a PBMS daemon running as part of MySQL, and vice versa.

There is also a PHP extension for PBMS that is basically just a wrapper for the library. Currently this is part of the PBMS project on Launchpad, but I am working on getting it into PECL. The PHP extension comes with a set of test cases, which is what I use to test PBMS.

If anybody is interested in taking on the task of creating a Python module for PBMS, I would be happy to provide whatever help you may need. I think it would just be a wrapper around the PBMS client library, almost identical to the PHP extension. I would recommend just taking the PHP extension and converting it.

Now that I have PBMS in Drizzle I am planning on getting replication working with PBMS. I have decided that the best way to do this is to do a bit of work rearranging how PBMS uses the BLOB URLs to reference the BLOB data. The URLs already contain a server, database, and table ID, so the plan is to change things so that PBMS can handle BLOB URLs from other servers being inserted or referenced. Once this is working I will automatically have replication and 95% of what is needed to support clustered servers.

I will go into detail on this in a later posting which will include pretty pictures.

Barry

Tuesday, April 27, 2010

BLOBs are not just blobs

Recently, when talking to someone about PBMS, it occurred to me that I had been thinking about BLOBs in the traditional database sense: as atomic blocks of data whose content the server knows nothing about. But with PBMS that need not be the case.

The simplest enhancement would be to allow the client to send a BLOB request to the PBMS daemon with an offset and size to just return a chunk of the BLOB. Depending on the application and the BLOB contents this may make perfectly good sense. Why force the client to retrieve the entire BLOB if it only wants part of it?

A much more interesting idea would be to enable the user to provide custom server side functions that they could run against the BLOB.

So how would this work?

The PBMS daemon would provide its own "BLOB functions" plugin API. The API would be quite simple: the plugin would register the function names it supports. When the PBMS daemon receives a BLOB request specifying a BLOB function name, it calls the BLOB function, passing it a hook to the BLOB data, and then returns to the client whatever the function returns.
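
To make the idea a little more concrete, here is a minimal Python sketch of a register-and-dispatch arrangement of this kind. None of this exists in PBMS: the class, the method names, and the two example functions are all hypothetical, and the "hook to the BLOB data" is assumed here to be a readable byte stream.

```python
import io


class BlobFunctionRegistry:
    """Hypothetical registry mapping BLOB function names to callables."""

    def __init__(self):
        self._functions = {}

    def register(self, name, func):
        """Called by a plugin to register one function name it supports."""
        if name in self._functions:
            raise ValueError("BLOB function already registered: " + name)
        self._functions[name] = func

    def handle_request(self, blob_stream, function_name=None):
        """Return the whole BLOB, or whatever the named function returns."""
        if function_name is None:
            return blob_stream.read()
        try:
            func = self._functions[function_name]
        except KeyError:
            raise LookupError("unknown BLOB function: " + function_name)
        # The function receives the stream (the "hook") rather than a copy,
        # so it can read only the parts of the BLOB it needs.
        return func(blob_stream)


def head(blob_stream):
    """Example function: return only the first 16 bytes of the BLOB."""
    return blob_stream.read(16)


def size(blob_stream):
    """Example function: return the BLOB length as ASCII digits."""
    return str(len(blob_stream.read())).encode("ascii")


registry = BlobFunctionRegistry()
registry.register("head", head)
registry.register("size", size)

blob = b"x" * 100
print(registry.handle_request(io.BytesIO(blob), "head"))
print(registry.handle_request(io.BytesIO(blob), "size"))
```

A thumbnail or metadata extractor would plug in the same way as `head` and `size`: it would be registered under a name and would return bytes derived from the stream.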

The first use of this that I can imagine would be to provide a function that would return the thumbnail from a JPEG image rather than the entire image. Other functions may just return the JPEG metadata.

The idea is that BLOBs are not just blobs but highly structured documents, and, given knowledge of the document structure, it is possible to return just the portions of the BLOB that are of interest to a particular application.

Friday, March 5, 2010

PBMS daemon performance update.

I have updated the PBMS download to 0.5.12-beta.

When doing some performance testing I found a severe bug that effectively limited the upload of files to 1 file every 2 seconds regardless of file size. :( Now that I have fixed this it is possible to upload 1000+ BLOBs per second, depending on size.

I believe this bug has been in there for the last couple of versions. So if you had tried PBMS and thought it slow, please try it again.


This version also includes a performance test tool called pbms_performance. I plan on posting some performance data soon.

Barry

Friday, February 26, 2010

PBMS version 0.5.011 beta has been released.

A new release of the PrimeBase Media Streaming daemon is now available for download at
http://www.blobstreaming.org .

This release has been focused on getting S3 storage back into PBMS.

What's new:

  • The PBMS daemon now provides two storage methods for BLOB data: local storage and S3 storage. Using S3 storage, repository records are kept for each BLOB the same as in local storage, but the actual BLOB data is stored remotely on an S3 server. See my earlier posting on S3 storage for details.
  • When using S3 storage, the PBMS daemon holds the public/private keys, but the actual transfer of BLOB data, both upload and download, is done directly between the client application and the S3 server.
  • Backup and recovery have been modified to work with S3 storage so that BLOBs stored on S3 servers are backed up on the S3 server. A previous blog entry on this subject covers it in more detail.
  • The PBMS-enabled MySQL Java connector has been updated and is now working again.
  • The PrimeBase Media Streaming web site has been updated and the documentation has been made more user friendly.
  • A new system table called 'pbms_enabled' was added to the 'pbms' database. It lists the PBMS-enabled engines currently loaded by the MySQL server.
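
The second point in the list above, where the daemon holds the keys while the data moves directly between client and S3, can be illustrated with a pre-signed URL. The Python sketch below assumes the legacy S3 query-string authentication scheme (Signature Version 2, since superseded by Version 4). Whether PBMS does it this way is an assumption, not something stated in this post, and the function name, bucket, object key, and credentials are all hypothetical.

```python
import base64
import hashlib
import hmac
import urllib.parse


def presign_s3_get(access_key, secret_key, bucket, object_key, expires_at):
    """Return a time-limited GET URL signed with legacy S3 query-string auth.

    Only the party holding `secret_key` (here, the daemon) can produce the
    signature; the client just uses the resulting URL to talk to S3 directly.
    """
    resource = "/{}/{}".format(bucket, urllib.parse.quote(object_key))
    # Verb, Content-MD5 (empty), Content-Type (empty), expiry, resource.
    string_to_sign = "GET\n\n\n{}\n{}".format(expires_at, resource)
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    query = urllib.parse.urlencode({
        "AWSAccessKeyId": access_key,
        "Expires": expires_at,
        "Signature": signature,
    })
    return "https://s3.amazonaws.com{}?{}".format(resource, query)


# Made-up credentials and a fixed expiry time (seconds since the epoch).
url = presign_s3_get("EXAMPLEKEYID", "example-secret", "my-bucket",
                     "blobs/photo.jpg", 1300000000)
print(url)
```

The secret key never leaves the signer. The client only sees a URL that stops working after the expiry time.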

As you may have noticed, there has been a bit of a name change in that I now refer to the PBMS daemon as opposed to the PBMS engine. Referring to PBMS as a storage engine was a bit confusing because its main purpose is to act as an independent daemon that handles the transfer of BLOB data to and from clients. The fact that it publishes itself internally in the MySQL server as a storage engine, supports the storage engine interface, and takes part in transactions is an implementation detail and is not part of its core purpose.

Speaking of core purposes: The BLOB alias feature has been removed. This feature let the user assign an alias to a BLOB that could later be used in place of the BLOB URL generated by PBMS to access the BLOB. It was decided that this feature was not part of the PBMS daemon's core purpose, and since supporting it in a transactional manner would have added considerable complexity to the PBMS daemon, it was dropped.


What's next:

  • I really want to get this working with Drizzle.
  • Run some performance tests comparing BLOB handling performance on MySQL with and without the PBMS daemon, and measuring its impact on the overall performance of the MySQL server.
  • The performance with S3 is not as good as it should be. The cause of this needs to be found and fixed.