"New Breed" Database Extensibility

At present, I can think of at least 5 "new breed" database vendors that allow you to extend their SQL language in one form or another:

• Netezza
• ParAccel
• Kognitio
• Greenplum (I think)
• DATAllegro/Microsoft

Of course, the old guard is well-represented in this category as well - Oracle, DB2, SQL Server, Teradata, etc. all allow language extension via custom functions, plug-ins, etc.
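
For the old guard, "custom functions" can be as simple as a user-defined function written in the database's own procedural dialect (SQL Server, for example, also supports compiled extensions via CLR assemblies). As a minimal illustration, here's a trivial T-SQL scalar UDF in SQL Server; the function name and logic are made up for the example:

-- Illustrative only: a trivial scalar UDF extending SQL Server's
-- function library. The name and the conversion are made up for this sketch.
CREATE FUNCTION dbo.FahrenheitToCelsius (@degreesF FLOAT)
RETURNS FLOAT
AS
BEGIN
    RETURN (@degreesF - 32.0) * 5.0 / 9.0;
END;
GO

-- Once created, it can be called anywhere a built-in scalar function can:
SELECT dbo.FahrenheitToCelsius(98.6) AS DegreesC;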

I don't know enough about it yet to say for certain, but I think that Greenplum and AsterData might also belong on this list due to their support for Map/Reduce.

It wasn't all that long ago (3 or 4 years) that most of these vendors didn't even support the full SQL standard, never mind compiled-code extensions to their SQL language. Oh, the possibilities.

ParAccel Update

Not long ago I had the pleasure of speaking with Kim Stanick of ParAccel to get an update on what they've been doing and where they're headed. Here are the highlights from my notes.

Business News


Company Growth


Unlike many (or even most) companies, ParAccel grew last year, doubling from roughly 30 to 60 employees.

POCs


ParAccel spent its first year of operation focused largely on competing effectively in POCs. They now measure POC lengths in days rather than weeks, and are actively engaged in multiple concurrent POCs in the field. In addition, POC sizes are growing, and now range between 10TB and 100TB systems, with tests often performed in both in-memory and on-disk configurations. To me, the combination of POC speed (days), location (in the field), size (100TB) and flexibility (memory vs. disk) is a sign that the product has matured rapidly.

I saw a handful of selected results from POCs with a variety of customers (retail, financials, entertainment, etc.), but the quote that really jumped out at me concerning POCs was "unbeaten on performance". The performance results I saw were impressive (though I honestly don't expect them to show me the unimpressive ones). To their credit, there were a couple queries in the POC results where ParAccel was not faster, and those were not redacted. A good sign if you ask me. Cynically, however, I have to wonder: if they're not getting beaten on performance, where are they getting beaten? ;-)

Reference Customer


Late last year ParAccel announced its first (as far as I'm aware) reference customer, namely Merkle. Interestingly, Merkle selected ParAccel to replace an in-house data cleansing/integration application, not for reporting/analytics purposes. That application hasn't been fully ported yet, however, so I have a hard time calling this a big win. In the interim, Merkle is reportedly finding analytic uses for their ParAccel system. I wonder where they'll ultimately land.

More information about what Merkle has done with ParAccel is available in this TDWI webinar if you're interested.

EMC Partnership


ParAccel's partnership with EMC appears to be serving them well. EMC is inviting them to POCs, which can't be a bad thing. This makes eminent sense, really, as any deal won by one of the database vendors with embedded storage (e.g. Netezza, Exasol, Dataupia, etc.) means a deal with no money in it for EMC. I don't know yet, however, how many of those POCs ParAccel has won, though I think there has been at least one.

Technical Tidbits


Though our conversation was sales-and-marketing heavy, I did glean a few interesting technical tidbits.

• ParAccel compiles queries. Given the obvious connection between ParAccel and Netezza, this isn't really surprising though.
• ParAccel supports SQL extensions via "custom C functions". I look forward to learning more about this.
• ParAccel uses a proprietary interconnect protocol to reduce packet loss and improve the performance of inter-node communications.
• ParAccel's support for Microsoft and Oracle SQL extensions is strong enough that POCs often involve "no rewriting of SQL".

More information is available in a soon-to-be-released "technical overview" whitepaper. I got a peek, and it's well worth reading - check it out when it becomes available.

Parting Thoughts


I tend to put more emphasis on how somebody says something than on what they actually say; something like 90% of interpersonal communication is non-verbal, after all. And while it's certainly a VP of Marketing's job to put on a good face, Kim's general demeanor struck me as unusually relaxed and quietly confident. That, together with their forthrightness about situations where ParAccel isn't faster than the competition, gave me the sense that things are going pretty well for ParAccel. In my experience, companies that are confident about their strengths generally don't hide from their weaknesses.

Only time will tell.

Kognitio Lands Semi-interesting Client

Kognitio announced yesterday that the National Center for Genome Resources has "deployed Kognitio's WX2 purpose-built database". I find this semi-interesting for two reasons:

• "Deployed" is very different than "selected"
• I'm a bit tired of hearing about "enterprises" who have "selected" a particular MPP database for this that or another. This isn't really Earth-shattering news either, but at least it's a bit different.

Now, I don't know the first thing about genome research, but I do know that at least one MPP vendor has added functionality specifically for genome research clients. As such, I wondered whether Kognitio had done the same to close the NCGR deal. When asked, however, the folks at Kognitio said that "NCGR [is] using pure SQL as they migrate over". I'm not sure whether that means NCGR is doing the same old boring SQL stuff as everybody else, or that MPP databases have finally matured to the point that you can write the complex (and often ugly) SQL queries that mature OLTP systems have been supporting for years. My guess is that the answer is somewhere in the middle.

Like I said... semi-interesting.

Vertica Offers Two-Day Training Class in April

I must be blind, because I can't find a link to this anywhere on Vertica's site, but... Vertica is offering a two-day training program in early April that looks very interesting. For details see http://www.vertica.com/vertica-training.

Given my proximity to Billerica I might just have to go to this... I just have to convince my CFO to pay for it. :-P

Viewing the Execution Plan of a Running Query in SQL Server

Or, Simple SQL Server DBA Tricks

Here's a trick I wish I'd discovered long ago... given a long-running query started by session ID 42, you can get the handle to its cached execution plan by running:

SELECT plan_handle FROM sys.dm_exec_requests WHERE session_id = 42

Given that plan handle, you can get an XML representation of the query plan by running:

SELECT query_plan FROM sys.dm_exec_query_plan (0x06001D009FE38431400399D4000000000000000000000000);

(The hex value there is the value returned by the previous query.)

Now, save the output of that query to a file with a .sqlplan extension, and open it with Management Studio. Voila! A graphical representation of the query plan for the problem query as it was actually executed.
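
If you'd rather not copy the plan handle around by hand, the same information can be pulled in a single query by joining the DMVs with CROSS APPLY - a minimal sketch, again assuming session ID 42, with sys.dm_exec_sql_text thrown in to grab the statement text too:

-- One-step version: fetch the running statement's text and its cached
-- plan for session 42 directly, without copying the plan handle by hand.
SELECT
    st.text       AS statement_text,
    qp.query_plan AS query_plan_xml
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE r.session_id = 42;

Save the query_plan_xml value to a .sqlplan file as before and Management Studio will render the same graphical plan.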

Grabbing the query text and generating a plan for it yourself will usually give you the same result, I believe, but it has always bugged me that there was a potential difference between the plan generated for a problem query at run time and the plan generated at problem diagnosis time. This relieves that little concern and provides other options for working with the query plan as well (it is XML, after all).