"New Breed" Database Extensibility

At present, I can think of at least 5 "new breed" database vendors that allow you to extend their SQL language in some form or another:

Netezza
ParAccel
Kognitio
Greenplum (I think)
DATAllegro/Microsoft

Of course, the old guard is well-represented in this category as well - Oracle, DB2, SQL Server, Teradata, etc. all allow language extension via custom functions, plug-ins, etc.

I don't know enough yet to say for sure, but I think that Greenplum and AsterData might also belong on this list due to their support for Map/Reduce.

It wasn't all that long ago (3 or 4 years) that most of these vendors didn't even support the full SQL standard, never mind compiled-code extensions to their SQL language. Oh, the possibilities.

ParAccel Update

Not long ago I had the pleasure of speaking with Kim Stanick of ParAccel to get an update on what they've been doing and where they're headed. Here are the highlights from my notes.

Business News


Company Growth


Unlike many (or even most) companies, ParAccel grew last year, doubling from roughly 30 to 60 employees.

POCs


ParAccel spent its first year of operation focused largely on competing effectively in POCs. They now measure POC lengths in days rather than weeks, and are actively engaged in multiple concurrent POCs in the field. In addition, POC sizes are growing, and now range between 10TB and 100TB systems, with tests often performed in both in-memory and on-disk configurations. To me, the combination of POC speed (days), location (in the field), size (100TB) and flexibility (memory vs. disk) is a sign that the product has matured rapidly.

I saw a handful of selected results from POCs with a variety of customers (retail, financials, entertainment, etc.), but the quote that really jumped out at me concerning POCs was "unbeaten on performance". The performance results I saw were impressive (though I honestly don't expect them to show me the unimpressive ones). To their credit, there were a couple queries in the POC results where ParAccel was not faster, and those were not redacted. A good sign if you ask me. Cynically, however, I have to wonder: if they're not getting beaten on performance, where are they getting beaten? ;-)

Reference Customer


Late last year ParAccel announced its first (as far as I'm aware) reference customer, namely Merkle. Interestingly, Merkle selected ParAccel to replace an in-house data cleansing/integration application, not for reporting/analytics purposes. Merkle hasn't fully ported that application yet, however, so I have a hard time calling this a big win. Reportedly, Merkle is finding analytic uses for their PA system in the interim. I wonder where they'll ultimately land.

More information about what Merkle has done with ParAccel is available in this TDWI webinar if you're interested.

EMC Partnership


ParAccel's partnership with EMC appears to be serving them well. EMC is inviting them to POCs, which can't be a bad thing. This makes eminent sense, really, as any deal won by one of the database vendors with embedded storage (e.g. Netezza, Exasol, Dataupia, etc.) means a deal with no money in it for EMC. I don't know yet, however, how many of those POCs ParAccel has won, though I think there has been at least one.

Technical Tidbits


Though our conversation was sales-and-marketing heavy, I did glean a few interesting technical tidbits.

• ParAccel compiles queries. Given the obvious connection between ParAccel and Netezza this isn't really surprising though.
• ParAccel supports SQL extensions via "custom C functions". I look forward to learning more about this.
• ParAccel uses a proprietary interconnect protocol to reduce packet loss and improve the performance of inter-node communications.
• ParAccel's support for Microsoft and Oracle SQL extensions is strong enough that POCs often involve "no rewriting of SQL".

More information is available in a soon-to-be-released "technical overview" whitepaper. I got a peek, and it's well worth reading - check it out when it becomes available.

Parting Thoughts


I tend to put more emphasis on how somebody says something than what they actually say; something like 90% of interpersonal communication is non-verbal, after all. And while it's certainly a VP of Marketing's job to put on a good face, Kim's general demeanor struck me as unusually relaxed and quietly confident. That, together with their forthrightness about situations where ParAccel isn't faster than the competition, gave me the sense that things are going pretty well for ParAccel. Companies who are confident about their strengths generally don't hide from their weaknesses, in my experience.

Only time will tell.

Thoughts on the ParAccel/EMC Partnership

Rules may make life possible, but it's the exceptions that make life interesting.

There's already been plenty of talk about the new ParAccel/EMC partnership; if you're looking for details you can check the product page or the comments by Monash. The most useful and insightful writeup I've seen is here, and in fact if you read nothing else on this topic that's probably what you should read.

But what I haven't seen mentioned much - if at all - is how this partnership originated and what that means. The story, as I heard it, is that this partnership is the result of an EMC initiative to establish a presence in the analytics database space. That means ParAccel was chosen by EMC and not the other way around, which, given the plethora of "new" databases from which to choose, is a bit of a statement.

And with EMC selling the solution, ParAccel gets not only a host of functionality and prestige for free, it gets a rather large sales force. Not a bad deal for ParAccel if you ask me.

I do have one lingering doubt about this partnership though. There is the possibility that ParAccel's future may be somewhat in the hands of EMC. Indeed, ParAccel will have engineering resources devoted to the Scalable Analytic Appliance aka EMC partnership project. If EMC sells this solution and doesn't lean too heavily on ParAccel when doing so then that engineering investment will pay off quite nicely, but it doesn't take much imagination to see things getting ugly. I don't find that particularly likely, but it's a nagging possibility.

There have been plenty of partnership announcements lately, including three (yes, count 'em, 1, 2, 3) by ParAccel, but this one is different. And it's going to be very interesting to watch it unfold.

Asked and Answered: Dataupia and EnterpriseDB Compatibility

In this article, Philip Howard asks:

...here's an interesting question: if EnterpriseDB's main claim to fame is that you can run Oracle applications without change against Postgres Plus Advanced Server; and if Dataupia makes the same claim with respect to data warehousing, then can you run Dataupia against Postgres Plus Advanced Server?


I'm no Dataupia expert, but it seems pretty clear to me that the answer is no. The reason is very simple: EnterpriseDB's offering is a stand-alone database system that acts like Oracle, while Dataupia's system is a transparent system accessed through Oracle. (Or others, but I shan't spill any specific beans.) Both are Oracle compatible, just not in the same way.

And for that matter, ParAccel will be able to make the same claim (Oracle compatibility) before too long. Yet they achieve it in an entirely different way - query routing with their "AMIGO mode" option.

In summary: Not all Oracle-compatible systems are compatible in the same way. Don't get carried away with possible combinations.

Getting to Know ParAccel, Part III

I've spent a lot of time thinking about ParAccel over the last few weeks. As I mentioned in my first post about them, I think that their impact on the analytics database industry is going to be significant, and I've been struggling with exactly what that impact will be and how to express it.

So after much teeth-gnashing, I've come to the conclusion that there isn't much to be said, and that I shouldn't complicate what little is needed. So here goes.

ParAccel and Vertica are on a collision course.

I'm pretty confident that these two companies have leap-frogged the competition and will dominate the analytics database market when all is said and done. (Note: Don't read too much into that sentence. I do not consider Netezza part of the analytics market, nor do I think that the database market will ultimately be as big or as important as the future BI market.) I don't think that the other column-oriented databases have a chance against these two, either because they're too late to market or because they've picked the wrong foundation. So for now let's assume away all the other competitors, and then think about what'll happen.

Technologically, the two products are very similar. MPP, shared-nothing, column-oriented, designed for analytics, compressed, failure-tolerant, etc. etc. etc. They're new-breed analytics databases. That's a sufficient description. What really distinguishes the two are actually non-tech factors, such as:
Ironically, while those same factors will cause these two companies to push out other vendors, they are unlikely to give either a large advantage over the other. Where does that lead? Best case, to a 50/50 share of the analytics database market. Worst case, mutual destruction.

Either way, I think I'd now like to own stock in both. :-)