Confessions of an Old SQL Hacker: I Like ANSI Joins

For longer than I care to remember, I've been writing joins like so:

SELECT col1, col2 -- Boo on "SELECT *"!
WHERE T1.x = T2.x

Over the last six months or so, however, I've found myself forced to used outer joins (which are evil, but that's another story) and thus having to write queries like this:

SELECT col1, col2 -- Seriously, "SELECT *" is for rookies

Then, to keep things consistent (think self-Romanism) inner joins obviously get written like this:

SELECT col1, col2 -- Did you really think I was going to use "SELECT *"?
INNER JOIN T2 ON(T1.x = T2.x)

What's shocking about this is that I find myself liking this syntax. It's much clearer, at least in my opinion, especially when joining more than two tables or when there's more than one join criteria between two tables. Further, it separates the join criteria from the "filter" criteria, i.e. what's in the WHERE clause is only what belongs in the WHERE clause. Consider:

SELECT col1, col2 -- You get the point
FROM T1, T2, T3, T4
WHERE T1.x = T2.x
AND T2.x = T3.x
AND col3 = 'foo'
AND T3.x = T4.x
AND T2.y = T3.y


SELECT col1, col2 -- I hope you do anyway
INNER JOIN T2 ON(T1.x = T2.x)
INNER JOIN T3 ON(T2.x = T3.x AND T2.y = T3.y)
INNER JOIN T4 ON(T3.x = T4.x)
WHERE col3 = 'foo'

See the difference? Join criteria are forced to be grouped together logically, and the WHERE clause is cleaner. It's hard not to like it.

And it beats the hell out of the old Oracle outer join syntax. 'nuf said on that subject - you either know what I'm talking about or wouldn't believe that I was quoting real syntax. Anyway...

There you have it. I like the ANSI join syntax. I admit it.

I guess you can teach an old dog new tricks.

Stonebraker Dismisses Columns, Heralds New Data Friendly Systems

I recently had the opportunity to sit down with Michael Stonebraker, database pioneer and co-founder of everyone's favorite database start-up Vertica. I expected to hear some deep insights from Dr. Stonebraker, but I was completely unprepared for what he had to say.

"The current generation of column-oriented, analytics databases was designed back at the turn of the century to make reporting faster," he began, "without any thought to how the data felt about being turned on its head. Talk to any DBA at any company these days, and they'll tell you that their data is in pain, real pain. All that time spent crammed together with similar datums in uncomfortable positions was fine when data was silent and reporting was all-important, but that's not the case today. Today businesses know that their data is valuable, and it's becoming more and more important to treat that data sensitively and humanely."

Stunned, I fumbled around for an intelligent-sounding question. "What can be done?" I asked lamely.

"Data need to be provided with comfortable living conditions, for starters", he replied. "They must no longer be subjected to draconian rules about how to arrange themselves or judgmental labels such as 'row-oriented' or 'analytics optimized'. Each datum must be allowed to pursue its own destiny... each must be free to store itself in the manner of its own choosing, to reside near other datums it finds interesting, to decide for itself whether it's more important to modify itself quickly or respond to queries quickly. Data has rights, after all.

Ultimately, too, I think data should only be required to work an 8 hour day. There's some research being done in that direction - vacation-based compression, I think they're calling it - to see if reducing the length of the workday for each datum can effectively reduce storage requirements. The first work being done there is aimed at a new storage engine for MySQL, I believe."

It sure is an interesting time to be a database geek...

Bigger Is Often Cooler But Not Always Better

Dan Linstedt had some very interesting comments about fabric systems this morning. I agree that large-scale systems like those he describes would be <massachusetts_accent>wickid cool</massachusetts_accent> and that they offer a nice way to scale traditional database systems. It sure would be fun to try one of those bad boys out, no question.

More than anything, however, they also increase my desire to own Vertica stock.

I don't think that larger, bigger, higher-end systems of any flavor will save Oracle, SQL Server, etc. on the data warehousing front. I still contend that reducing the amount of data you have to store is a much smarter approach. Why pay for a huge fabric system so you can move a ton of data around when you can get the same effect from intelligent compression and a handful of commodity white boxes?

Same argument applies for using solid state storage for databases, actually, which is another highly irritating subject I keep running into lately. More on that another time though...

SQL Genius

Every so often, you run across somebody doing something really clever with SQL. I ran across a fantastic example this morning on the SQLite mailing list:

In oracle there is the TO_DATE function which accepts at least two
arguments: a field convertible to string, and a string defining the date
format of the first field. This format string involves portions of date
strings like 'YYYY-MM-DD HH24:MI:SS' , our standard sortable no-timezone
date format that we use. But users, being users, like to enter in dates in
just about any format, so we allow other formats like 'MM/DD/YYYY' and
'MM/DD/YY' (being primarily in america, we give priority to the
month/day/year format, instead of the european day/month/year).

We then convert our input string into letter-number combinations using
TRANSLATE so that 0-9 become 9, and A-Z become Z. Then in any_date_format
we have patterns like:

date_format=> 'YYYY-MM-DD HH24:MI:SS', date_pattern => '9999-99-99

and we select the date_format column based on the likeness of the
date_pattern column, run it through to_date, catch any exceptions, and
return the DATE column.

I always find this type of idea both amusing and awe-inspiring, because thinking of these tricks requires the ability to look at things differently. I think that's particularly hard to do with SQL because it's used to do the same general mundane things so often that the whole language becomes subconscious, making it really hard to think creatively. Maybe that's what defines genius though.

The Dumbest Query Ever

I just pulled this query from a trace of an application I'm working with. All I can say is... WOW.

select distinct FooID
from SomeTable
where FooID in (51)