Evaluating ETL Products is Hard
Comparing ETL and integration products is tough.
They all look alike.
The sales presentations are all the same.
The canned demos are indistinguishable.
Information from IT analyst firms is so high level it's worthless, even from the ones who cover the BI/DW space pretty well.
The standards, sources, targets and other checkbox-style comparisons don't provide the information you need to select the right tools.
Only a series of questions about implementation details exposes the things you need to know in order to make a decision. For example, one recent comparison I was doing went something like this:
Does the product bring data back to the engine or push transformations into SQL?
"It depends. It could do either.:
If the engine does some work then what happens when I bring over a mammoth amount of data?
"It buffers the data."
OK, so if I have a 4 GB SQLserver table and an 8 GB Oracle table to join, how is this accomplished?
"It accesses the data from the database using SQL."
OK, but it's different databases, so how does the tool do the join?
Vendor A: "One table is brought back and used as a lookup to the other remote table."
Vendor B: "Both tables are retrieved locally."
If it's that large, isn't caching locally a problem? Does it cursor through the data to keep from sucking up all the physical memory?
Vendor A: "You have to manage the input buffering, or it spills into a file/hash structure." (In other words, performance is poor enough that you won't see results)
Vendor B: "If you exceed 2 GB of buffering then the job will fail, so you have to work out a way to limit the data coming back." (In other words, more data than memory = crashed job)
In a simple example like that, you can follow the trail to a problem you might face. There are examples you'd never think to ask about unless you were already familiar with the products, in which case you wouldn't need to ask.
This is why I always do a hands-on demo or proof-of-concept on real data, using logic simpler than "map column A to column B". Sure, 70% of the integration work is simple mapping, but the other 30% is multiple tables, outer joins, calculations, lookups, and dealing with bad data. I wish vendors would demo the hard stuff instead of the easy stuff.
That's why I'm extending the ETL evaluation course at the Data Warehouse Institute to a full day. The first half is all about what to evaluate and how to evaluate it. The second half will be live, with vendors showing how to accomplish a given task. I'll start easy, and follow the usual path into the weeds of poor data quality and bad development planning. Maybe I'll finally see some problems fixed in some of my favorite tools, since they won't want to be embarrassed during a live demo with competitors. It should be fun! Look for the first run of the revised class at TDWI in Las Vegas next February.
Meanwhile, if there are vendors you'd like to see outside of Informatica and Ascential, er, IBM, I'm open to suggestions. You can leave a comment on this post.
Posted by Mark Tuesday, August 09, 2005 9:53:00 PM |