Clickstream


Lecture in November: Establishing a Data Quality Program

I'll be giving a talk on establishing a data quality program for FirstLogic at their iSummit Live conference in San Francisco on November 17. This differs from most talks I give: it's only an hour long, whereas I usually do four- to eight-hour in-depth technical tutorials. It should be good because data quality problems make for interesting stories, and I plan to hand out product samples. Here's the abstract:
Case Study: Not Just Pears and Plants: Data Helps Bear Creek Grow
Delivering high quality products from Harry and David and Jackson & Perkins depends on more than just a hearty appetite or a green thumb; data also plays a key role. Hear first hand how Bear Creek is identifying the data problems impacting their organization, what those problems are costing, and their early efforts to address data quality.
Bear Creek Corporation is one of the nation's premier direct marketing and e-commerce companies. Their brands, Harry and David and Jackson & Perkins, have become a part of American life: through their catalogs, retail stores and the Internet, Americans have trusted Bear Creek's brands and products for decades.

ETL Evaluation Criteria are Available for Download

I posted a document containing the list of ETL product eval criteria mentioned in my last DM Review article. You can download a Microsoft Word version now. I'll post an update when the PDF version is available.

When reading through the list you will find that there are redundant criteria in different evaluation categories. If you want to use this list, first determine what areas are important in your evaluation, then take the criteria from the relevant categories. If you try to use this list as-is, you'll spend a lot of time finding answers to redundant questions. It's also a lot of work writing out and ranking criteria, so you are better off with the smallest number possible to get your evaluation done.
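The selection step described above can be sketched in a few lines. This is only an illustration: the category names and criteria below are made up for the example and are not taken from the downloadable list, and `select_criteria` is a hypothetical helper.

```python
# Hypothetical criteria list as (category, criterion) pairs.
# The names are illustrative only, not the contents of the real document.
CRITERIA = [
    ("connectivity", "Reads from relational sources"),
    ("connectivity", "Supports flat-file input"),
    ("development", "Supports flat-file input"),  # redundant with connectivity
    ("development", "Graphical mapping designer"),
    ("operations", "Job scheduling"),
    ("operations", "Restart after failure"),
    ("metadata", "Impact analysis"),
]


def select_criteria(criteria, wanted_categories):
    """Keep criteria from the wanted categories, dropping repeats.

    A criterion that appears in several categories is kept once, under the
    first wanted category in which it appears.
    """
    seen = set()
    selected = []
    for category, text in criteria:
        if category not in wanted_categories:
            continue
        key = text.strip().lower()  # compare criteria ignoring case and padding
        if key in seen:
            continue
        seen.add(key)
        selected.append((category, text))
    return selected


shortlist = select_criteria(CRITERIA, {"connectivity", "development"})
for category, text in shortlist:
    print(f"{category}: {text}")
```

Picking the categories first and removing duplicates second keeps the working list small, which is the point: fewer criteria to write out, answer and rank.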

My Latest Article on ETL Eval Criteria is Available

My latest article, "Criteria for ETL Product Selection", is now out in the DM Direct newsletter. This is a loosely related follow-on to the article in this month's issue of DM Review, which I posted a few entries earlier.

The intro:

To make tool selection easier, it is best to develop criteria in categories of functionality. This will make it easier to compare the tools in different areas and is more effective than trying to come up with a single number or score to indicate that one product is best. All ETL tools have their strong and weak points. The goal of an evaluation should be to identify those strengths and weaknesses and match them up to what is important to your organization.

It doesn't make much sense to rate products based on features you may never use, so look at the things that are important to you and ignore the rest. Fewer criteria will also help to speed up your evaluation. The remainder of this article discusses the basic categories you might want to use to develop your detailed evaluation criteria.
I will shortly be posting a PDF of a criteria list I used to maintain. Check back here in a week.
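As a rough sketch of what comparing tools by category rather than by a single score might look like, the example below averages ratings per category and reports which tool is strongest in each. The tool names, the 1-5 scale and the ratings are all invented for illustration, and both functions are hypothetical helpers, not part of any published method.

```python
# Hypothetical ratings on a 1-5 scale: tool -> category -> criterion scores.
RATINGS = {
    "Tool A": {"connectivity": [5, 4], "development": [2, 3], "operations": [4, 4]},
    "Tool B": {"connectivity": [3, 3], "development": [5, 4], "operations": [3, 4]},
}


def category_profile(ratings, categories):
    """Average each tool's scores per category, for the chosen categories only."""
    profile = {}
    for tool, by_category in ratings.items():
        profile[tool] = {
            cat: sum(by_category[cat]) / len(by_category[cat])
            for cat in categories
            if cat in by_category
        }
    return profile


def strongest_per_category(profile):
    """Map each category to the tool(s) with the highest average in it."""
    best = {}
    categories = {cat for scores in profile.values() for cat in scores}
    for cat in sorted(categories):
        top = max(scores[cat] for scores in profile.values() if cat in scores)
        best[cat] = [tool for tool, scores in profile.items() if scores.get(cat) == top]
    return best


# Only the categories that matter to this (imaginary) organization are compared.
profile = category_profile(RATINGS, ["connectivity", "development"])
best = strongest_per_category(profile)
for cat, tools in best.items():
    print(f"{cat}: strongest is {', '.join(tools)}")
```

The output is a profile of strengths and weaknesses per category, not a winner. Deciding which categories carry weight is left to the evaluator, in line with matching the tools to what is important to your organization.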

Is Ab Initio Worth Evaluating?

Ab Initio has been around for long enough to make a name for themselves in the ETL space. Their marketing approach appears to be one of mystique: maintain secrecy around the product while allowing some information out about the high-end customers, creating interest because of the tantalizing tidbits they provide. Their web site is more typical of an advertising firm with nothing meaningful to say than a technology company.

I've seen other software companies do this and it works up to a point. Once the company has been around long enough, the approach stops working so well. Enough people have used the product that it becomes easier to find out its good and bad points. Ab Initio uses non-disclosure agreements to try to stifle public discussion as much as possible, maintaining secrecy.

Another reason the secretive approach becomes less effective is that it doesn't scale. There's a point at which the market is aware of the company and interested, but the secrecy limits further exposure. It also keeps the market supporters (analysts and consulting firms) from working with the company. If I have to sign an NDA just to see a presentation about the product, and there are several other companies clamoring for attention, I probably won't bother. As an analyst or consultant, what good is seeing the product and company if you can't talk about it?

In Ab Initio's case, the market perception is of a high-performance ETL product for large data volumes, similar to Torrent (acquired by Ascential). With both Ascential and Informatica releasing product versions that can scale to large data volumes across a workgroup of servers, the question is one of relative performance and cost. In Ab Initio's case this is hard to judge, because they won't say how much it costs, and the only word is that it is high-cost relative to other vendors.

The lack of information about product features also means that it is difficult to see what you trade in ETL features for that performance, and whether the performance is really that much better than the other vendors. The secrecy is smart when you are small and can't afford a marketing budget. It may not be so smart when you have more market exposure. Eventually people begin to think that maybe you are simply protecting margins and the product is ok, but not so much better than the competition that it merits the exorbitant cost.

In a discussion forum I read a consultant's story of trying to get training on the product. Ab Initio would not provide training because they only allow training for companies that already own the product (exception: Knightsbridge, from what I have learned). That person was unable to work on the data warehouse project because they wouldn't allow him to get training, so they've created a critic.

I believe Ab Initio is reaching the point of maximum returns, and it is starting to hurt them. I was asked if I would be adding them to my ETL evaluation course, and the answer is "no" because I can't get enough information to make it useful, at least not without more work than it's worth. My talks with other analysts have turned up the same thing. In some cases there is outright hostility, and both analysts and consultants are telling their customers that they should not consider the company because they are hard to work with and only provide content-free marketing unless you go through a ridiculous NDA process.

That's why Ab Initio is not included in any evaluations I've been doing. I can only offer an opinion based on what I've seen, which is a typical ETL tool, not as easy to develop with as many others, but with an apparent edge in performance at the very high end. That leads me to conclude that if you aren't trying to process a billion rows a day, it's probably not going to be worth the trouble or expense of including them.


Data warehousing, business intelligence, IT strategy and architecture, and occasional interesting bits.



Creative Commons License
This work is licensed under this Creative Commons License except where indicated.