Clickstream

     
Increasing Data Integration Exposes X-rated Data Quality Problems

As more and more systems are moved from batch to online designs, or replaced entirely by product suites and ERP systems, the problem of dirty data is becoming more apparent. In the past this was rarely exposed to the outside. Now that systems are so interconnected, the polluted data multiplies quickly. My last experience with this was when Clickstream Data Warehousing came out.

At that time, the information that showed up on the Amazon and Barnes & Noble web pages for the book had errors in it. We got the errors corrected quickly enough, but they reappeared a week later. It turned out the publisher had made mistakes entering the information, and that data was propagated from their system to the distributors' systems, and from there to several online booksellers. Any correction downstream was overwritten whenever new updates came in. Even fixing it at the distributor did not guarantee it wouldn't recur.
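The overwrite behavior described above is easy to reproduce in a toy model. What follows is only a minimal sketch in Python, not how any real publisher, distributor or bookseller system works: the three dictionaries, the `description` field and the function names `propagate` and `run_feed` are all made up for illustration, and the feed is assumed to be a blind full replacement of each downstream copy.

```python
# Hypothetical three-stage feed: publisher -> distributor -> bookseller.
# Every name and field here is an illustrative assumption.


def propagate(upstream: dict, downstream: dict) -> None:
    """Replace the downstream copy with whatever the upstream system holds."""
    downstream.clear()
    downstream.update(upstream)


def run_feed(publisher: dict, distributor: dict, bookseller: dict) -> None:
    """One update cycle: each system blindly reloads its copy from upstream."""
    propagate(publisher, distributor)
    propagate(distributor, bookseller)


# The publisher enters the record with a mistake, and the first feed spreads it.
publisher = {"description": "text with an error"}
distributor: dict = {}
bookseller: dict = {}
run_feed(publisher, distributor, bookseller)

# Fix applied downstream only: the bookseller's page looks right for now.
bookseller["description"] = "corrected text"
after_downstream_fix = dict(bookseller)

# The next scheduled feed arrives and the error comes back.
run_feed(publisher, distributor, bookseller)
after_next_feed = dict(bookseller)

# Fix applied at the source: the correction survives every later feed.
publisher["description"] = "corrected text"
run_feed(publisher, distributor, bookseller)
after_source_fix = dict(bookseller)

print(after_downstream_fix, after_next_feed, after_source_fix)
```

In this sketch the only correction that sticks is the one made in the `publisher` record, since every other copy is rebuilt from it on each cycle.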

This kind of problem happens all the time with product information, and in many different industries. One of my favorite examples of propagating bogus product data is also from Amazon. The problem: the image references for a bunch of Shaolin Kung-fu movies were not for the video covers, but for softcore porn videos. A query for, say, "Fighting of the Shaolin Monks" turned up this page as a result [screen capture, 366K]. Here's a much smaller closeup of the video box for Fighting of the Shaolin Monks. This is a style of Kung-fu I'm not familiar with.

Studies in the direct mail industry say the industry-wide costs of erroneous mailings caused by bad data run into the hundreds of millions of dollars. Not too long ago the state of Alaska had a single problem with a mailing that ended up costing them somewhere between $65,000 and $95,000 to correct.

Even though bad data can lead to commercial loss (or possibly unexpected sales growth to the teenage male segment in the movie example), most companies don't have a data quality program in place. This is not necessarily bad, since data quality is only as good as the processes that handle it. Putting a program and people in place to correct data quality problems is not normally one person's job. It's a process problem that requires changing software development, systems integration, and data management practices.

Without this perspective, it's easy to fall into the trap of buying data cleansing tools to clean up data so that it's in good shape, while never fixing the source of the problems. This is a lot like building a filtration plant for your water supply instead of stopping the upstream pollution. It costs more and it deals with the symptom rather than the cause.





Creative Commons License
This work is licensed under this Creative Commons License except where indicated.