Clickstream

XP: Not for Data Warehouses

I've always been skeptical of development methodologies. They ignore the fact that different organizations develop software under different constraints and in different ways. What's important is the development process, and that it addresses the needs of the organization while being managed well enough to produce useful, reasonable-quality goods. Methodologies occasionally work if they are focused on specific problem domains, such as the data warehouse development methodology espoused in The Data Warehouse Lifecycle Toolkit. Methodologies should be like training wheels: you use them to learn the problem domain and ways to solve the problems in that domain, then you take the wheels off and evolve your own process. More often, methodologies are top-down efforts to fix problems that are caused by ham-handed management.

That said, I agreed to let a development manager try out XP on portions of the ETL development for a data warehouse project. The first problem was that the XP "user" was really me: the actual users had specified the requirements, and we had built the dimensional models from them. That meant the user focus of XP amounted to following the data mappings that populated the dimension tables with the proper data.

The result? Simple tasks took three times as long as they would have under a more standard approach. The total lack of documentation meant that when data anomalies popped up, the developers had to read code to figure out how to deal with them. Some simple documentation could have specified that transactions X, Y and Z were handled, so that when transaction A showed up we would have known something had been missed in the data feed. User stories were basically "get data from these places to those places, and do these things to make sure it's clean", not much different from the data mapping rules and flowcharts.

The worst part is what should be XP's strength: testing.
This was a joke, because the expected results of a test require that you pull production data and process it to get the correct output. Without the ETL program to generate that output, you have to work out the results manually based on your understanding of the source data. That's fine when you eyeball the data and work out the desired results, but it does not account for data quality problems. A few bad values in a column can cause a join to fail, so that data goes missing, and the test will never catch that.

The developers couldn't write every possible test case for full coverage, because a simple dimension extract might pull 15 columns from 6 different tables, joining via 8 different columns, with hundreds of potential data values for each column. The combinatorial explosion of data values and relationships makes full coverage extremely difficult, not to mention that many of the test cases should generate error conditions that are so unlikely in production data that it's hardly worth trying to catch them. Lastly, real production data changes over time, so there's no way to guarantee that today's test cases cover tomorrow's production data.

ETL programs don't have the luxury of controlling their inputs, only their outputs. The primary goal is winnowing bad data so that only the good gets through, and the bad is flagged with a reason for rejection so it can be corrected at the source.

The non-XP half of the development team had all the core scheduling, dependency checking and logging code, plus three dimensions, done before the first dimension built under the XP process saw the light of day. That one dimension passed all its tests, but it failed the first time it hit production data because of a data quality problem. We stopped using XP at this point, much to the relief of the developers. The kicker? These developers had all been trained on location by Kent Beck and one of his associates for another project; we picked them up while they were idle.
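The winnowing idea, where every source row either loads or is rejected with a reason instead of quietly vanishing in a join, can be illustrated with a minimal Python sketch. This is not the ETL code from the project; the transaction types X, Y and Z, the field names (`txn_type`, `customer_id`, `amount`), the customer lookup and the function `load_rows` are all hypothetical, chosen only to show an unhandled transaction type and a bad join key (a stray trailing space) being flagged rather than lost.

```python
from dataclasses import dataclass, field

# Hypothetical: the transaction types this load is documented to handle.
HANDLED_TXN_TYPES = {"X", "Y", "Z"}


@dataclass
class LoadResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reason) pairs


def load_rows(rows, customer_keys):
    """Route every source row to accepted or rejected; never drop one silently.

    rows: dicts with hypothetical keys 'txn_type', 'customer_id', 'amount'.
    customer_keys: maps a source customer_id to its dimension surrogate key,
    standing in for the lookup an inner join would otherwise perform.
    """
    result = LoadResult()
    for row in rows:
        txn_type = row.get("txn_type")
        cust_id = row.get("customer_id")
        if txn_type not in HANDLED_TXN_TYPES:
            # Undocumented transaction type: flag it so the feed can be checked.
            result.rejected.append((row, f"unhandled transaction type {txn_type!r}"))
        elif cust_id not in customer_keys:
            # An inner join would have dropped this row without a trace.
            result.rejected.append((row, f"no matching customer for id {cust_id!r}"))
        elif row.get("amount") is None:
            result.rejected.append((row, "missing amount"))
        else:
            result.accepted.append({**row, "customer_key": customer_keys[cust_id]})
    return result


if __name__ == "__main__":
    keys = {"C1": 101, "C2": 102}
    source = [
        {"txn_type": "X", "customer_id": "C1", "amount": 10.0},
        {"txn_type": "A", "customer_id": "C2", "amount": 5.0},   # unknown type
        {"txn_type": "Y", "customer_id": "C2 ", "amount": 7.5},  # trailing space breaks the key
        {"txn_type": "Z", "customer_id": "C2", "amount": None},  # missing measure
    ]
    res = load_rows(source, keys)
    print(f"{len(res.accepted)} accepted, {len(res.rejected)} rejected")  # 1 accepted, 3 rejected
    for row, reason in res.rejected:
        print("REJECT:", reason)
```

Resolving the key through an explicit lookup, rather than an inner join, is what lets the bad row surface as a reject with a reason that can be sent back to the source system. In a set-based SQL load the usual equivalent is an outer join followed by a check for null keys.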
There are some domains for which XP does not work, and systems integration - at least of the type done in data warehousing - appears to be one of them.

Posted by Mark, Tuesday, October 07, 2003 10:53:00 PM
This work is licensed under a Creative Commons License except where indicated.