Clickstream

Unexpected Lessons From the Mashup Camp 4 Contest

Mashup Camp is done, and we came away with 4 of the prizes from the IBM (and Dapper, Kapow, StrikeIron, Accuweather) mashup building competition. Not a clean sweep, but we did our best.

While sourcing information for the mashups, one thing really surprised me: Yahoo! generates really poor HTML that's hard to parse. We had identified data for which there were no feeds in several areas (like Yahoo! Finance) and built scrapers. We couldn't get any of them to work (I tried five different sets of pages). We ended up going to several other sites, like Google Finance, to get our data. Google's sites were simple to scrape, with no strange things going on. Lesson learned: if you want to use Yahoo!, go via one of their APIs or RSS feeds; otherwise find another source, because you'll be pulling your hair out.

A related lesson is just how web-unfriendly Microsoft can be. I hadn't realized the difficulty of going after ASP-generated pages. If you use URL-driven tools you are simply SOL. If you use a tool that can do directed spidering it's still not easy, but it works. I'll be ready for the next contest.

IBM gave two months of advance notice, but neither we nor the #2 and #3 winners took advantage of the time. We didn't start until the day before the deadline, when the wine and beer provided by IBM encouraged us to help each other learn the tools to do the work.

Labels: mashup, mashup camp, mashupcamp4, web2.0

Posted by Mark, Saturday, July 21, 2007 9:09:00 PM
This work is licensed under this Creative Commons License except where indicated.