Clickstream

Web Data Sourcing Tools We Used for the Mashup Contest

We just finished recording a podcast for IBM DeveloperWorks, which will be up in the next few days, so I was looking to see what else happened at Mashup Camp while we were writing code. We got a mention at Programmable Web (the first place I go to see what's new in the mashup space).

I wish we had been able to spend more time in sessions, but being heads-down in new tools was still worthwhile. Apart from the possibility of winning prizes, this is the best learning environment I've found for this space. Unless you spend a lot of time reading through blogs, you aren't going to find many resources. Besides, nothing beats learning from other people while doing.

Here's the rundown of tools Renat and I worked with: QEDwiki (not an official product or downloadable, yet), Apatar, Dapper, Kapow, Yahoo! Pipes, and lots of Google sources.

One thing that hasn't been mentioned in most of the news is that all these companies had people at Camp. To be honest, if it weren't for Dan Gisolfi and Meg Sorber from IBM, we never would have finished work by the time the event closed. They stayed up late to help us with problems, bugs and techniques using QEDwiki.

I used Dapper a lot to scrape pages and make RSS feeds. I ran into problems with Dapper and the bad HTML practices of some web sites. Fortunately, Eran Shir and Jon Aizen (the CEO and CTO of Dapper) were there to help out. Out of all the things I've worked with, theirs is the most impressive because of its simplicity. Unlike Pipes, which manipulates RSS feeds, Dapper scrapes pages and turns them into feeds in many different formats. Dapper + Pipes is a great combination. Kapow is an industrial-strength scraper, so Dapper is not as powerful as Kapow's tools, but it's a lot easier to use for something quick and dirty.

We used Apatar because it was the only way short of directly coding to APIs to get data from Salesforce.com. And it's open source. And Renat knows how to use it.
It's not a page scraper; it's a data integration tool, so it does things these other tools can't.

Overall, the combination of a DI tool, a scraper, and a manipulation and delivery formatting tool is what you need to get data for mashups if you're doing it inside an IT shop. QEDwiki is the assembly hub, so it doesn't provide data sourcing or manipulation features. IBM is going to include another tool in the kit for that. They did a demo of this during Mashup U, but didn't have it available for us.

Labels: apatar, dapper, ibm, kapow, mashup, mashup camp, pipes, qedwiki

Posted by Mark, Tuesday, July 24, 2007 9:02:00 AM
This work is licensed under this Creative Commons License except where indicated.