Clickstream

Web Data Sourcing Tools We Used for the Mashup Contest

We just finished recording a podcast for IBM developerWorks, which will be up in the next few days, so I went looking to see what else happened at Mashup Camp while we were writing code. We got a mention at Programmable Web (the first place I go to see what's new in the mashup space). I wish we had been able to spend more time in sessions, but being heads-down in new tools was still worthwhile.

Apart from the possibility of winning prizes, this is the best learning environment I've found for this space. Unless you spend a lot of time reading through blogs, you aren't going to find many resources. Besides, nothing beats learning from other people while doing.

Here's the rundown of tools Renat and I worked with:
QEDwiki (not yet an official product or available for download)
Apatar
Dapper
Kapow
Yahoo! Pipes
and lots of Google sources

One thing that hasn't been mentioned in most of the news is that all these companies had people at Camp. To be honest, if it weren't for Dan Gisolfi and Meg Sorber from IBM, we never would have finished work by the time the event closed. They stayed up late to help us with problems, bugs and techniques using QEDwiki.

I used Dapper a lot to scrape pages and make RSS feeds. I ran into problems when Dapper met the bad HTML practices of some web sites. Fortunately, Eran Shir and Jon Aizen (the CEO and CTO of Dapper) were there to help out.

Out of all the things I've worked with, Dapper is the most impressive because of its simplicity. Unlike Pipes, which manipulates RSS feeds, Dapper scrapes pages and turns them into feeds in many different formats. Dapper + Pipes is a great combination. Kapow is an industrial-strength scraper, so its tools are more powerful than Dapper, but Dapper is a lot easier to use for something quick and dirty.
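To make the scrape-then-manipulate idea concrete, here is a rough sketch in Python using only the standard library. It is not how Dapper or Pipes work internally; it just mimics the two steps: pull links out of sloppy HTML and emit them as an RSS 2.0 feed, then filter that feed by keyword. The names LinkScraper, page_to_rss, filter_items and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical page with the kind of sloppy markup scrapers run into:
# unclosed <li> tags, mixed-case tags, no closing </ul>.
SAMPLE_HTML = """
<ul>
<li><a href="http://example.com/a">Mashup  Camp
 recap</a>
<li><A HREF="http://example.com/b">Weather report</A>
<li><a name="top">no link here</a>
"""


class LinkScraper(HTMLParser):
    """Collect (title, url) pairs from every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished (title, url) pairs
        self._href = None    # href of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            title = " ".join("".join(self._text).split())
            self.items.append((title, self._href))
            self._href = None


def page_to_rss(html, feed_title, feed_link):
    """Scraper step: turn the links on a page into an RSS 2.0 string."""
    scraper = LinkScraper()
    scraper.feed(html)
    scraper.close()
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped from " + feed_link
    for title, url in scraper.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


def filter_items(rss_xml, keyword):
    """Feed-manipulation step: keep items whose title contains keyword."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    for item in list(channel.findall("item")):
        if keyword.lower() not in (item.findtext("title") or "").lower():
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = page_to_rss(SAMPLE_HTML, "Demo feed", "http://example.com/")
    print(filter_items(feed, "mashup"))
```

The split into two functions is deliberate: the scraper only knows about pages, and the filter only knows about feeds, so either half could be swapped out without touching the other.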

We used Apatar because it was the only way, short of coding directly to the APIs, to get data from Salesforce.com. It's also open source, and Renat knows how to use it. It's not a page scraper; it's a data integration tool, so it does things these other tools can't. Overall, the combination of a DI tool, a scraper, and a manipulation and delivery formatting tool is what you need to get data for mashups if you're doing it inside an IT shop.
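As a rough illustration of how those three pieces fit together, here is a small Python sketch. Everything in it is a stand-in: extract_accounts plays the part of the DI tool, scrape_headlines plays the scraper, and join_sources plus deliver_json play the manipulation and delivery formatting step. The records are invented, and no real Salesforce.com, Apatar, or Dapper API is called.

```python
import json


def extract_accounts():
    """Stand-in for the data integration step (hypothetical CRM records)."""
    return [
        {"account": "Acme Corp", "owner": "renat", "city": "Portland"},
        {"account": "Globex", "owner": "mark", "city": "Boston"},
    ]


def scrape_headlines():
    """Stand-in for the scraper step (hypothetical scraped rows)."""
    return [
        {"company": "acme corp", "headline": "Acme opens new plant"},
        {"company": "Initech", "headline": "Initech ships update"},
        {"company": "ACME Corp ", "headline": "Acme hires CTO"},
    ]


def normalize(name):
    """Lowercase and collapse whitespace so names from both sources match."""
    return " ".join(name.lower().split())


def join_sources(accounts, headlines):
    """Manipulation step: attach scraped headlines to each account record."""
    by_company = {}
    for row in headlines:
        key = normalize(row["company"])
        by_company.setdefault(key, []).append(row["headline"])
    return [
        dict(acct, headlines=by_company.get(normalize(acct["account"]), []))
        for acct in accounts
    ]


def deliver_json(records):
    """Delivery formatting step: serialize for whatever assembles the mashup."""
    return json.dumps(records, indent=2, sort_keys=True)


if __name__ == "__main__":
    merged = join_sources(extract_accounts(), scrape_headlines())
    print(deliver_json(merged))
```

The name normalization is the part that tends to matter in practice: data from an internal system and data scraped off a page rarely agree on spelling or spacing, so the join needs some cleanup before it will match anything.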

QEDwiki is the assembly hub, so it doesn't provide data sourcing or manipulation features. IBM is going to include another tool in the kit for that. They did a demo of this during Mashup U, but didn't have it available for us.



Creative Commons License
This work is licensed under this Creative Commons License except where indicated.