Unexpected Lessons From the Mashup Camp 4 Contest
Mashup Camp is done and we came away with 4 of the prizes from the IBM (and Dapper, Kapow, StrikeIron, Accuweather) mashup building competition. Not a clean sweep but we did our best.
While sourcing information for the mashups, one thing really surprised me: Yahoo! generates really poor HTML that's hard to parse. We had identified data for which there were no feeds in several areas (like Yahoo! Finance) and built scrapers. We couldn't get any of them to work (I tried five different sets of pages). We ended up going to other sites, like Google Finance, to get our data. Google's sites were simple to scrape, with no strange things going on.
Lesson learned: if you want to use Yahoo!, go via one of their APIs or RSS feeds. Otherwise find another source, because you'll be pulling your hair out.
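To show why a feed is so much less painful than a scraper, here is a minimal Python sketch that pulls titles and links out of an RSS 2.0 document using only the standard library. The sample feed, its URLs, and the function name parse_rss_items are made up for illustration; this is not a real Yahoo! feed and not the tooling we used at the camp.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for whatever RSS the site publishes.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example quotes feed</title>
    <item>
      <title>ACME 12.34</title>
      <link>http://example.com/quote/ACME</link>
    </item>
    <item>
      <title>INIT 56.78</title>
      <link>http://example.com/quote/INIT</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

Because the feed is well-formed XML with a known structure, there is no guessing about page layout, which is exactly what made the scrapers so fragile.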
A related lesson is just how web-unfriendly Microsoft can be. I hadn't realized the difficulty of going after ASP-generated pages. If you use URL-driven tools you are simply SOL. If you use a tool that can do directed spidering, it's still not easy, but it works.
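One likely reason URL-driven tools struggle here (an assumption on my part, not something we confirmed at the camp) is that ASP.NET pages often navigate by posting a form back to the server with hidden state fields such as __VIEWSTATE, so there is no plain URL to fetch for the next page. A tool that spiders step by step has to collect those hidden fields and send them back. The Python sketch below shows just the collecting step; the sample page, quotes.aspx, and the names HiddenFieldCollector and collect_hidden_fields are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page fragment of the kind an ASP.NET form might produce.
SAMPLE_PAGE = """
<form method="post" action="quotes.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="text" name="symbol" />
</form>
"""


class HiddenFieldCollector(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def collect_hidden_fields(html_text):
    """Return the hidden form fields that would be sent back on a postback."""
    parser = HiddenFieldCollector()
    parser.feed(html_text)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    print(collect_hidden_fields(SAMPLE_PAGE))
```

The returned dictionary is what a spider would merge with its own form values before posting back to the same page, which is the part a fetch-this-URL tool has no way to express.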
I'll be ready for the next contest. IBM gave two months of advance notice, but neither we nor the #2/3 winner took advantage of the time. We didn't start until the day before the deadline, when the wine and beer provided by IBM encouraged us to help each other learn the tools to do the work.
Labels: mashup, mashup camp, mashupcamp4, web2.0
Posted by Mark, Saturday, July 21, 2007 9:09:00 PM