At last – Dapper | david galbraith

Dapper fills a perfect niche.

People forget that before RSS there was screenscraping. And that after RSS there is still screenscraping. Most of Google News is scraped and does not come from RSS.

Amazingly, because nobody really puts any useful metadata in RSS, you still need to screenscrape to produce useful aggregation services.

Other than enterprise companies such as WebMethods, which had a scraping tool as part of a web services builder, or the innovative Junglee, which was snapped up by Amazon before the last .com boom got underway, nobody has built an online screen scraping tool, even though its absence is a massive gaping hole in the fundamental services of the web.

At Moreover.com, RSS was largely useless to us, because you can’t build a news search engine without full text, and the bigger news sources don’t want to output full-text RSS without prior negotiation. So, like Google News, we were managing tens of thousands of scrapers for search engines like MSN and Yahoo, which is a pain in the ass.

Because this is a pain in the ass, Dapper is a damn good idea, but because people imagine that RSS is something it’s not, they may not realize it.

If the right people start using it, Dapper could become a prime mover in making RSS what people think it is, allowing people to build good vertical search services such as real estate search, where you want to search by number of rooms and so on.

Waiting around for people to create a real estate module for RSS may not be practical. It would be better to scrape and then make the module yourself, using Dapper.
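To make "scrape and then make the module yourself" concrete, here is a rough sketch in Python of the idea, done by hand rather than with Dapper (this is not Dapper's API). Everything specific in it is an assumption for illustration: the listing markup, the field names, and the `re` namespace URI are all made up.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical listing markup; a real page would need its own rules.
SAMPLE_HTML = """
<div class="listing">
  <span class="address">12 Elm Street</span>
  <span class="rooms">3</span>
  <span class="price">250000</span>
</div>
"""

# Made-up namespace for an illustrative real estate module.
RE_NS = "http://example.com/ns/realestate"


class ListingScraper(HTMLParser):
    """Collects the text of each <span class="..."> into a dict keyed by class."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class of the span we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("class")

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def listing_to_rss_item(html):
    """Scrape one listing and return an RSS <item> carrying the extra fields."""
    scraper = ListingScraper()
    scraper.feed(html)
    fields = scraper.fields

    ET.register_namespace("re", RE_NS)
    item = ET.Element("item")
    ET.SubElement(item, "title").text = fields["address"]
    # The structured metadata that plain RSS items usually lack.
    ET.SubElement(item, f"{{{RE_NS}}}rooms").text = fields["rooms"]
    ET.SubElement(item, f"{{{RE_NS}}}price").text = fields["price"]
    return ET.tostring(item, encoding="unicode")


item_xml = listing_to_rss_item(SAMPLE_HTML)
print(item_xml)
```

Once the number of rooms sits in its own element, a vertical search service can filter on it directly instead of guessing from free text.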

For Dapper to succeed I’d guess that they need to focus on a community of content aggregators, rather than be purely a software service.

Dapper: The Data Mapper