Behold Echo

I’m not really interested in the politics of Echo. No matter what happens, though, a year or two from now we will have nailed the way publishing and aggregation work on the web (most probably when Microsoft decides what to adopt) – a development as important as the adoption of HTML and the web browser. The result may be the MetaWeblog API and RSS, or a combined effort with a new XML schema in a SOAP wrapper.

To my mind there is no problem in making RSS, as is, the default payload for SOAP content. A few tweaks that Echo already has would allow typing – e.g. avoiding the current madness where the MIME type of the full content is not specified.
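As a rough illustration of what typed content could look like, here is a minimal Python sketch that builds an item whose content element declares its MIME type. The element name "content", the attribute name "type" and the example.org URL are assumptions made for this example; they are not taken from the RSS or Echo specs.

```python
import xml.etree.ElementTree as ET


def build_item(title, link, body, mime_type):
    """Build an <item> whose <content> element states its MIME type.

    The "content" element and "type" attribute are hypothetical names
    chosen for this sketch.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    content = ET.SubElement(item, "content", {"type": mime_type})
    content.text = body  # markup in the body is escaped on serialization
    return item


item = build_item(
    "Behold Echo",
    "http://example.org/behold-echo",  # placeholder URL
    "<p>Full story text</p>",
    "text/html",
)
print(ET.tostring(item, encoding="unicode"))
```

A consumer can then read the type attribute and decide how to treat the body, instead of guessing whether it holds plain text or escaped HTML.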

So what are the missing bits?

On the detailed level: RSS content is so unnormalized as to be almost useless for commercial applications. To build a searchable index of RSS content you need access to the full text of stories, and commercial publications are not going to syndicate that. But you don’t need to syndicate readable full text in order to index it. Encouraging the use of tokenized full text (i.e. with stop words such as ‘and’, ‘or’ and ‘the’ removed) allows machines to index full articles but leaves humans to visit the original publishers’ sites for the full article. This should be the default content of a ‘content’ tag and needs to be built into the default output of weblog publishing tools.
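The tokenizing step is simple enough to sketch in a few lines of Python. The stop-word list here is deliberately tiny and made up for the example, and the function name is my own; a publishing tool would want a fuller list and probably some stemming as well.

```python
import re

# Hypothetical, deliberately short stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to", "in", "is", "for"}


def tokenize_for_index(text):
    """Lower-case the text, split it into words and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


story = "The ping server is the heart of real time news."
tokens = tokenize_for_index(story)
print(" ".join(tokens))  # prints: ping server heart real time news
```

The output is enough for a search engine to match the story against queries, but it is not something a person would want to read in place of the original article.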

On the medium scale: because of arguments over the RSS core, not enough attention has been paid to tools for creating modules and allowing extensibility. Forms need to be built into applications such as Userland’s, Blogger and Movable Type so that end users can create RSS modules within their own namespace without needing to know anything about the underlying XML. Rapid adoption of modules will take syndicated content beyond the headline/link pair that is the only metadata currently being syndicated in any volume.

On the larger scale: content and the weblog API are two parts of the whole – most important of all, perhaps, is the ping server and its related specs. In order to build personalized aggregators of real-time information, all of a weblog post needs to go to a neutral third-party ping server, and the ping server needs to have an API that allows clients to be alerted of changes in real time without having to scrape the ping server. Do this and you don’t have 15-minute-old Google aggregated news but real-time news – the stuff that people like Reuters know the value of.
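The difference between scraping and being alerted can be shown with a small in-process sketch in Python. This is only a model of the idea, not any existing ping server's API: the class and method names are invented for the example, and a real service would deliver notifications over the network rather than through local callbacks.

```python
class PingServer:
    """Toy ping server that pushes each new post to its subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callable taking (weblog_url, post)."""
        self._subscribers.append(callback)

    def ping(self, weblog_url, post):
        """Called by a publishing tool when a weblog has a new post."""
        # Copy the list so a callback may subscribe others safely.
        for callback in list(self._subscribers):
            callback(weblog_url, post)


received = []


def aggregator(weblog_url, post):
    # A personalized aggregator would filter and store the post here.
    received.append((weblog_url, post["title"]))


server = PingServer()
server.subscribe(aggregator)
server.ping("http://example.org/weblog", {"title": "Behold Echo", "body": "..."})
print(received)
```

The aggregator never asks the server whether anything has changed; it hears about the post at the moment the publishing tool sends the ping.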

Given the importance of the standards for publishing on the web, there needs to be a formal body with founder members such as Userland, Movable Type and Blogger. There is no money to be made out of the standards themselves, but a great deal to be lost if they are not agreed on by everyone. Without a body led by the makers of the weblog publishing tools, their efforts will be usurped by whatever the big companies decide to use.

FrontPage – Sam Ruby’s Wiki