search engines

Cuil, Titanic search engine, sinks day one

Posted by | search engines | No Comments


Cuil, yet another search engine, launched this morning – and then sank this afternoon.

Normally it is disingenuous to criticize people who suffer because of their success (e.g. servers going down from launch overload), but that courtesy only applies when they innovate. Innovation means you have created something new, rather than copied, and that there will be unpredictable problems that can only be fixed through trial and error.

Because Cuil is an attempt to build a better copy of a search engine, it should be judged on how well it does that, not on the novelty of the idea. People liked Google because it was fast, accurate and reliable, not because they had never seen a search engine.

Cuil is down, meaning that at this moment it is infinitely slow, unreliable and inaccurate. It has been backed with $33 million by people who have put greed before imagination, thinking that a Google beater = dollars, that a Google beater is a search engine and that people who have worked for Google are automatically better. This is the mentality that would back an MBA middle-management type over a visionary entrepreneur, a John Sculley over a Steve Jobs.

Google is threatening Microsoft’s access to dollars, not with desktop apps or a desktop OS, but by owning an innovative product that became the desktop of the web: search. To get some of Google’s dollars requires developing something new and different that becomes the starting point for people on the web. Facebook is possibly something like that; Cuil is not.

Cuil say that the name is pronounced ‘cool’. If you have to tell people you are, you’re probably not.

Om has the skinny.

Is Google Beatable by Re-inventing Search?


Powerset have a demo out and it’s interesting, technically proficient and built by a solid team, but winning requires questioning the premise: is better search a problem, and is it solved by changing the way people are currently used to searching to the way people naturally speak?

Google is a long-term threat to Microsoft’s hegemony not by having built a better OS, but by owning search. The web shifted the landscape of technology, and a once-niche application dominated by companies like Verity, full-text search, became the ‘command line of the web’. Since Microsoft had always owned the command line, this made web search a strategic threat.

Powerset has some very bright people like Barney Pell behind it, and who am I to challenge it, but I have a nagging doubt, which is to do with my years spent in architecture rather than technology. In architecture the first thing you do is question the brief: if someone asks you for a building with a sloping facade, you ask why and you may have a good reason for doing something differently. If someone asks you for a better search engine, you would ask why. Here is my asking why.

If the value in building a better search engine is to beat Google, perhaps Google can only be beaten when something other than a search engine becomes a starting point for the web. It doesn’t take a stretch of the imagination to see that if Facebook became a truly monopolistic social network it would be a strategic threat to Google. If building a better search engine is the way to beat Google then Powerset is on the right track.

Is the way to build a better search engine based on the ability to answer questions the way they are spoken? If so, then natural language technology is the right approach and Powerset is on the right track. A few years ago this would definitely have been the case, but these days the ergonomics of the web have evolved in tandem with Google. People don’t tend to type questions into search engines; they type a few salient words. This may not be the most elegant practice, but it is the de facto standard behavior, and trying to change it might be like trying to replace the QWERTY keyboard with a more rational one.

Assuming that there is a better search practice than the one currently used, how does Powerset stack up when natural language queries are typed into it? This would require very thorough testing, but I’ll give one example: ‘who was churchills father’ [sic]. Both sites return the correct answer, but Powerset requires adding an apostrophe: churchill’s. It is not a big deal for them to fix, but it is a perfect example of how a simple grammatical rule dealt with by query parsing can sometimes get forgotten in the attempt to index perfectly.
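To make the apostrophe point concrete, here is a minimal sketch of the kind of query-side normalization I have in mind. It is not how Powerset or Google actually parse queries; the function name `normalize_query` and the rule itself (lowercase, drop apostrophes, keep alphanumeric tokens) are my own assumptions for illustration.

```python
import re


def normalize_query(query: str) -> list[str]:
    """Hypothetical normalizer: lowercase, drop apostrophes, split into tokens."""
    lowered = query.lower()
    # Remove straight and curly apostrophes so "churchill's" and "churchills"
    # collapse to the same token.
    without_apostrophes = re.sub(r"['’]", "", lowered)
    # Keep runs of letters and digits; punctuation such as "?" is discarded.
    return re.findall(r"[a-z0-9]+", without_apostrophes)


if __name__ == "__main__":
    print(normalize_query("who was churchills father"))
    print(normalize_query("Who was Churchill’s father?"))
```

Both queries reduce to the same four tokens, so a lookup keyed on normalized terms would treat them identically. The same normalization would have to be applied to the indexed text as well, otherwise the two sides would not line up.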

Lastly, intelligent indexing comes at a cost: it may be slower to query, and it is definitely slower to index. Quick response time has always been a priority for search, and Powerset can possibly match it. But the biggest change in search in this second phase of the web has been the rise of ubiquitous, news-style (e.g. weblog) publishing systems and the importance of search by date. AltaVista’s last throw of PR success against Google was their news search, which was pounded after 9/11, before Google News, let alone weblog search, existed. Fast updates require fast indexing.

I wish Powerset every success, and think that this will come when something else is thrown into their mix.

In defense of Technorati


After 9/11 AltaVista scored some rare Brownie Points against Google in the press, because Google didn’t have news search but AV did, via Moreover. Google News was built largely as a result of 9/11. It proved that Moreover was a news search engine, but it was too late. Our PR company had told us that ‘search is dead’, and without a revenue model for search engines there was pressure from all sides to make Moreover something else, which resulted in all sorts of convoluted bullshit and meant that Moreover never had decent technology for full-text search. Eppur Si Muove. Technorati is in the same boat: there is probably a great deal of pressure not to call it a blog search engine, and perhaps for a different reason than Moreover, namely that Google is too difficult to take head on. I may sound arrogant, but I can’t help feeling that…


At last – Dapper


Dapper fills a perfect niche. People forget that before RSS there was screenscraping, and that after RSS there is still screenscraping. Most of Google News is scraped and does not come from RSS. Amazingly, because nobody really puts any useful metadata in RSS, you still need to screenscrape to produce useful aggregation services. Other than enterprise companies such as WebMethods, which had a scraping tool as part of a web services builder, or the innovative Junglee, which was snapped up by Amazon before the last .com boom got underway, nobody has built an online screenscraping tool, despite the fact that it’s actually a massive gaping hole in the fundamental services of the web. At Moreover.com, RSS was largely useless to us, because you can’t build a news search engine without full text, and the bigger news sources don’t want to output full-text RSS without prior negotiation. So, like Google…


Google’s Gmail adds Map This links to addresses mentioned within emails.


I just noticed that Google add automatic Map Links when something that looks like an address appears within a message in Gmail. This kind of on-the-fly detection of metadata to create searches could be used for auto-dialing phone numbers or adding appointments to a calendar – but I guess we’ll have to wait for a Google Calendar product for that. “Gmail makes it easy for you to keep track of your packages, and map out directions to your destinations; when you open a message that lists an address or package tracking number, Gmail shows you handy links to maps and directions, or your package’s delivery status.”


Adwords, Adsense now Adballoons – Google is stealth testing Yellow Pages killer, ad network for maps


Although unannounced publicly, Google appears to be testing its Yellow Pages killer: maps-based advertising. If you do a search for ‘hotels in New York’ on Google Local, you get something that you don’t get for a search for ‘hotels in San Francisco’: ads. They appear right there as little blue map balloons rather than the red, algorithmic, local search results. Not only are the ads local, but they are contextual, i.e. hotel searches bring up sponsored results for local hotels. In some ways this is a relatively obvious move; however, it’s big news considering that: 1. The Yellow Pages advertising market is bigger than the entire existing online search advertising market. 2. Offline Yellow Pages directories will clearly be replaced, over time, by online products, and it looks like maps are how this plays out. 3. Ad products are where Google makes the money that justifies its gargantuan Market Cap. so…


why is weblog search so hard?


Buried within the comments of Jeremy Zawodny’s post about Feedster is this comment: “I don’t recall Feedster ever being all that useful. But I also don’t find Technorati particularly useful. Why can’t someone just create a simple search engine for feeds/blogs?” The truth is that it is very difficult to build a search engine with real-time updates, since search engines are optimized for retrieval and usually use batch indexing. In addition, the majority of weblogs are spam, further compounding the problem. Blog search, which may once have seemed niche, will eventually be a standard part of search engines. At the moment nobody, including Google, has a weblog search product that works. If they did, it would be very useful. The real reason this is important has nothing to do with weblogs, long term. There are only two things that matter in search: freshness and relevancy. At…


What the Moreover, Weblogs.com, Verisign deal means.


This is my personal opinion and does not reflect any company policy. Most web content is published and then indexed when a search engine finds it, which can take up to 30 days. In the past, submitting your site to a search engine was the done thing; now it’s coming back, only better. Search engines have completely different indexes for news and weblog search, because those indexes need to be updated more quickly. To be able to do this they cannot crawl the entire web every few minutes, but need to be alerted, or pinged. Currently, ‘pings’ to sites like weblogs.com or ping-o-matic or blo.gs say that SOMETHING has been updated on a weblog or news site. Specs such as RSSPing change this to a ping that says WHAT has been updated. If all pages being published on the web did this (and there is no technical reason why they…


Is Yahoo more Web 2.0 than Google?


Whatever Web 2.0 really is, and in some ways it’s an empty ‘container meme’ that will morph into whatever is most convenient and successful, Yahoo are looking pretty well equipped to give Google a run for their money in the more media-centric worlds of social applications and publishing. When did you last use Orkut? When did you last use Flickr? With a media-savvy exec team and some small but smart acquisitions (Oddpost, Flickr and now Upcoming), Yahoo have the people, the components and the technical approach to create a synergy of social applications with next-generation UI. It used to be that using online apps was a trade-off of functionality and performance against not having to worry about maintenance, upgrades, backups or moving from one machine to another. With Gmail or Oddpost, there is no trade-off. My desktop email client crashed when…


Why the battle between Google, Microsoft and Yahoo will involve maps.


Number of households in US: 101 million.
Local services market value: $600 billion annual.
Household services: $180 billion annual.
Amount spent on local offline advertising by contracting and real estate businesses: $25 billion annual.
Dotcom investment in 10 online services during boom: $250 million [they were keen but too early]
Amount spent on advertising local services to households: anywhere between $50 and $90 billion annual [this is the biggest untapped revenue opportunity for search]
Largest category of services posting on Craigslist (taken by looking at a sample 2 days of postings): Sex services, 40% [i.e. Craigslist not a player yet, outside of jobs and real estate]
Largest category of Yellow Pages advertiser: Attorneys, $856 million in 2001.
Largest single event resulting in Yellow Pages use: eldest daughter gets married [personalized search and user profiles will be important]
Number of Overture searches that explicitly have a city in the search: 4%
Number…
