Om sez: Google just launched a blog/RSS search service. The Google Blog Search FAQ indicates that they're indexing the actual contents
of the feeds, and not the HTML of the blogs. This means if you're publishing truncated posts in your feed, Google Blog
Search won't index the full text of your blog. They started indexing feeds in June of 2005, so posts older than that
also won't be included — though they say they're working on a way to include those older entries. They're grabbing
blogs from Weblogs.com, so if you don't ping that service you'll either have to start, or wait until they provide a
method of manually submitting your blog. The language feature is pretty significant here: by default you're getting
results from all languages, and you can filter by 35 specific languages in the Advanced Search options.
Promisingly, they're offering subscription to search results via RSS. Frighteningly, there's the spectre of spam. Some
of the results appearing in the "Related blogs" listing at the top already seem a little suspect. Is this blog a related
blog for a search on web2.0 because it's highly relevant, or because it has Web2.0 in the title? Looks like the latter,
because an ego search lists my IMSmarter blog as being highly relevant when in fact, it's probably the least relevant of
all 8,465 blogs I post to, but it happens to have my name in the title (much as I love IMSmarter, I only used the blog
portion to test it out and now
whatever is there is pretty darn stale). Ultimately, the fact that I don't and can't know what makes items relevant
frustrates me — the results would be a lot more useful if I knew how they were determined. Spammers be damned — I want
transparent algorithms. Can the two co-exist?
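To make the truncated-feed point concrete, here is a minimal Python sketch. It is not how Google Blog Search works internally (that isn't public); it only assumes an indexer that reads feed contents and nothing else. The feed XML, the post text, and the `index_feed` and `search` helpers are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the <description> carries only a truncated excerpt.
FEED_XML = """<rss version="2.0"><channel>
<title>Example Blog</title>
<item>
<title>Weekend notes</title>
<description>Some thoughts on feeds and search [...]</description>
</item>
</channel></rss>"""

# Hypothetical full post text, as it would appear on the blog's HTML page.
FULL_POST_TEXT = (
    "Some thoughts on feeds and search, plus a long section "
    "on transparent ranking algorithms."
)


def index_feed(feed_xml):
    """Build a term -> set of item titles index from feed contents only."""
    index = {}
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        text = title + " " + item.findtext("description", default="")
        for term in text.lower().split():
            # Strip a little punctuation so "search," and "search" match.
            index.setdefault(term.strip(".,[]"), set()).add(title)
    return index


def search(index, term):
    """Return the titles of feed items containing the term."""
    return sorted(index.get(term.lower(), set()))


index = index_feed(FEED_XML)
print(search(index, "feeds"))       # term is in the truncated excerpt
print(search(index, "algorithms"))  # term is only in the full post
```

In this toy setup, a term that appears only past the cut-off in the full post never reaches the index, so a search for it finds nothing even though the blog itself contains it.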
Google blog search just launched
Reader Comments
2. I think if we're going to get a really solid blog search, it will need to index more general information about the blog.
Instead of tagging the posts, or indexing a specific post from a feed, tag the whole blog with keywords describing the type of stuff that gets posted. The algorithm then needs to generate results not just on the basis of a specific post but on what the blogger posts about in general.
I also think such an engine should encourage more detailed querying. One-word queries aren't enough; you are going to be disappointed with the results because the blogosphere turns so quickly.
Posted at 8:05PM on Dec 18th 2005 by danb
3. Pascal -- thank you for the clarification -- yes. :)
danb -- The "one word query" issue is big for me also. I think about this often when doing a Technorati tag search -- why can't I easily narrow down a search with another keyword? Maybe it would be helpful to move away from a "one search does all" model and move into an iterative search process, where you start with broader terms, check out the results, then start narrowing the set down even further.
Posted at 8:05PM on Dec 18th 2005 by barb dybwad
1. "They started indexing feeds in June of 2005, so posts older than that also won’t be included"
More precise: posts that were no longer on your RSS feed before June 2005 (for most people with a _reasonable_ posting frequency that's a couple of months earlier ;-) )
Posted at 8:05PM on Dec 18th 2005 by Pascal Van Hecke