My Wired for March 2010 arrived today (things take a while to reach the antipodes), with the most interesting article being on the Google algorithm. And hold on, this isn’t a Google-bashing blog entry.
Steven Levy’s article was probably written before the furore over the Google Buzz privacy flap. And it points out how Google has learned from users for search, producing more relevant results than its competitors. With 65 per cent of the search market (and close to 100 per cent of my searches for many years), it has a bigger pool to learn from, too.
Recently I have noticed in ego-searches that Google is now smart enough to distinguish between searches for yours truly and those for Jack Yan & Associates (both in quotes): the former returns a mere 53,800 references and the latter 124,000 (quite a bit down from yesterday, when I first hatched the idea of blogging about this topic). That is smart in itself: knowing when people are looking for me (or my blog) and when they seek the company. By comparison, Yahoo! lists 280,000 for the former and 42,500 for the latter, as the latter is (if you look at terms alone) a more specific search.
Once upon a time—even as late as 2009—a search for my name would result in both my personal and work sites.
I’m pretty proud of my company and the people who work with me, and in election year, if someone were checking out my background, I sure would not mind them getting to JY&A as well. On the other hand, thanks to this distinction, my mayoral campaign site comes up in the top 10 in a search for my name. Either way, it’s relevant to a searcher—so all is well.
But is this really how people search? If I were searching for, say, Heidi Klum, I would probably want (I write this before I even attempt a search) her bio, a bit of news, pictures to ogle, and Heidi Klum GmbH, her company. This is exactly what Google delivers, with her Wikipedia entry in addition (as the first result). (Bing does this, too; Yahoo! puts Heidi Klum GmbH at number one.) Maybe someone could get back to me on their expectations for a name search although, as I said, Google is doing me a huge political favour by distinguishing me from my business. The ability to distinguish the two is, by all accounts, clever.
Levy cites an example in his article: the query ‘mike siwek lawyer mi’ which, when fed into Google at the time of his writing, returned a page about a Michigan lawyer called Mike Siwek. On Bing, ‘the first result is a page about the NFL draft that includes safety Lawyer Milloy. Several pages into the results, there’s no direct referral to Siwek.’ (A Bing search today still does not have Mr Siwek appear early on; in fact, most results now discuss Levy’s article. Sadly for Mr Siwek, the same now applies on Google, with the first actual reference to his name being the 18th result. Cuil, incidentally, returns nothing: so much for supposedly having a Google-busting index size.)
But I have one that is puzzling to me. Ten years ago, Lucire published an article about the 10th anniversary of the Elle Macpherson Intimates range. One would think that the query “Elle Macpherson Intimates” “10th anniversary” would bring this up first—in fact, I did have to search for the URL last year when writing a blog post. On Google, this is, in fact, the last entry. On Bing, it is the first. On Yahoo!, it is second.
Of course, Google may well have judged the Lucire article to be too old, on the basis that the overwhelming majority of searches are for current or recent information. And as the article is 10 years old, I hardly imagine there are too many links to it any more. However, I thought that, since we can now very easily sort our searches by date (especially with the new layout of the results page), Google might just give us the most precise result. The lead page to the article is in frames (yes, it’s that old), which may have been penalized by Google. But many of the leading results containing these two terms do not have them in close proximity (in fact, numbers one and two no longer even contain the term Elle Macpherson Intimates). Even so, I don’t think the page I hunted for should be last, especially as none of the preceding entries even have the words in their title.
I am not complaining about the Google situation since a 2009 Lucire article that links to the old Elle Macpherson one comes up in the top 10, so it’s still reasonably easy to get to via the top search engine. (Cuil lists the 2009 article from Lucire in its top 10, too.) There’s also a blog entry from me that links it, and that appears on the second page.
It’s just that I hold a belief that many people who search using Google (or any search engine) do so for research. They want to know about Brand X and, sometimes, about its history. If I type a person’s name, there is a fairly good chance I want to know the latest. But when I qualify that name with something that puts it in the past (anniversary), then I’d say I want something historical. That includes old pages.
While few rely on a fashion magazine for historical research (though, believe me, we get queries from scholars who want citations of things they saw in Lucire), Google results nos. 1 through 53 and the majority of Cuil’s results (which are very irrelevant—the first two are of a domain that no longer exists and a blank page) don’t hit the spot.
For the overwhelming majority of searches, well over 90 per cent, Google serves me just fine, which is why you don’t see me complain much about the quality of its results. Even here, it’s not so much a complaint as professional curiosity. It would be sad for Bing or Yahoo! to be labelled as search engines for historical searches, but someone should fairly provide access to the older, yet still relevant, pages on the internet for everyday queries (so I don’t mean the Internet Archive).
PS.: There’s one more search engine that should be considered. Gigablast, which I have used on and off over the years, does not list the 2000 article, either. Like Google, the 2009 one is listed, and only five results are returned.—JY