
Testing the search engines



I hadn’t heard of Blekko, a search engine, till last week, so, armed with a new entrant, I wanted to see how they all compared.
   Blekko’s very pretty, and I’ve told Gabriel Weinberg, the man behind Duck Duck Go, just what it is that makes it attractive. Most of it is the modernist design approach it takes. But is it more functional?
   I have a couple of tests. You may have heard me dis Google’s supplemental index, where pages it deems to be less important wind up. But who makes that determination? And what if there is a page in there that is actually relevant but Google fails to dig it up?
   Google says the supplemental index doesn’t exist any more, but the fact remains that it fails to dig up some pages, especially older ones. So much for its comprehensive index.
   The first test, therefore, is one I have subjected every search engine I encounter to: will it find a 2000 article on Lucire about Elle Macpherson Intimates’ 10th anniversary? It is probably the only article on the subject, and because of this test, I’ve even linked it this year so it can be spidered by the search engines. Last month, Google could not find it, though in 2000–1, it was very easily found.
   If the search engines are as intelligent as their makers claim, they should be able to figure out these concepts and deliver the pages accordingly. The page itself is very basic, with no trick HTML, just plain old meta data, as you would imagine for a ten-year-old file.
   Will the search engines find it now, with a few more inward links?

Duck Duck Go: 1st
Blekko: not found, though it locates a reference made on this blog and two others in Lucire, one going back to 2001, at positions 1, 2 and 12
Google: 73rd, with blog entries from here referring to it at 5 and 42, and another link in Lucire at 6
Bing: 1st with old frameset at 2nd
Ask: 7th

   Here’s the second test. In Wired, Google bragged about how its index could find a page about a certain lawyer in Michigan (mike siwek lawyer mi). Unfortunately for Mr Siwek, most of the top entries quickly became those about the Wired article and he was lost again in the index.
   Mr Don Wearing, a friend of mine, is a partner in a shoe retail chain. If I type “Don Wearing” shoes, which of the search engines will deliver an entry referring to Don Wearing specifically and not some guy called Don who happens to be wearing shoes? (Not long ago, the best the search engines could do was around 12th.)

Duck Duck Go: 2nd
Blekko: says ‘No results found for: “Don Wearing” shoes’ but actually finds the article at 5th
Google: 3rd
Bing: 2nd
Ask: 5th

Not bad: an improvement all round.
   OK, how about speed of addition? Let’s see if the search engines will find the last entry in this blog, added a few hours ago. I’ll use the search term “Jack Yan” TPPA.

Duck Duck Go: not found
Blekko: not found
Google: found the main blog page
Bing: found a link to it at MyBlogLog
Ask: not found, but came up with seven irrelevant results

   This is just a quick test based on three examples that might not reflect everyday use. However, the first two had frustrated me when I hunted for them on Google (before I had heard of Duck Duck Go), which is why I remembered them; admittedly, that put Google at a slight disadvantage in this test. I never went to Bing or Ask regularly.
   Therefore, I’m not going to draw any conclusions about who is best, but I will say that Google is quicker at finding new material. I would, however, encourage others to give these other search engines a go and see how effective they are. I’m very happy with Duck Duck Go, especially as it does not second-guess my queries with Google’s annoying ‘Showing results for [what Google thinks I typed]. Search instead for [what I actually typed]’. No, Google, I did not type my query wrong—so give me the results already!
   I prefer Duck Duck Go’s approach, which is to treat the web more as a research medium. There is no hiding of pages: it just delivers the results most relevant to what I typed, which is what originally drew me to Google at the end of the 1990s.
   Judging by the above, I’m not convinced Blekko is ready for prime time (which is presumably why it still carries a beta tag).
   Of the five tested, it looks like it’s still the Duck for me, complemented by Google News. I’m also especially impressed with Duck Duck Go’s privacy policy: no search leakage, no search history, and no collecting of personal information to hand over to law enforcement or, for that matter, the Chinese Politburo.
   And in a year when people have shown that they care about privacy, Duck Duck Go seems to make more sense.

Posted in business, design, internet, technology, USA

It’s hard finding the old stuff on Google


My March 2010 issue of Wired arrived today (things take a while to reach the antipodes), and its most interesting article is on the Google algorithm. And hold on, this isn’t a Google-bashing blog entry.
   Steven Levy’s article was probably written before the Google Buzz privacy furore. It explains how Google has learned from its users to produce more relevant search results than its competitors. With 65 per cent of the search market (and close to 100 per cent of my searches for many years), it has a bigger pool to learn from, too.
   Recently I have noticed in ego-searches that Google is now smart enough to distinguish between searches for yours truly and those for Jack Yan & Associates (both in quotes), so that the former returns a mere 53,800 results and the latter 124,000 (quite a bit down from yesterday, when I first hatched the idea of blogging about this topic). That is smart in itself: knowing when people are looking for me (or my blog) and when they seek the company. By comparison, Yahoo! lists 280,000 for the former and 42,500 for the latter, as the latter is (if you look at terms alone) a more specific search.
   Once upon a time, even as late as 2009, a search for my name would return both my personal and my work sites.
   I’m pretty proud of my company and the people who work with me, and in election year, if someone were checking out my background, I sure would not mind them getting to JY&A as well. On the other hand, thanks to this distinction, my mayoral campaign site comes up in the top 10 in a search for my name. Either way, it’s relevant to a searcher—so all is well.
   But is this really how people search? If I were searching for, say, Heidi Klum, I would probably want (I write this before I even attempt a search) her bio, a bit of news, pictures to ogle, and Heidi Klum GmbH, her company. This is exactly what Google delivers, plus her Wikipedia entry as the first result. (Bing does this, too; Yahoo! puts Heidi Klum GmbH at number one.) I’d welcome hearing what others expect from a name search, although, as I said, Google is doing me a huge political favour by distinguishing me from my business. Either way, the ability to distinguish the two is clever.
   Levy cites an example in his article, the query mike siwek lawyer mi, which, when fed into Google at the time of writing, returned a page about a Michigan lawyer called Mike Siwek. On Bing, ‘the first result is a page about the NFL draft that includes safety Lawyer Milloy. Several pages into the results, there’s no direct referral to Siwek.’ (A Bing search today still does not bring up Mr Siwek early on; in fact, most of the top results now discuss Levy’s article. Sadly for Mr Siwek, the same now applies on Google, where the first actual reference to him is the 18th result. Cuil, incidentally, returns nothing: so much for its supposedly Google-busting index size.)
   But I have one example that puzzles me. Ten years ago, Lucire published an article about the 10th anniversary of the Elle Macpherson Intimates range. One would think that the query “Elle Macpherson Intimates” “10th anniversary” would bring it up first; I did, in fact, have to search for the URL last year when writing a blog post. On Google, it is the last entry. On Bing, it is the first. On Yahoo!, it is second.
   Of course, Google may well have judged the Lucire article too old, on the basis that the overwhelming majority of searches are for current or recent information. And since the article is 10 years old, I hardly imagine there are many links to it any more. However, I thought that being able to sort our searches by date, which is now very easy, especially with the new layout of the results page, might just give us the most precise result. The article’s lead page uses frames (yes, it’s that old), which Google may have penalized. But many of the leading results that contain these two terms do not have them in close proximity (in fact, numbers one and two no longer even contain the term Elle Macpherson Intimates). However, I don’t think the page I hunted for should be last, especially as none of the preceding entries even has the words in its title.
   I am not complaining about the Google situation: a 2009 Lucire article that links to the old Elle Macpherson one comes up in the top 10, so it’s still reasonably easy to reach via the top search engine. (Cuil lists the 2009 article from Lucire in its top 10, too.) There’s also a blog entry of mine that links to it, which appears on the second page.
   It’s just that I believe many people who search using Google (or any search engine) do so for research. They want to know about Brand X and, sometimes, about its history. If I type a person’s name, there is a fairly good chance I want to know the latest. But when I qualify that name with something that puts it in the past (such as anniversary), then I’d say I want something historical. That includes old pages.
   While few rely on a fashion magazine for historical research (though, believe me, we get queries from scholars who want citations of things they saw in Lucire), Google results nos. 1 through 53 and the majority of Cuil’s results (which are largely irrelevant: the first two are for a domain that no longer exists and a blank page) don’t hit the spot.
   For the overwhelming majority of searches (well over 90 per cent), Google serves me just fine, which is why you don’t see me complain much about the quality of its results. Even here, it’s not so much a complaint as professional curiosity. It would be sad for Bing or Yahoo! to be labelled as search engines for historical searches, but someone should provide fair access to older, yet still relevant, pages on the internet for everyday queries (so I don’t mean the Internet Archive).

PS: There’s one more search engine that should be considered. Gigablast, which I have used on and off over the years, does not list the 2000 article, either. As on Google, the 2009 one is listed; Gigablast returns only five results in all.—JY

Posted in internet, politics