Posts tagged ‘search engines’


It’s not just us: Google fares poorly for site: search for Quartz

27.03.2023

Not all of you will have caught the postscript to yesterday’s post. I wanted to see if Google was doing as bad a job with other WordPress-only websites, and one of the most famous is Quartz.

Sure enough, it was. Of the top 50 for site:qz.com, 33 pages were author, tag or category pages (let’s just say indices for want of a better term). Only 17 were articles.

Quartz is properly famous with a big crew, so the fact Google can’t get a site: search right there, either, shows how bad things must be.

Here’s where the articles appear based on each 10-result page in Google:
 
01–10 ★
11–20 ★★
21–30 ★★★★★★★★
31–40 ★★
41–50 ★★★★
 
In other words, on the first page, there was one article (in fifth). On the second page, two. On the third page, happily, there were eight, but the number drops again on the fourth and fifth.
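For anyone who wants to check my sums, here is a minimal sketch in Python that rebuilds the star chart above from the per-page counts and confirms the 17 articles and 33 index pages. The names `google_articles` and `star_chart` are my own, made up for illustration.

```python
# Articles found in each 10-result page of Google's site:qz.com results,
# copied from the star chart above.
google_articles = {"01–10": 1, "11–20": 2, "21–30": 8, "31–40": 2, "41–50": 4}


def star_chart(counts):
    """Return one line per results page, with one star per article."""
    return [f"{page} {'★' * n}".rstrip() for page, n in counts.items()]


total_articles = sum(google_articles.values())
index_pages = 50 - total_articles  # the rest were author, tag or category pages

for line in star_chart(google_articles):
    print(line)
print(f"articles: {total_articles}, index pages: {index_pages}")
```

Swapping in the Mojeek or Bing counts would produce the other charts the same way.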

It’s really not the behaviour you expect from a search engine, and as far as I know, till recently, Google operated normally.
 

 

How does Mojeek, whose spider and site operate normally, fare on this test? Better.
 
01–10 ★★★★★★★
11–20 ★★★★★★★★★★
21–30 ★★★★
31–40 ★
41–50
 

I’m not saying the massive number of author pages starting from page 3 is good, but at least Mojeek is putting them in later, which is what you’d expect. I’d personally prefer they be later still, or not show up at all, for this type of search.

For fun, let’s look at Bing:
 
01–06 ★★
07–16 ★★★★★★★★★
17–26 ★★★★★★★★★
27–36 ★★★★★★★★★
37–46 ★★★★★★★★★
47–56 ★★★★★★★★★
 
Yes, there were a lot of repeats (probably around 40 per cent again) and Bing oddly could only deliver six results on the first page, but those results are roughly what you’d expect: a lot of articles and some top-level pages on the first page. It even allowed me to go beyond 56, which is an achievement. Other than the repeated results, Bing delivered results that were closer to what was expected.
 
Earlier today, I discovered one setting in a WordPress SEO plug-in that allowed tag pages to be indexed on Lucire’s website. That was never a problem till now, but I’ve turned it off. Sorry, Whangarei residents. I’ve asked Google not to make you the fashion capital of the world by having your tag appear first.

On this blog, tag pages were already selected for exclusion, but Google prefers unchanging, static HTML, and that’s another story.


Posted in business, internet, media, New Zealand, publishing, technology, typography, USA | No Comments »


Got dynamic pages or a WordPress blog? Don’t expect Google to rank your pages highly

25.03.2023

That was short-lived. Bing’s back to offering 55 results for Lucire, and when you go through them, c. 40 per cent are repeated from page to page. However, a lot of the results are from the 2020s now, of both static and dynamic pages, so that’s something. There’s still a handful of truly ancient pages that haven’t been linked for decades, too.
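The 40 per cent figure is the sort of thing that can be measured rather than eyeballed. Below is a rough sketch, assuming you have copied the result URLs from each page into lists; the function name `repeat_rate` and the sample paths are hypothetical, not real Bing output.

```python
def repeat_rate(results):
    """Fraction of delivered results that repeat an earlier URL."""
    seen = set()
    repeats = 0
    for url in results:
        if url in seen:
            repeats += 1
        else:
            seen.add(url)
    return repeats / len(results) if results else 0.0


# Hypothetical example: two pages of five results, with two URLs
# from page 1 turning up again on page 2.
page1 = ["/a", "/b", "/c", "/d", "/e"]
page2 = ["/f", "/a", "/g", "/b", "/h"]
print(f"{repeat_rate(page1 + page2):.0%} repeated")
```

Counting a repeat only on its second and later appearances means the first sighting of every page still counts as a genuine result.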

This blog’s views are down dramatically, though as I haven’t often fed site:jackyan.com into any search engine, it’s hard to say what the cause is. However, it’s more likely than not that Google has caused this, because this very search nets only results for static pages until page 7, except for the home page of this blog.

I note from a search of this blog that the first time site:jackyan.com is recorded was in July 2022, and I had checked Google. Obviously nothing had jumped out back then (it would have jumped out at me), and Google’s pivot to antiquity is a recent phenomenon. Even on January 16, 2023, I didn’t note that anything was strange with a Google search I performed.

No, it was this month when I noticed how old the Google results were for this domain.

Here are the first 50 results visualized.
 

 
It’d be fine if the year was 2013, since most of the pages that Google shows up top are from that year. The rest are older.

I know there’s an argument for removing obsolete pages, but I am of the earlier school of thinking where webmasters were advised:

  • don’t make 404s: if the page still exists then just let it be there, because
  • if it’s not linked from anywhere current, it won’t show up, or show up later in the results;
  • search engines will downrank things that are buried or only linked from pages that are deep within the site, and uprank things that are current and linked from more recently crawled pages.

When it comes to Google, these were truths as well, but it appears that, after 20-odd years, they no longer are.

Put simply, Google has real trouble indexing dynamic pages and ranking them highly, and the same is found with site:lucire.com. That means this blog’s entries are no longer being found or ranked highly.

What should the behaviour be? Mojeek is instructive, since the spider behaves as a spider should and the results show a more normal mix of static and dynamic. I should note that despite Bing’s obvious limitations (though at the time of writing it claims to have 1,860 results), it manages to include static and dynamic pages, too, with two dynamic in the top 10, and eight (one of which is a repeat) in slots 11–20. Overall, it’s closer to what one expects, too.

Below is the same graph for Mojeek.
 

 

The lesson? Got dynamic pages, like a WordPress blog? New content in that blog? Then don’t expect much from Google as it clearly prefers static HTML. It has become what Bing was last year: a repository for antiquity. Bing’s index may be shot, but it is no longer about the old stuff. Let’s hope Google, as it copies Bing, gets back into delivering more relevant results as well, and has a spider that functions in the way we all understand. For a market leader, it sure seems pretty clueless.

All the more reason to use Mojeek then.


Posted in business, design, internet, media, publishing, technology | No Comments »


Another example of Google’s antiquity when it comes to search results

06.03.2023

Is Google now the Wayback Machine, too?

Since I haven’t used Google regularly since 2010, I can’t do what’s called a longitudinal study, though when I started examining search engine results for Lucire after Bing tanked last year, nothing in my Google searches jumped out at me—till earlier in 2023.

I guess wherever Bing goes, Google follows, since they’re not really innovators—they did well in search, but everything else seems to be about following or acquiring.

With Bing becoming Microsoft’s Wayback Machine, Google followed suit, as revealed when I did site:lucire.com searches. But was it the exception?

Not really: site:jackyan.com still shows my mayoral campaign pages, even though they haven’t been linked since the day before the 2013 mayoral election. And site:jyanet.com, which I tried at the weekend, has some ancient things there, too.

Like Bing, Google has some trouble crossing into this side of 2010.

Let’s look at the top 10.
 

 

1. Home page. Current, so that’s good. And at least it’s the home page. Bing doesn’t always give you one.

2. CAP Online, last updated 2008, and very sporadically between 2001 and 2008. I don’t think we’ve linked it since then. Maybe, at best, a year later.

3. Lucire’s original home page from 1997. This hasn’t been linked since we got Lucire its own domain in 1998—25 years ago.

4. Our press information pages. Fair enough, and current.

5. JY&A Media. Relevant and currently linked.

6. JY&A Consulting’s old page. Hasn’t been linked by us since 2010. I imagine some might still link to it? But it gets a 404, and has done for a long time. Why rank it so highly?

7. JY&A Fonts. Current and relevant; I would have thought it would rank more highly.

8–10. Press releases from between 2007 and 2009.

I’ve benefited from search engines grandfathering things, but I really couldn’t believe my eyes at pages we haven’t linked to for anywhere between 13 and 25 years. Not that many people would have maintained their links to these pages, either. Certainly the Fonts and Media pages should be further up, given the links in to them and the current internal links on our site.

For (6), I don’t have the knowledge to do 301s and a refresh page might penalize jya.co, where the Consulting website is today.

When we took the site to HTTPS last year, both experts and friends told me that it would take a matter of days or weeks for Google to restore its position; that one would not get penalized for going to a secure server. That, I discovered, was not the case. Search engines don’t update, not as regularly as you might think. If what I am seeing is any indication, search engines in 2023 have massive trouble updating, and the top 10 reflect the status of your website as it was a long time ago. For jyanet.com, the top 10 would be perfect if it was 2009; for jackyan.com, it’s how things were in 2013; and for lucire.com, it’s a bit more of a hybrid of what was current in 2005 (framesets! And the old entertainment page) and some pages from after 2011 (including current home and shopping pages).

I don’t care how Google defends itself or blames others for its decreasing ability to find relevant pages; it’s blatantly obvious its search has worsened.


Posted in business, internet, publishing, technology, USA | No Comments »


Cory Doctorow might be predicting the end of the web as we know it

21.02.2023

Two great pieces by Cory Doctorow came my way today on Mastodon.

The first is an incredibly well argued piece about why people leave social networks. Facebook and Twitter won’t be immune, just as MySpace and Bebo weren’t.

One highlight:

As people and businesses started to switch away from the social media giants, inverse network effects set in: the people you stayed on MySpace to hang out with were gone, and without them, all the abuses MySpace was heaping on you were no longer worth it, and you left, too. Once you were gone, that was a reason for someone else to leave. The same forces that drove rapid growth drove rapid collapse.

The second is about all the hype surrounding chatbots, and Google and Bing. Cory begins:

The really remarkable thing isn’t just that Microsoft has decided that the future of search isn’t links to relevant materials, but instead lengthy, florid paragraphs written by a chatbot who happens to be a habitual liar—even more remarkable is that Google agrees.

Microsoft has nothing to lose. It’s spent billions on Bing, a search-engine no one voluntarily uses. Might as well try something so stupid it might just work. But why is Google, a monopolist who has a 90+% share of search worldwide, jumping off the same bridge as Microsoft?

He goes on, analysing how Google is not really an innovator, and most things it has have come to it through acquisition. They wouldn’t know a clever innovation if they saw it.

And:

ChatGPT and its imitators have all the hallmarks of a tech fad, and are truly the successor to last season’s web3 and cryptocurrency pump-and-dumps.

I had better not quote any more as it’s way more important you visit both these pieces and see the entire arguments. Farewell to Big Social then.

Though if Cory is right, and my own thoughts have come close, then is there any point to web searching if these chatbots are going to unleash machine-authored crap, complementing some of the already godawful spun sites out there? Search engines should be finding ways of weeding out spun and AI-authored junk, rather than being in league with them—because that could mean the death of the web.

Or maybe just the death of Google and Bing, because Mojeek might be there to save us all.


Posted in business, internet, technology, USA | No Comments »


Google’s top 10 continue to bring up old pages; and it looks like Bing’s about to kick us off

07.02.2023

After years of using the web, I think I know a little about how web spidering works. The web spider hits your home page (provided it knows about it), then proceeds to follow the links on it. Precedence is given to the pages within your site that are linked most, or are top-level: in Lucire’s case, that would be the home page, and the HTML pages that come off it (the indices for fashion, beauty, travel, and lifestyle, among others). Weighting would be given to those linked more: with so many fashion stories on the site linking back to the fashion index, the one we’ve used since the mid-2000s, then the fashion index would rank highly. This, I thought, was conventional.
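To make that picture concrete, here is a toy sketch of spidering as I understand it, and emphatically not how Google or Bing actually work. It starts at the home page, follows links breadth-first, and ranks pages by how many inbound links they pick up along the way. The miniature site in `links` is invented for the example.

```python
from collections import Counter, deque

# Hypothetical miniature site: each page maps to the pages it links to.
links = {
    "/": ["/fashion", "/beauty", "/travel"],
    "/fashion": ["/", "/story-1", "/story-2"],
    "/beauty": ["/", "/story-3"],
    "/travel": ["/"],
    "/story-1": ["/", "/fashion"],
    "/story-2": ["/", "/fashion"],
    "/story-3": ["/", "/beauty"],
    "/old-frameset": ["/"],  # still on the server, but nothing links to it
}


def crawl(start, links):
    """Breadth-first crawl from the start page, counting inbound links."""
    inbound = Counter()
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            inbound[target] += 1
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, inbound


seen, inbound = crawl("/", links)
# Most-linked pages first; ties broken alphabetically.
ranked = sorted(seen, key=lambda p: (-inbound[p], p))
print(ranked)
```

Under these assumptions the home page and the fashion index come out on top, and the unlinked frameset is never reached at all, which is the behaviour I had always taken for granted.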

With Bing becoming Microsoft’s Wayback Machine and generally failing to pick up anything after 2009, there must be something else going on in search engine-land. After reporting on Google’s failures in January, I see little has improved. I was even able to do a Google search where 10 per cent of the top 50 results were repeated—which beats Bing’s 40 per cent—though I wasn’t able to replicate that for this post. But the issue is that this shouldn’t be happening at all.
 

 

Here were Google’s top 10 yesterday for site:lucire.com, with my remarks next to the entries. Like Bing, there’s a page on there that’s never been referred to; if it ever were linked, it would have been accidental (it’s the subdirectory for 2002 articles; we used to put an index.html redirect in those directories in case the pages were accidentally hit due to manual coding). The number of times lucire.com/2002 would have been referenced would be fewer than ten, maybe even fewer than five. But there it is, in third.

There are three framesets from 20 years ago that have made it into the top 10. There is Devin Colvin’s entertainment page from 2004–5 that also has not been linked to in 17-plus years (except by Bing), and now Google has decided to make it top-10 prominent.

I’ve no feelings either way for a 2011 and a 2022 article to appear in the top 10, though it’s very, very strange that the top-level pages—pages that are linked throughout the site from articles dating from 2005 and later—don’t appear. They used to in Google.

Google cannot hide behind the excuse that its service has worsened because the web’s content has worsened (a phenomenon, I might add, they created). Here is an existing site, one that has always been in their index (since Lucire pre-dates Google) and it’s doing a terrible job of indexing it and ranking the pages.

Brave, with its few pages, gets it right on a search just for Lucire (we can’t do site: there as that’s powered by Bing). It gives us the print ordering page, the beauty index, the news page, and the travel index (‘Volante’). Mojeek requires a search word so obviously that sways things, but even then it manages to come up with the current home page, ‘Volante’ index, and Lucire TV, which are acceptable. At least they’re current, and currently linked. Today, Bing has fallen to five results for site:lucire.com, its lowest ever, and four of those pages are framesets from 2002.
 

 

In fact, it might be time to see how it’s gone for our sample set.
 

 

Not great for us. There are some anomalies there, chiefly Google’s estimates of what it has for The Rake, and it seems Lucire’s on its way out of Bing altogether. Mojeek continues to be the most steady, stable and sensible of these three search engines.

If you’re relying on Google or Bing, you really need to think twice. Something has been wrong with Bing for some time, and it’s catching in Mountain View.


Posted in internet, media, publishing, technology, USA | No Comments »


Now Google is worsening on a site: search: framesets from the early 2000s are in the top 10

26.01.2023

This was never supposed to become a search engine blog, but like the Facebook “malware scanner” (or was that scammer?) and Google lying about its Ads Preferences Manager, I was forced to investigate when no one in the media (or, for that matter, the wider internet) did.

And over the years, those posts really helped people and exposed some wrongdoings.

Hence the latest obsession, about Bing, because no one seems to have noticed how Microsoft’s search engine is behaving as though someone at Redmond is unplugging servers left, right and centre.

Someone on Reddit suggested I try Kagi, which is a paid search engine—but from what I can tell, it’s a meta-search (the person who told me about it confirmed this, as did an earlier review).

I’ve seen meta-searches for decades, and admittedly Kagi is the prettiest of them all, but because it’s pulling from Bing and Google, it suffers from the limitations of both, especially the former.

We already have seen how Bing basically favours antiquity over currency, at least where Lucire is concerned, so Kagi’s results contain, in their top 10, pages that have not been updated (or linked) since the mid-2000s. When the Google-sourced results are factored in, it looks a bit better (since there are pages from the 2010s and 2020s), but they still aren’t the most relevant (since it seems Google has been faltering somewhat on site: searches, too).

Here’s a screen shot from Kagi. Results 1, 6 and 7 are current; result 3 is from the early 2010s; results 2, 4, 5 and 8 are framesets from the 2000s; result 9 is from 2014 and hasn’t been linked since then; the remainder are stories which can still be found through spidering but date from between 2011 and 2016.
 

 

Since it’s a meta-search, I decided to peer into Google, and its top 10 does not look good, either. As I don’t tend to use Google, and the recent tests were about grabbing the number of search results, or analysing their currency, I hadn’t drilled down on a site:lucire.com search for a while.

Let’s see how they look today.
 


 
Surprisingly bad. Results 1 and 2 are current; results 3, 4 and 5 are framesets from the early 2000s that have not been linked since then; result 6 is from 2005 and has not been linked since then; result 7 is a 2011 story; result 8 is a 2022 story; result 9 is a 2016 story; and result 10 is a 2011 story.

In other words, the Google top 10 has changed probably due to their algorithm, but I wouldn’t call these relevant to what searchers seek. I could understand the old about.shtml staying in the top 10 despite its antiquity, but some of these top-level pages are really old. Framesets? Seriously?

Result 11 is repeated, which is also odd, while results 14 and 15 are tag pages from the WordPress part of the site. The 15th is for Whangarei, not exactly the fashion centre of the world.

Google’s fall could explain why these blog posts have suffered traffic-wise as its search results are seriously irrelevant; there’s no connection to the pages’ popularity, either. It’s really beginning to feel like the Wayback Machine there, too.

Mojeek still makes more sense, since the search there requires a term, i.e. site:lucire.com lucire, so naturally it gives you pages containing the word Lucire more.
 

 

Result 1 is our home page (makes infinite sense!); result 2 a current top-level contents’ page; result 5 is the main page from Lucire TV; while the rest are stories that have the word Lucire contained in them more than what is typical for our site.

It looks like the US search engines are faltering while Mojeek is getting better. What an interesting development. I didn’t have worsening Google search on my 2023 bingo card.
 
Incidentally, for this website, Google still places my mayoral election pages from 2013 in its top 10; while Mojeek links the home page, the blog, a mixture of posts from 2009, 2020, 2021 and 2022, a transcript of a 2008 speech, and a tag page from 2010. Bing has pages from 2003 and 2012, but also some current top-level pages and, amazingly, three blog posts that are likely to be relevant (two of them critical about Bing from 2022 and 2023, and a 2021 post about Vodafone). In other words, Google has done the worst, in my opinion. Bing only has 10 pages so it has the smallest index but what it showed was surprisingly good! That leaves Mojeek, again, as delivering the best balance of relevance and index size.


Posted in internet, publishing, technology, UK, USA | No Comments »


For most sites, Bing continues to shrink

19.01.2023


The New York Times’ presence on Bing has plunged back to the thousands—it was 2,723 on August 2
 
Back in July, I ran site: searches on a small range of websites to see just how bad things had got with Bing.

In January, I can report some have got worse. And back in July it was already pathetic.

The first figure below is from today, the parenthesized figure from July.

Remember that Mojeek is the only party that appears to report these figures honestly. Bing repeats results from page to page—around 40 per cent from the searches I’ve done with site:lucire.com. Google will show a few hundred so it’s anyone’s guess. I prefer Mojeek’s 1,000 cap and that works particularly well for the Lucire site.
 
Die Zeit (zeit.de)
Mojeek: 5,279 (4,796)
Google: 2,590,000 (2,600,000)
Bing: 6,010 (3,770)
 
Annabelle (annabelle.ch)
Mojeek: 882 (405)
Google: 14,000 (11,700)
Bing: 25 (105)
 
Holly Jahangiri (jahangiri.us)
Mojeek: 299 (222)
Google: 510 (738)
Bing: 10 (49) but reports 2
 
The Gloss (thegloss.ie)
Mojeek: 2,615 (1,968)
Google: 23,000 (19,200)
Bing: 71 (71)
 
The New York Times (nytimes.com)
Mojeek: 3,547,405 (2,823,329)
Google: 42,800,000 (36,200,000)
Bing: 5,170 (1,190,000)
 
Lucire (lucire.com)
Mojeek: 3,529 (3,572)
Google: 4,940 (6,050)
Bing: 10 (50)
 
The Rake (therake.com)
Mojeek: 1,382 (1,443)
Google: 10,900 (11,500)
Bing: 10 (49)
 
Travel & Leisure (travelandleisure.com)
Mojeek: 11,222 (9,750)
Google: 21,000 (28,100)
Bing: 15,100 (220)
 
Microsoft (microsoft.com)
Mojeek: 1,887,288 (1,748,199)
Google: 120,000,000 (122,000,000)
Bing: 340,000 (14,200,000)
 
Detective Marketing (detectivemarketing.com)
Mojeek: 591 (579)
Google: 835 (998)
Bing: 10 (51)
 

There we have it: some rises at Bing for Die Zeit and Travel & Leisure, steady at The Gloss, but notable falls at The New York Times (back into the thousands, down from millions) and Microsoft’s own website (340,000, down from over 14 million). If you’re an independent publication, your presence on Bing is not rosy, with Annabelle, Lucire and The Rake netting between 10 and 25 despite thousands of pages on each site; while my friend Stefan Engeseth’s Detective Marketing site is also down to 10 from an already low 51.
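To put numbers on those rises and falls, here is a small sketch that works out the percentage change in Bing's figures between July and today, using only the counts listed above. The helper `pct_change` is my own naming, and the figures are Bing's delivered results as I recorded them.

```python
# Bing's site: counts from the list above: (January, July).
bing = {
    "zeit.de": (6010, 3770),
    "annabelle.ch": (25, 105),
    "jahangiri.us": (10, 49),
    "thegloss.ie": (71, 71),
    "nytimes.com": (5170, 1190000),
    "lucire.com": (10, 50),
    "therake.com": (10, 49),
    "travelandleisure.com": (15100, 220),
    "microsoft.com": (340000, 14200000),
    "detectivemarketing.com": (10, 51),
}


def pct_change(now, before):
    """Percentage change from the earlier figure to the current one."""
    return (now - before) / before * 100


# Biggest falls first.
for site, (now, before) in sorted(bing.items(), key=lambda kv: pct_change(*kv[1])):
    print(f"{site:<24} {before:>12,} -> {now:>9,}  {pct_change(now, before):+.1f}%")
```

Sorting by the change rather than by site name puts the steepest drops at the top of the list.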

I know from Mojeek’s blog that they keep plugging in hard drives and servers to cope as their index expands. I can only assume from these numbers that Microsoft is unplugging them, though they seem to look after you more if you’re an establishment website from a big company.
 
PS.: Here’s another way of looking at the data, factoring in the round of tests I did on August 2.


Posted in internet, media, publishing, technology, USA | No Comments »


How the search engines fare on a site: search here

16.01.2023

Time to do some analysis on the age of the search results for this site through the search engines. I’m curious about the drop in hits. ‘Contents’ pages’ also include static pages and, in Bing’s case, PDFs. (PS.: For clarification, a contents’ page would include a WordPress tag page, or a page for a set month containing all that month’s posts.)
 
Mojeek
Contents’ pages: ★★★★★★★★★
2002
2003
2004
2005
2006 ★★
2007 ★
2008 ★★
2009 ★★★★★★
2010 ★
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020 ★★
2021 ★★★★★★★★★★★
2022 ★★★★★★★★★★★★★
2023
 
Interesting spread, and no problems indexing PHP pages (after 2010). Some repeat results, with Mojeek having both www.jackyan.com and jackyan.com versions of the same pages. I’m surprised at the gap between 2010 and 2020, though they do appear after the 50 mark.
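I compiled these tallies by hand, but a chart like the one above could be sketched in a few lines of Python. This assumes, and it is only an assumption, that post URLs carry the year and month as path segments followed by a slug (e.g. /blog/2022/07/some-post/); anything else, including a bare monthly archive, is counted as a contents' page. The example.com URLs are placeholders, not real search output.

```python
import re
from collections import Counter

# Assumption: a post URL looks like /2022/07/some-post/ somewhere in its path.
# A bare monthly archive (/2022/07/) has no slug, so it does not match.
POST_URL = re.compile(r"/(20\d{2})/\d{2}/[^/]+")


def tally_by_year(urls):
    """Count results per year; anything without a dated slug is a contents' page."""
    counts = Counter()
    for url in urls:
        match = POST_URL.search(url)
        counts[match.group(1) if match else "contents"] += 1
    return counts


# Hypothetical results for illustration only.
results = [
    "https://example.com/",
    "https://example.com/blog/tag/search/",
    "https://example.com/blog/2022/07/",
    "https://example.com/blog/2022/07/some-post/",
    "https://example.com/blog/2022/11/another-post/",
    "https://example.com/blog/2009/03/older-post/",
]
for key, n in sorted(tally_by_year(results).items()):
    print(f"{key} {'★' * n}")
```

Older static pages without a year in the path would still need dating by hand, which is what I did for the pre-2010 entries.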
 
Google
Contents’ pages ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
 
Now that was a surprise. Only the static, HTML pages, with a lot of ex-Blogger indices (which were also HTML). Talk about being a Wayback Machine. No individual blog posts at all and a lot of really old stuff that isn’t even linked any more. I expected Yandex to do something like this, not Google.
 
Bing
Contents’ pages ★★★★★★★★★
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023 ★
 
Still bizarre. Bing claimed it had six results and delivered 10 on the first page. One blog post from 2023 makes it in here: it’s one attacking Bing and calling it near death. (Of the ones after the 3rd, it’s done marginally better, though it’s still hundreds off the norm.) During the course of the day, the 50-something results Bing had for site:jackyan.com have fallen to 10. Talk about decaying.

Interestingly, Bing gives 50 or so results on mobile, something I discovered this morning after compiling the above and before I pressed ‘Publish’ in WordPress.
 
Yandex
Contents’ pages ★★★★★★★
2002
2003
2004
2005
2006 ★★★★★★★★★★★★★
2007 ★★★★★★★★★
2008 ★★★
2009 ★★★★★★
2010 ★★★★
2011 ★★
2012
2013
2014
2015
2016
2017
2018
2019 ★★
2020 ★
2021
2022
2023
 
Some repeated results and definitely in favour of static HTML pages (pre-2010) over dynamic ones.
 
Baidu
Contents’ pages ★★★★★★★★
2002
2003
2004
2005
2006
2007
2008
2009
2010 ★
2011 ★
2012
2013
2014 ★
2015
2016
2017 ★★★★
2018 ★★
2019 ★
2020 ★★★★★★★★★
2021 ★★★★★★★★★★★★★★★
2022 ★★★★★★
2023
 
Baidu gives the wrong date for a lot of results, and there was a repeated result, too. But a pretty good site search and far closer to what I expected I would see, since it’s the post-2010 blog posts that I thought were more significant. There were a few in 2006 that got me some international mainstream media coverage and appearances on Al Jazeera English’s Listening Post in those early days, but the most read blog entries were from 2016.
 
Yep
Contents’ pages ★★★★
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014 ★★
2015
2016
2017 ★
2018
2019
2020 ★★
2021
2022 ★
2023
 
Not bad for a newbie in beta, spidering both static and dynamic (PHP) pages. Better than Bing’s mix for the 10 each delivers.

Gigablast delivers none.

I can’t say for sure what caused the traffic drop based on the above, since I haven’t documented one of these searches before. So I’ve nothing to compare it to, though my vague memory is that Google would have had some of my actual posts among the top 50. A lot of the pages it does have there aren’t that highly trafficked. Could we blame Google?

Sadly, I don’t have enough data to know for sure, but on the face of it, Google’s top 50 are anomalous, while Bing continues to demonstrate that it’s largely useless.
 
PS.: Just tried site:bing.com. Bing’s results were terrible, including some real estate searches for homes in France, lots of repeated results. Mojeek and Google delivered better results for site:bing.com than Bing did.


Posted in business, China, internet, technology, UK, USA | No Comments »


From January 3, blog traffic went well down

16.01.2023

Around January 3, the regular traffic to each blog post fell off a cliff here. Either my posts have suddenly become a bore and not worth reading, or something else external has happened. Is Feedburner dead? Is it because my Twitter account is locked (by me, a few weeks ago)? Is it the death of Bing? Or were the hundreds of views per post (700 being typical) overinflated all these years? Anyone else observed quite a sudden change? (I did two posts on the 3rd, one is on 309 views, the other on 98 at the time of writing. The rest haven’t picked up much since the second post on the 3rd.)

It’s not a huge deal since I blog as catharsis and when I was on Vox (2006–9), I never looked at any stats anyway. But there was a part of me quite happy that my silly musings were useful or entertaining enough to warrant those visits.

A quick site:jackyan.com search gives us these figures (claimed, followed by actual). Including this post, there are 1,252 posts on WordPress, and quite a few in the old Blogger archive (still live), so I’d expect over 1,000 results:
 
Mojeek: 456/456
Google: 708/288
Bing: 219/58
Yandex: 2,000/250
Baidu: 2,110/233
Gigablast: —/0
Yep: —/10
 

The western search engines are really low but Mojeek once again leads with pages delivered (and showed exactly the number of results it said it would). I’m surprised that Baidu does so well here. Yandex has a lot of index pages in their results, so take their figure with a grain of salt; and Bing repeats from page to page, though 58 here (with repeats) is more than 10 for Lucire. Are the search engines the culprits? Or a WordPress plug-in?
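One way to compare the engines is the share of claimed results that could actually be paged through. Here is a quick sketch using the figures above; `delivered_share` is a name I have made up, and Bing's 58 still includes its repeats.

```python
# (claimed, delivered) for site:jackyan.com, from the list above.
# Gigablast and Yep are left out because they gave no claimed figure.
figures = {
    "Mojeek": (456, 456),
    "Google": (708, 288),
    "Bing": (219, 58),
    "Yandex": (2000, 250),
    "Baidu": (2110, 233),
}


def delivered_share(claimed, delivered):
    """Share of the claimed results that could actually be paged through."""
    return delivered / claimed


for engine, (claimed, delivered) in figures.items():
    share = delivered_share(claimed, delivered)
    print(f"{engine:<7} {delivered:>4}/{claimed:<5} {share:.0%}")
```

On these numbers Mojeek is the only engine whose claim matches what it delivers.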


Posted in culture, internet, technology | No Comments »


Bing increases Techdirt’s results, saving it some embarrassment

13.01.2023

After I notified Mike Masnick, the founder of Techdirt, about my findings about Bing, the search engine, coincidentally, began spidering his latest articles. It claimed to have 150 results, and delivered 92, many of which were repeated from page to page as usual. Tonight it’s a claimed 249, delivering 173.

Techdirt is well respected and very popular, and disliked presently only by the Musk bros. What’s the likelihood that Microsoft knew about their shortcomings here and corrected things? I wasn’t exactly quiet, and I told more than Mike and the readers of this blog (I went on Reddit, for example), since it was so ridiculous that Bing could only deliver one result for such a major website. It’s embarrassing for them, so they decided to do the right thing. Like any Big Tech firm: do nothing unless you risk getting bad press. This is right out of the Facebook playbook, for example.

What a pity they could not do the right thing for the rest of us.

Just as a comparison, since I am nothing if not fair, here is the claimed number of results versus the number delivered for site:techdirt.com:
 
Mojeek: 48,606/1,000
Google: 54,700/394
Bing: 249/173
Yandex: 2,000/250
Baidu: —/1
Gigablast: 0/0
Yep: —/10
 

In that context, it doesn’t look so bad, especially as a lot of Yandex results are of Techdirt’s various directories and largely useless.

It’s not so hot for site:lucire.com over at Bing:
 
Mojeek: 3,481/1,000
Google: 5,970/307
Bing: 2/10
Yandex: 2,000/250
Baidu: 1,480/400
Gigablast: 0/0
Yep: —/10
 

I’m not kidding: Bing claims it had 2 results and delivered 10. Looks like one of those rare times they underestimated. Well off the mark of the 55 they have been doing since mid-2022 and that was pathetic. There is nothing in the results from after 2007. Maybe fixing Techdirt’s results meant that Bing had so little computing power for every other site!

Well, I guess I can no longer claim that Bing is repeating results from page to page for a site:lucire.com search, since it only has one page.
 


Posted in internet, media, publishing, technology, USA | No Comments »