Testing occidental search engines on site: again: Mojeek, Bing more normal

It was about time I had a look at the occidental search engines again, using a site:jackyan.com search to see how they fared. The previous test for this site was in May 2023.

I knew Google was still terrible, and I knew Bing had improved since the days when it had only 10 results for the domain. But there are some other observations that might interest followers of this topic.

Let’s begin with Mojeek, which has proved to be the most balanced ever since I began doing these analyses.
 
Mojeek with site:jackyan.com results
 

Mojeek picks up more dynamic pages than static pages on this site, which makes sense, since there are more dynamic pages. No mystery there. The only curiosity is the entry in sixth place, highlighted in pink: it's a gallery page, yet, as far as I know, all the gallery pages have been taken offline and unlinked. It has no content.

Mojeek largely picks up newer pages, with a few old ones (this blog shifted to PHP in January 2010, replacing the old Blogger HTML output). If you want reasonably current thinking, then Mojeek works well.

Let’s go to Bing, just to get it out of the way.
 
Bing with site:jackyan.com results
 

I suppose I should be grateful Bing has more than 10 pages in its index for this site. In fact, it claims 2,160, and on searches done in 2023, that number is actually correct. The bug where Bing repeated 40 per cent of its results seems to have been remedied.

The curiosities here are the tag index pages appearing in the top 10, also shown in pink. But, overall, it's not a bad mix of static and dynamic pages, and hugely improved on the dark days of 2022, when it was clear that the search engine had tanked.
 
Google with site:jackyan.com results
 

Google's next, a.k.a. the Wayback Machine. It's still mostly incapable of showing PHP pages, preferring the safety of stock HTML and, indeed, the safety of content authored in 2013 and before. This is a search engine that specializes in old, grandfathered content, and, to be fair, it exhibited grandfathering tendencies even in the good days of the 2000s. Many of the pages it has picked up are static HTML archives created by Blogger when this blog used that service. In 49th place is the first PHP page from this blog, other than the blog home page (which is second).

Also interesting, and something I probably should have picked up on last year, is that Google ranks dynamic pages by how many words are in the title, fewest first. Going from 49th to 100th and skipping index, HTML and PDF entries, here are the positions and the number of words in each title:
 
Position  Words in title
49        1
50        1
53        2
54        2
55        2
56        2
57        2
58        2
60        3
61        3
62        3
63        3
64        3
66        3
67        3
68        3
69        3
70        3
71        3
72        3
73        3
74        3
75        3
76        3
77        3
78        3
79        3
80        3
81        3
82        4
85        4
86        5
87        4
88        4
89        4
90        4
92        4
93        4
94        4
95        4
97        4
98        4
99        4
 

The length of the individual words is immaterial: it's the word count that matters.
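For anyone who would rather check the pattern than eyeball it, here is a rough Python sketch that computes Spearman's rank correlation between Google position and title word count, using the numbers transcribed from the table above. Spearman's rho is my own choice of measure for "does word count rise with position"; it describes these 43 entries only and says nothing about how Google's ranking actually works.

```python
from statistics import mean

# (position in Google's site: results, number of words in the title),
# transcribed from the table above; index, HTML and PDF entries already excluded.
GOOGLE = [
    (49, 1), (50, 1),
    (53, 2), (54, 2), (55, 2), (56, 2), (57, 2), (58, 2),
    (60, 3), (61, 3), (62, 3), (63, 3), (64, 3), (66, 3), (67, 3), (68, 3),
    (69, 3), (70, 3), (71, 3), (72, 3), (73, 3), (74, 3), (75, 3), (76, 3),
    (77, 3), (78, 3), (79, 3), (80, 3), (81, 3),
    (82, 4), (85, 4), (86, 5), (87, 4), (88, 4), (89, 4), (90, 4),
    (92, 4), (93, 4), (94, 4), (95, 4), (97, 4), (98, 4), (99, 4),
]


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the last index holding the same value as order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(pairs):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


rho = spearman(GOOGLE)
print(f"Spearman rho (position vs title words): {rho:.3f}")
```

On these entries rho comes out at roughly 0.91, where 1 would mean the word count climbs perfectly in step with position.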

As Cory Doctorow pointed out earlier this year, Google doesn't have to be this crap. In fact, GMX, which licenses Google results, has, surprisingly, a more normal distribution of static and dynamic pages, and no PHP index pages (other than the blog home page). It's not obsessed with the past: its PHP pages are spread fairly evenly between old and new.
 
GMX with site:jackyan.com results
 
However, the number of words in the title still comes into play. Here are the positions and the number of words in each title:
 
Position  Words in title
8         1
9         2
10        3
11        3
12        5
13        4
15        4
16        4
17        4
18        4
19        5
20        4
21        5
22        5
23        5
24        5
25        6
26        5
27        6
28        5
29        5
30        6
31        6
32        6
33        7
34        7
35        7
36        6
37        7
38        7
39        7
40        7
41        7
42        7
43        7
44        7
45        7
46        8
47        8
48        8
49        10
50        8
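The same kind of check can be run on the GMX list. This Python sketch uses Kendall's tau-b, which counts pairs of entries that run with or against the trend and copes with the many tied word counts. As before, the data are transcribed from the table above, and the choice of measure is mine, not anything GMX or Google documents.

```python
from math import sqrt

# (position in GMX's site: results, number of words in the title),
# transcribed from the table above.
GMX = [
    (8, 1), (9, 2), (10, 3), (11, 3), (12, 5), (13, 4), (15, 4), (16, 4),
    (17, 4), (18, 4), (19, 5), (20, 4), (21, 5), (22, 5), (23, 5), (24, 5),
    (25, 6), (26, 5), (27, 6), (28, 5), (29, 5), (30, 6), (31, 6), (32, 6),
    (33, 7), (34, 7), (35, 7), (36, 6), (37, 7), (38, 7), (39, 7), (40, 7),
    (41, 7), (42, 7), (43, 7), (44, 7), (45, 7), (46, 8), (47, 8), (48, 8),
    (49, 10), (50, 8),
]


def kendall_tau_b(pairs):
    """Return (tau-b, discordant pair count) for a list of (x, y) pairs.

    A discordant pair here is one where the entry ranked higher (smaller
    position) has more title words than the entry ranked below it.
    """
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs not tied on position / on word count
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            dx = pairs[j][0] - pairs[i][0]
            dy = pairs[j][1] - pairs[i][1]
            if dx != 0:
                untied_x += 1
            if dy != 0:
                untied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    tau = (concordant - discordant) / sqrt(untied_x * untied_y)
    return tau, discordant


tau, out_of_order = kendall_tau_b(GMX)
print(f"Kendall tau-b: {tau:.3f}; pairs out of order: {out_of_order}")
```

On these 42 entries, only 16 of the 861 possible pairs have a longer title ranked above a shorter one, and tau-b comes out at roughly 0.88.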
 

So there you have it. Use Mojeek if you want newer content without distinction between static and dynamically created pages.

Use Bing for a fairly similar mix.

Use Google if you want old static content, as it's afraid of ranking dynamic pages highly.

And use GMX if you want Google results that are more conventionally mixed, with the proviso that it will serve up pages based on how many words are in the title.
