Posts tagged ‘Google’


It’s not just us: Google fares poorly for site: search for Quartz

27.03.2023

Not all of you will have caught the postscript to yesterday’s post. I wanted to see if Google was doing as bad a job with other Wordpress-only websites, and one of the most famous is Quartz.

Sure enough, it was. Of the top 50 for site:qz.com, 33 pages were author, tag or category pages (let’s just say indices for want of a better term). Only 17 were articles.

Quartz is properly famous with a big crew, so the fact Google can’t get a site: search right there, either, shows how bad things must be.

Here’s where the articles appear based on each 10-result page in Google:
 
01–10 ★
11–20 ★★
21–30 ★★★★★★★★
31–40 ★★
41–50 ★★★★
 
In other words, on the first page, there was one article (in fifth). On the second page, two. On the third page, happily, there were eight, but the number drops again for the fourth and fifth.
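For anyone wanting to repeat this tally on their own site, here is a rough Python sketch. It assumes you have already copied the result URLs by hand into a list, in rank order. The path prefixes used to spot index pages are placeholders of mine, not anything confirmed about Quartz's URL scheme, and the function names are made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical prefixes marking index pages; adjust for the site being tested.
INDEX_PREFIXES = ("/author/", "/tag/", "/category/")


def is_article(url):
    """Treat anything that is not the home page or an index page as an article."""
    path = urlparse(url).path
    if path in ("", "/"):
        return False
    return not path.startswith(INDEX_PREFIXES)


def articles_per_page(urls, page_size=10):
    """Count the articles in each consecutive block of `page_size` results."""
    counts = []
    for start in range(0, len(urls), page_size):
        page = urls[start:start + page_size]
        counts.append(sum(1 for u in page if is_article(u)))
    return counts


def star_chart(counts, page_size=10):
    """Render the counts as one line per results page, one star per article."""
    lines = []
    for i, n in enumerate(counts):
        first = i * page_size + 1
        last = first + page_size - 1
        lines.append((f"{first:02d}–{last:02d} " + "★" * n).rstrip())
    return "\n".join(lines)
```

Feeding the output of `articles_per_page` into `star_chart` gives the same kind of chart as the ones in this post. The classification rule is the weak point: a site with a different URL layout needs its own prefixes.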

It’s really not the behaviour you expect from a search engine, and as far as I know, till recently, Google operated normally.
 

 

How does Mojeek, whose spider and site operate normally, fare on this test? Better.
 
01–10 ★★★★★★★
11–20 ★★★★★★★★★★
21–30 ★★★★
31–40 ★
41–50
 

I’m not saying the massive number of author pages starting from page 3 on is good, but at least Mojeek is putting them in later, which is what you’d expect. I’d personally prefer they be later still, or not show up at all, for this type of search.

For fun, let’s look at Bing:
 
01–06 ★★
07–16 ★★★★★★★★★
17–26 ★★★★★★★★★
27–36 ★★★★★★★★★
37–46 ★★★★★★★★★
47–56 ★★★★★★★★★
 
Yes, there were a lot of repeats (probably around 40 per cent again) and Bing oddly could only deliver six results on the first page, but those results are roughly what you’d expect: a lot of articles and some top-level pages on the first page. It even allowed me to go beyond 56, which is an achievement.
 
Earlier today, I discovered one setting in a Wordpress SEO plug-in that allowed tag pages to be indexed on Lucire’s website. That was never a problem till now, but I’ve turned it off. Sorry, Whangarei residents. I’ve asked Google not to make you the fashion capital of the world by having your tag appear first.

On this blog, tag pages were already selected for exclusion, but Google prefers unchanging, static HTML, and that’s another story.
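A quick way to check whether a tag page is actually excluded is to look for a robots meta tag carrying `noindex` in its HTML, which is the usual mechanism SEO plug-ins rely on. The sketch below is a minimal Python check using only the standard library. It assumes you already have the page's HTML as a string, and it does not look at HTTP headers or robots.txt. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        for directive in (attributes.get("content") or "").split(","):
            self.directives.add(directive.strip().lower())


def is_noindexed(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

If `is_noindexed` returns False for a tag page, the plug-in setting has probably not taken effect on that template.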


Posted in business, internet, media, New Zealand, publishing, technology, typography, USA | No Comments »


How Google fares in a site: search if your site is all PHP—Wordpress users beware

26.03.2023

After the last post, you may be thinking: surely if my site was entirely PHP, Google wouldn’t have a problem identifying which were the most important articles? They are the biggest search engine in the world and all that data would ensure that they knew how to rank the dynamic pages properly. They would have some idea, based on who is clicking on what, which pages should be placed up high in a site: search.

You might have been right in the past, but in 2023, you’d be wrong.

Just as with Lucire and my personal site (both of which are sites with a mixture of static HTML and dynamic PHP pages), Google does a terrible job. With Lucire, PHP drives the news section. The top PHP page Google shows with site:lucire.com comes in second, which is great, and it’s the main news page on the site. But beyond that, not a single PHP-driven news article appears in the top 100 results on Google. There are repeat pages, a bug that Google has introduced as it follows Bing in padding out the results.

There are some PHP pages though, namely pages containing tags. But even those make no sense in terms of ordering. The first is in no. 11, a PHP-generated index of pages tagged with Whangarei. I repeated the search and this result fell to no. 16. Other tags on the second page of results were for 2020 and Bulgari. There is no public tag cloud any more, but I seem to recall the actual top one is Lucire, followed by fashion.

So even on tags, Google gets the ranking very wrong in a site: search, and there is nothing on the site that would lead its spider to think Whangarei needed to be so high. The visits to the site do not bear this out, either.

When you look at a site like Lucire Rouge, which is entirely PHP, it’s an incredible surprise to find that its top pages are dominated by tags, categories and authors’ pages, plus many contents’ pages. These follow the home and contact pages. Again, there are no article pages in the top 100. You can find out for yourself using site:lucirerouge.com. Or better yet, try out a site you know and see if it follows the same pattern.
 

 

I recall there was a Wordpress SEO plug-in that helped you manage these, ridding you of the tag and other contents’ pages, but Google never needed its hand held so badly before. And if I can’t remember what that plug-in was or how it worked, how does the regular punter? (I have a feeling the plug-in we used became obsolete, or was it inside Wordpress as standard? Your guess is as good as mine. I couldn’t find it when I wrote this blog post, but then a lot of websites are no longer intuitive to use.)

With Lucire Men, also entirely PHP, you have to get to page 5 and result no. 43 before you encounter the first article; the first 42 are tag pages or other forms of contents’ pages. Nos. 44, 45 and 46 are also articles. Then it’s back to the indices before page 6 shows all articles.

Individual dynamic pages or posts—those generated by Wordpress, for instance—now appear to be far too difficult for Google to handle. If your site uses Wordpress, expect Google to have difficulty with it now; it certainly answers why this blog’s visits have fallen so badly if Google no longer shows posts up top in a site: search. It means it doesn’t really rate them, so how on earth would I expect them to show up in a search for the topics being covered?

It’s pretty disappointing to see Google search fall so quickly in such a short space of time.
 
Of course there are exceptions, and Google seems to do reasonably well on a site:autocade.net search. That site, run on Mediawiki, is all PHP as well, and the results are today as I recall them ages ago. The ones up top have been pretty popular—certainly they are for models that are among the top pages on the site. And that’s how Google should behave. Goodness knows why it can’t handle Wordpress properly any more.

The only oddity here is that Google estimates it has 2,960 results. Mojeek, the search engine whose spider works properly and delivers results as expected, has 3,277. It’s probably the first site I’ve observed where Mojeek has more pages indexed than Google. Is it the beginning of the shift where Mojeek has a larger and more relevant index than Google?
 

 
 
PS., 9.42 p.m. UTC: Sure enough, it’s not just us. Quartz is pretty famous, and they’re run off Wordpress exclusively. Most of their top pages for site:qz.com are tag and author ones, too, though their first article, one which I couldn’t imagine would be their top story, appears as result no. 5. Quartz gets a ton of traffic, but Google can’t do right by them, either.


Posted in internet, media, publishing, technology, USA | No Comments »


Got dynamic pages or a Wordpress blog? Don’t expect Google to rank your pages highly

25.03.2023

That was short-lived. Bing’s back to offering 55 results for Lucire, and when you go through them, c. 40 per cent are repeated from page to page. However, a lot of the results are from the 2020s now, of both static and dynamic pages, so that’s something. There’s still a handful of truly ancient pages that haven’t been linked for decades, too.
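The repeat figure can be worked out mechanically. This is a small Python sketch, assuming the result URLs have been collected by hand in the order shown. It counts a result as a repeat if the same URL has already appeared earlier in the list, and it treats URLs that differ only by a trailing slash as the same page, which is my assumption rather than anything Bing documents.

```python
def repeat_rate(urls):
    """Return the share of results that repeat a URL seen earlier in the list."""
    seen = set()
    repeats = 0
    for url in urls:
        key = url.rstrip("/")  # assumption: a trailing slash does not change the page
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / len(urls) if urls else 0.0
```

Multiply the result by 100 for a percentage. A list of 55 results with 22 repeats, for instance, comes out at 40 per cent.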

This blog’s views are down dramatically, though as I haven’t often fed site:jackyan.com into any search engine, it’s hard to say what the cause is. However, it’s more likely than not that Google has caused this, because this very search nets only results for static pages until page 7, except for the home page of this blog.

I note from a search of this blog that the first time site:jackyan.com is recorded was in July 2022, and I had checked Google. Obviously nothing had jumped out back then, so we can safely say this, and Google’s pivot to antiquity, is a recent phenomenon. Even on January 16, 2023, I didn’t note that anything was strange with a Google search I performed.

No, it was this month when I noticed how old the Google results were for this domain.

Here are the first 50 results visualized.
 

 
It’d be fine if the year was 2013.

I know there’s an argument for removing obsolete pages, but I am of the earlier school of thinking where webmasters were advised:

  • don’t make 404s: if the page still exists then just let it be there, because
  • if it’s not linked from anywhere current, it won’t show up, or will show up later in the results;
  • search engines will downrank things that are buried or only linked from pages that are deep within the site, and uprank things that are current and linked from more recently crawled pages.

When it comes to Google, these were truths as well, but it appears after 20-odd years, they are no longer.

Put simply, Google has real trouble indexing dynamic pages and ranking them highly, and the same is found with site:lucire.com. That means this blog’s entries are no longer being found or ranked highly.

What should the behaviour be? Mojeek is instructive, since the spider behaves as a spider should and the results show a more normal mix of static and dynamic. I should note that despite Bing’s obvious limitations (though at the time of writing it claims it has 1,860 results) it manages to include static and dynamic pages, too, with two dynamic in the top 10, and eight (one of which is a repeat) in slots 11–20. Overall, it’s closer to what one expects, too.

Below is the same graph for Mojeek.
 

 

The lesson? Got dynamic pages, like a Wordpress blog? New content in that blog? Then don’t expect much from Google as it clearly prefers static HTML. It has become what Bing was last year: a repository for antiquity. Bing’s index may be shot, but it is no longer about the old stuff. Let’s hope Google, as it copies Bing, gets back into delivering more relevant results as well, and has a spider that functions in the way we all understand. For a market leader, it sure seems pretty clueless.

All the more reason to use Mojeek then.


Posted in business, design, internet, media, publishing, technology | No Comments »


Bing is coming back to life

12.03.2023

In quite an unexpected about-turn, Bing began spidering Lucire’s website again, and not just the old stuff. A site:lucire.com search actually has pages from after 2009 now, and while 42 per cent of results still get repeated from page to page, there are actually pages from the 2010s and the 2020s.

There are still a few ancient pages that have not been linked for a long time. And while Bing claims it has 1,420 results now (considerably more than 10), it won’t show beyond the 56 mark, so some things haven’t changed much.

Still, it’s a positive development worth reporting. The new pages at Autocade also seem to have made it on to Bing, almost instantaneously, or at least within a couple of hours (although Bing claims it only has 22 results for site:autocade.net, a far cry from the 5,000-plus actually on there).

But for the sake of fairness, here’s how Bing’s looking in terms of year breakdowns among the top 50 results (with the repeats taken out). The pattern is beginning to resemble a real search engine’s.
 

 
Contents’ pages ★★★
1997
1998
1999 ★★
2000
2001
2002 ★
2003
2004
2005 ★
2006 ★
2007 ★
2008 ★★
2009
2010
2011
2012
2013
2014
2015 ★
2016
2017
2018
2019
2020
2021 ★★★★
2022 ★★★★★★★★
2023 ★★★★★
 
Static ★★★★★★★★★★★★★★★★★★★
Dynamic ★★★★★★★
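A breakdown like the one above can be generated rather than counted by hand. The Python sketch below assumes the year can be read from a four-digit directory in the URL path (for example a hypothetical `https://example.com/2022/story.html`). That suits sites that file articles under year folders, but it is an assumption and will miss pages organized differently. The function names are illustrative.

```python
import re
from collections import Counter

# Matches a four-digit year used as a whole path segment, e.g. /2022/ or a trailing /2002.
YEAR_IN_PATH = re.compile(r"/((?:19|20)\d{2})(?:/|$)")


def year_of(url):
    """Return the year found in the URL path, or None if there is none."""
    match = YEAR_IN_PATH.search(url)
    return int(match.group(1)) if match else None


def year_chart(urls, first=1997, last=2023):
    """Render one line per year, with one star per result from that year."""
    counts = Counter(year_of(u) for u in urls)
    lines = []
    for year in range(first, last + 1):
        lines.append(f"{year} {'★' * counts[year]}".rstrip())
    return "\n".join(lines)
```

Remove the repeated results from the list first, otherwise the duplicates inflate whichever years they belong to.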
 

Maybe that ChatGPT foray gave the search team more money so it can start plugging the servers back in.

Still, I won’t be returning to Duck Duck Go as a default. Bing’s 1,420 is still a fraction of what Mojeek has for Lucire, and who wants to expose their internal-search users to Microsoft?

I’ll see if I can update the spreadsheet soon as I wouldn’t want you to think I only did so when there was bad news.
 
PS.: Here’s the spreadsheet containing Bing’s claimed number of results from a random (randomly among ones I could think of when I first began this analysis) selection of websites. Not universally up at Bing—though Microsoft has more pages on itself than it has done for a while. Cf. the previous one here. Mojeek is the only one consistently adding pages to its record.
 


Posted in interests, internet, media, New Zealand, technology, USA | No Comments »


The IBM Selectric version of Univers revived

12.03.2023

This is one of the more fascinating type design stories I’ve come across in ages. Jens Kutilek has revived a very unlikely typeface: the IBM Selectric version of Univers in 11 pt.
 

 

A lot of us will have seen things set on a Selectric in the 1970s, especially in New Zealand. I’ve even seen professional advertisements set on a Selectric here. And because of all that exposure, it was pretty obvious to those of us with an interest in type that all the glyphs were designed to set widths regardless of family, and the only one that looked vaguely right was the Selectric version of Times.

Jens goes into a lot more detail but, sure enough, my hunch (from the 1980s and 1990s) was right: Times was indeed the starting-point, and the engineers refused to budge even when Adrian Frutiger worked out average widths and presented them.

It’s why this version of Univers, or Selectric UN, was so compromised.

What I didn’t know was that Frutiger was indeed hired for the gig, to adapt his designs to the machine. I had always believed, because of the compromised design, that IBM did it themselves or contracted it to a specialist, but not the man himself.

There’s plenty of maths involved, but the sort I actually would enjoy (having done one job many years ago to have numerous type families meet the New Zealand Standard for signage, and having to purposefully botch the original, superior kerning pairs in order to achieve it).

I think I kept our IBM golfballs, which carried the type designs on them, and hopefully one day they’ll resurface as they’re a great, nostalgic souvenir of these times.

What is really bizarre reading Jens’s recollection of his digital revival is that it’s set in Selectric UN 11 Medium (an excerpt is shown above). Here is type that was set on to paper, now re-created faithfully, with all of its compromises, for the screen. He’s done an amazing job and it was like reading a schoolbook from the 1970s (but with far more interesting subject-matter). Those Selectric types might not have been the best around, but the typographic world is richer for having them revived.
 
The hits per post here have fallen off a cliff. I imagine we can blame Google. Seven hundred was a typical average, but now I’m looking at dozens. I thought they’d be happy with my obsession over Bing being so crappy during 2022, but then, if they’re following Bing and not innovating, maybe they weren’t. Or that post about their advertising business being a negligence lawsuit waiting to happen (which, incidentally, was one of the most hit pieces over the last few months) might not have gone down well—it was a month after that when the incoming hits to this blog dropped like a stone. Maybe that confirms the veracity of my post.

I’m not terribly surprised. And before you think, ‘Why would Google care?’, ‘Would they bother targeting you?’ or ‘You are so paranoid,’ remember that Google suspended Vivaldi’s advertising account after its CEO criticized them, and in the days of Google Plus, they censored posts that I made that were critical of them. Are they after me? No, but you can bet there are algorithms that work to minimize or censor sites that expose Google’s misbehaviour, regardless of who makes the allegations, just as posts were censored on Google Plus.


Posted in design, interests, internet, New Zealand, technology, typography, UK | No Comments »


Another example of Google’s antiquity when it comes to search results

06.03.2023

Is Google now the Wayback Machine, too?

Since I haven’t used Google regularly since 2010, I can’t do what’s called a longitudinal study, though when I started examining search engine results for Lucire after Bing tanked last year, nothing in my Google searches jumped out at me—till earlier in 2023.

I guess wherever Bing goes, Google follows, since they’re not really innovators—they did well in search, but everything else seems to be about following or acquiring.

With Bing becoming Microsoft’s Wayback Machine, Google followed suit, as revealed when I did site:lucire.com searches. But was it the exception?

Not really: site:jackyan.com still shows my mayoral campaign pages, even though they haven’t been linked since the day before the 2013 mayoral election. And site:jyanet.com, which I tried at the weekend, has some ancient things there, too.

Like Bing, Google has some trouble crossing into this side of 2010.

Let’s look at the top 10.
 

 

1. Home page. Current, so that’s good. And at least it’s the home page. Bing doesn’t always give you one.

2. CAP Online, last updated 2008, and very sporadically between 2001 and 2008. I don’t think we’ve linked it since then. Maybe, at best, a year later.

3. Lucire’s original home page from 1997. This hasn’t been linked since we got Lucire its own domain in 1998—25 years ago.

4. Our press information pages. Fair enough, and current.

5. JY&A Media. Relevant and currently linked.

6. JY&A Consulting’s old page. Hasn’t been linked by us since 2010. I imagine some might still link to it? But it gets a 404, and has done for a long time. Why rank it so highly?

7. JY&A Fonts. Current and relevant; I would have thought it would rank more highly.

8–10. Press releases from between 2007 and 2009.

I’ve benefited from search engines grandfathering things, but I really couldn’t believe my eyes with pages we haven’t linked to in anywhere between 13 and 25 years. And not that many people would have maintained their links to these pages, either. Certainly the Fonts and Media pages should be up further with links in, and current internal links on our site.

For (6), I don’t have the knowledge to do 301s and a refresh page might penalize jya.co, where the Consulting website is today.

When we took the site to HTTPS last year, both experts and friends told me that it would take a matter of days or weeks for Google to restore its position; that one would not get penalized for going to a secure server. That, I discovered, was not the case. Search engines don’t update, not as regularly as you might think. If what I am seeing is any indication, search engines in 2023 have massive trouble updating, and the top 10 reflect the status of your website as it was a long time ago. For jyanet.com, the top 10 would be perfect if it was 2009; for jackyan.com, it’s how things were in 2013; and for lucire.com, it’s a bit more of a hybrid of what was current in 2005 (framesets! And the old entertainment page) and some pages from after 2011 (including current home and shopping pages).

I don’t care how Google defends itself or blames others for its decreasing ability to find relevant pages; it’s blatantly obvious its search has worsened.


Posted in business, internet, publishing, technology, USA | No Comments »


Cory Doctorow might be predicting the end of the web as we know it

21.02.2023

Two great pieces by Cory Doctorow came my way today on Mastodon.

The first is an incredibly well argued piece about why people leave social networks. Facebook and Twitter won’t be immune, just as MySpace and Bebo weren’t.

One highlight:

As people and businesses started to switch away from the social media giants, inverse network effects set in: the people you stayed on MySpace to hang out with were gone, and without them, all the abuses MySpace was heaping on you were no longer worth it, and you left, too. Once you were gone, that was a reason for someone else to leave. The same forces that drove rapid growth drove rapid collapse.

The second is about all the hype surrounding chatbots, and Google and Bing. Cory begins:

The really remarkable thing isn’t just that Microsoft has decided that the future of search isn’t links to relevant materials, but instead lengthy, florid paragraphs written by a chatbot who happens to be a habitual liar—even more remarkable is that Google agrees.

Microsoft has nothing to lose. It’s spent billions on Bing, a search-engine no one voluntarily uses. Might as well try something so stupid it might just work. But why is Google, a monopolist who has a 90+% share of search worldwide, jumping off the same bridge as Microsoft?

He goes on, analysing how Google is not really an innovator, and most things it has have come to it through acquisition. They wouldn’t know a clever innovation if they saw it.

And:

ChatGPT and its imitators have all the hallmarks of a tech fad, and are truly the successor to last season’s web3 and cryptocurrency pump-and-dumps.

I had better not quote any more as it’s way more important you visit both these pieces and see the entire arguments. Farewell to Big Social then.

Though if Cory is right, and my own thoughts have come close, then is there any point to web searching if these chatbots are going to unleash machine-authored crap, complementing some of the already godawful spun sites out there? Search engines should be finding ways of weeding out spun and AI-authored junk, rather than being in league with them—because that could mean the death of the web.

Or maybe just the death of Google and Bing, because Mojeek might be there to save us all.


Posted in business, internet, technology, USA | No Comments »


Being transparent: you might as well know what you’re going to click on

11.02.2023

This may be wise, or it may not be. I thought netizens would want to know if a page was ancient, and since Google keeps putting really old stuff that hasn’t been linked for nearly 20 years into the top 10 for a site:lucire.com search, I may as well let readers know what they’re going to click on by telling them in the meta descriptions of some of our 2000–5 pages.

I guess Bing led the way, showing that you should turn search engines into Wayback Machines, and Google has recently followed suit, putting these old framesets and pages up instead of the current indices that it used to show. Who knows what’s going on with these folks? Maybe they love nostalgia?
 

 

I know I’ve complained, too, when search engines offer up novelty over relevance, but here these pages aren’t even that relevant. Certainly not the ones you’d expect to see in the top 10 if search engines spidered like they used to (and Mojeek and Brave clearly still do).


Posted in internet, technology, USA | No Comments »


Your preferences mean nothing to Google or Microsoft

08.02.2023

I could just repeat my post from January 26. Let’s face it, Google is a notorious spammer, with a failing search engine, and an advertising business that’s a decent negligence lawsuit away from collapsing.

It was 2011 when I showed everyone that your opt-out settings in Google Ads Preferences Manager were meaningless. Today, your email preferences are meaningless, since of course this incompetent Big Tech firm spammed me again. There must be some pretty hopeless technology behind all of this. You have to wonder when a company can’t get the fundamentals right.

No wonder so many spammers choose Gmail.
 



 
Of course, Microsoft is pretty hopeless at the best of times. Today I tried to access my desktop PC’s hard drive across the network from my laptop, only to be asked again to feed in a username and password. Except neither exists when I’m doing local stuff. I couldn’t remember how I got around it last time, but today it was buried here:
 

 

Of course this stuff changes every time, and Windows seems to change your settings without you knowing.

I wouldn’t have needed to do this file transfer if Apples worked. Last time I tried to get cellphone images on to an Imac, it was a pretty simple procedure (make sure they are both on Bluetooth and let them chat). These days it insists you upload everything on to Icloud and get it down off Icloud. Even when the Iphone and Imac are connected via USB.

I’m sure there’s a way around this, but I really couldn’t be bothered finding it. It proved quicker to plug in the Iphone to a Windows computer. Shame that my problems weren’t over after I did this, since all computer companies like to make things difficult for users.


Posted in internet, technology | No Comments »


Google’s top 10 continue to bring up old pages; and it looks like Bing’s about to kick us off

07.02.2023

After years of using the web, I think I know a little about how web spidering works. The web spider hits your home page (provided it knows about it), then proceeds to follow the links on it. Precedence is given to the pages within your site that are linked most, or are top-level: in Lucire’s case, that would be the home page, and the HTML pages that come off it (the indices for fashion, beauty, travel, and lifestyle, among others). Weighting would be given to those linked more: with so many fashion stories on the site linking back to the fashion index, the one we’ve used since the mid-2000s, then the fashion index would rank highly. This, I thought, was conventional.
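The conventional picture described above can be put into a toy Python model. This is only a sketch of that mental model, not a claim about how Google, Bing or Mojeek actually rank. It assumes the site's internal links are already available as a dictionary mapping each page to the pages it links to, and the function name and tie-breaking rules are my own.

```python
from collections import Counter, deque


def crawl_and_rank(links, home):
    """Crawl breadth-first from `home`, then rank the pages that were reached.

    Pages are ordered by the number of distinct crawled pages linking to them,
    with ties broken by how few clicks they are from the home page, then by name.
    """
    depth = {home: 0}
    inlinks = Counter()
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in set(links.get(page, [])):
            if target != page:
                inlinks[target] += 1  # one vote per linking page
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return sorted(depth, key=lambda p: (-inlinks[p], depth[p], p))
```

In this model a heavily linked index rises to the top, and a page that nothing current links to is never reached at all, which is the behaviour the old webmaster advice took for granted.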

With Bing becoming Microsoft’s Wayback Machine and generally failing to pick up anything after 2009, there must be something else going on in search engine-land. After reporting on Google’s failures in January, I see little has improved. I was even able to do a Google search where 10 per cent of the top 50 results were repeated—which beats Bing’s 40 per cent—though I wasn’t able to replicate that for this post. But the issue is that this shouldn’t be happening at all.
 

 

Here were Google’s top 10 yesterday for site:lucire.com, with my remarks next to the entries. Like Bing, there’s a page on there that’s never been referred to; if it ever were linked, it would have been accidental (it’s the subdirectory for 2002 articles; we used to put an index.html redirect in those directories in case the pages were accidentally hit due to manual coding). The number of times lucire.com/2002 would have been referenced would be fewer than ten, maybe even fewer than five. But there it is, in third.

There are three framesets from 20 years ago that have made it into the top 10. There is Devin Colvin’s entertainment page from 2004–5 that also has not been linked to in 17-plus years, except by Bing, and now Google has decided to make it top-10 prominent.

I’ve no feelings either way for a 2011 and a 2022 article to appear in the top 10, though it’s very, very strange that the top-level pages—pages that are linked throughout the site from articles dating from 2005 and later—don’t appear. They used to in Google.

Google cannot hide behind the excuse that its service has worsened because the web’s content has worsened (a phenomenon, I might add, they created). Here is an existing site, one that has always been in their index (since Lucire pre-dates Google) and it’s doing a terrible job of indexing it and ranking the pages.

Brave, with its few pages, gets it right on a search just for Lucire (we can’t do site: there as that’s powered by Bing). It gives us the print ordering page, the beauty index, the news page, and the travel index (‘Volante’). Mojeek requires a search word so obviously that sways things, but even then it manages to come up with the current home page, ‘Volante’ index, and Lucire TV, which are acceptable. At least they’re current, and currently linked. Today, Bing has fallen to five results for site:lucire.com, its lowest ever, and four of those pages are framesets from 2002.
 

 

In fact, it might be time to see how it’s gone for our sample set.
 

 

Not great for us. There are some anomalies there, chiefly Google’s estimates of what it has for The Rake, and it seems Lucire’s on its way out of Bing altogether. Mojeek continues to be the most steady, stable and sensible of these three search engines.

If you’re relying on Google or Bing, you really need to think twice. Something has been wrong with Bing for some time, and it’s catching in Mountain View.


Posted in internet, media, publishing, technology, USA | No Comments »