Posts tagged ‘technology’


It’s not just us: Google fares poorly for site: search for Quartz

27.03.2023

Not all of you will have caught the postscript to yesterday’s post. I wanted to see if Google was doing as bad a job with other Wordpress-only websites, and one of the most famous is Quartz.

Sure enough, it was. Of the top 50 for site:qz.com, 33 pages were author, tag or category pages (let’s just say indices for want of a better term). Only 17 were articles.

Quartz is properly famous with a big crew, so the fact Google can’t get a site: search right there, either, shows how bad things must be.

Here’s where the articles appear based on each 10-result page in Google:
 
01–10 ★
11–20 ★★
21–30 ★★★★★★★★
31–40 ★★
41–50 ★★★★
 
In other words, on the first page, there was one article (in fifth). On the second page, two. On the third page, happily, there were eight, but the number drops again for the fourth and fifth.
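For anyone wanting to repeat the exercise, the tally above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, it assumes you have already marked each result by hand as an article or an index page, and it assumes every results page holds the same number of entries (which was not true of Bing's first page).

```python
def tally_by_page(is_article, page_size=10):
    """Count the article results within each consecutive page of results.

    `is_article` is a list of booleans in ranking order, True where the
    result is an article rather than an author, tag or category page.
    """
    return [
        sum(is_article[i:i + page_size])
        for i in range(0, len(is_article), page_size)
    ]


def star_chart(counts, page_size=10):
    """Render the per-page counts as the star bars used in this post."""
    lines = []
    for n, count in enumerate(counts):
        start = n * page_size + 1
        end = start + page_size - 1
        # rstrip() avoids a trailing space on pages with no articles
        lines.append((f"{start:02d}–{end:02d} " + "★" * count).rstrip())
    return "\n".join(lines)
```

Feeding the five per-page counts for Google (1, 2, 8, 2, 4) into `star_chart` reproduces the chart above; the marking of each result still has to be done by eye.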

It’s really not the behaviour you expect from a search engine, and as far as I know, till recently, Google operated normally.
 

 

How does Mojeek, whose spider and site operate normally, fare on this test? Better.
 
01–10 ★★★★★★★
11–20 ★★★★★★★★★★
21–30 ★★★★
31–40 ★
41–50
 

I’m not saying the massive number of author pages starting from page 3 on is good, but at least Mojeek is putting them in later, which is what you’d expect. I’d personally prefer they be later still, or not show up at all, for this type of search.

For fun, let’s look at Bing:
 
01–06 ★★
07–16 ★★★★★★★★★
17–26 ★★★★★★★★★
27–36 ★★★★★★★★★
37–46 ★★★★★★★★★
47–56 ★★★★★★★★★
 
Yes, there were a lot of repeats (probably around 40 per cent again) and Bing oddly could only deliver six results on the first page, but those results are roughly what you’d expect: a lot of articles and some top-level pages on the first page. It even allowed me to go beyond 56, which is an achievement. Other than the repeats, Bing delivered results closer to what was expected.
 
Earlier today, I discovered one setting in a Wordpress SEO plug-in that allowed tag pages to be indexed on Lucire’s website. That was never a problem till now, but I’ve turned it off. Sorry, Whangarei residents. I’ve asked Google not to make you the fashion capital of the world by having your tag appear first.

On this blog, tag pages were already selected for exclusion, but Google prefers unchanging, static HTML, and that’s another story.


Posted in business, internet, media, New Zealand, publishing, technology, typography, USA | No Comments »


Got dynamic pages or a Wordpress blog? Don’t expect Google to rank your pages highly

25.03.2023

That was short-lived. Bing’s back to offering 55 results for Lucire, and when you go through them, c. 40 per cent are repeated from page to page. However, a lot of the results are from the 2020s now, of both static and dynamic pages, so that’s something. There’s still a handful of truly ancient pages that haven’t been linked for decades, too.

This blog’s views are down dramatically, though as I haven’t often fed site:jackyan.com into any search engine, it’s hard to say what the cause is. However, it’s more likely than not that Google has caused this, because this very search nets only results for static pages until page 7, except for the home page of this blog.

I note from a search of this blog that the first time site:jackyan.com was recorded was in July 2022, when I had checked Google. Obviously nothing had jumped out back then, so we can safely say this, and Google’s pivot to antiquity, is a recent phenomenon. Even on January 16, 2023, I didn’t note that anything was strange with a Google search I performed.

No, it was this month when I noticed how old the Google results were for this domain.

Here are the first 50 results visualized.
 

 
It’d be fine if the year was 2013.

I know there’s an argument for removing obsolete pages, but I am of the earlier school of thinking where webmasters were advised:

  • don’t make 404s: if the page still exists then just let it be there, because
  • if it’s not linked from anywhere current, it won’t show up, or show up later in the results;
  • search engines will downrank things that are buried or only linked from pages that are deep within the site, and uprank things that are current and linked from more recently crawled pages.

When it comes to Google, these were truths as well, but it appears after 20-odd years, they are no longer.

Put simply, Google has real trouble indexing dynamic pages and ranking them highly, and the same is found with site:lucire.com. That means this blog’s entries are no longer being found or ranked highly.

What should the behaviour be? Mojeek is instructive, since the spider behaves as a spider should and the results show a more normal mix of static and dynamic. I should note that despite Bing’s obvious limitations (though at the time of writing it claims it has 1,860 results), it manages to include static and dynamic pages, too, with two dynamic in the top 10, and eight (one of which is a repeat) in slots 11–20. Overall, it’s closer to what one expects, too.

Below is the same graph for Mojeek.
 

 

The lesson? Got dynamic pages, like a Wordpress blog? New content in that blog? Then don’t expect much from Google as it clearly prefers static HTML. It has become what Bing was last year: a repository for antiquity. Bing’s index may be shot, but it is no longer about the old stuff. Let’s hope Google, as it copies Bing, gets back into delivering more relevant results as well, and has a spider that functions in the way we all understand. For a market leader, it sure seems pretty clueless.

All the more reason to use Mojeek then.


Posted in business, design, internet, media, publishing, technology | No Comments »


If you take out Tiktok, then why not Meta, too?

24.03.2023

The Hon Debbie Ngarewa-Packer MP was right when she questioned our government’s decision to ban Tiktok from parliamentary devices.

If it’s about foreigners getting hold of data, then why not ban Facebook and Instagram?

Last I looked, Tiktok had not, unlike Facebook, been party to any genocides.

Parliamentary Services says at least Meta is American and operates in line with our values. So being party to genocide is in line with our values? So information leaking to the likes of Cambridge Analytica—and its effects on democracy—are in line with our values?

It’s all about hopping on an occidental bandwagon over unproven claims that Tiktok hands stuff over to the PRC.

And if it is proven, then let us see the proof.

Let’s say our government doesn’t have the proof but it’s using Edward Snowden’s revelations about the US as a proxy of how data from social media companies wind up with their governments. That’s actually a fair point and we should expect that it’s probably happening. We can make a pretty reasoned guess that it is.

In that case, it’s all the more reason we should consider banning the lot of them, not just Tiktok. Keep our data in our country.

Remember, we’re not banning any of these platforms from private citizens, just what can be used by our Parliament. If it’s about private citizens, I’d be advising that we take out known disinformation ones, which are often funded or manipulated by shady overseas backers or even nation states. They’re literally placing New Zealanders in harm’s way. That would mean a pretty wide net, too, and I imagine no one in power would want to wield that responsibility. Or that the penny will drop, as it usually does, 10 years too late. (Hello, readers of 2033!)
 
Literally as I was completing the title and meta (small m) description fields for this, this Mastodon post from an ethics professor appeared.
 

 

In case it ever disappears, she writes:

As your resident TikTok micro-celebrity + tech ethics/policy professor, I have a lot of feelings about the proposed TikTok ban. I think that this statement from Evan Greer of Fight for the Future articulates some points well. If the sole argument is “but China” I would very much like to see something beyond speculation. And if it’s just not that, then go after Meta too. And either way maybe you could pass LITERALLY ANY DATA PRIVACY LAWS.


 

The image is from the Fight for the Future website, and the text reads:

“If it weren’t so alarming, it would be hilarious that US policymakers are trying to ‘be tough on China’ by acting exactly like the Chinese government. Banning an entire app used by millions of people, especially young people, LGBTQ folks, and people of color, is classic state-backed Internet censorship,” said Evan Greer (she/her), director of Fight for the Future. “TikTok uses the exact same surveillance capitalist business model of services like YouTube and Instagram. Yes, it’s concerning that the Chinese government could abuse data that TikTok collects. But even if TikTok were banned, they could access much of the same data simply by purchasing it from data brokers, because there are almost no laws in place to prevent that kind of abuse. If policymakers want to protect Americans from surveillance, they should advocate for strong data privacy laws that prevent all companies (including TikTok!) from collecting so much sensitive data about us in the first place, rather than engaging in what amounts to xenophobic showboating that does exactly nothing to protect anyone.”


Posted in China, globalization, internet, New Zealand, politics, technology, USA | No Comments »


Bing is coming back to life

12.03.2023

In quite an unexpected about-turn, Bing began spidering Lucire’s website again, and not just the old stuff. A site:lucire.com search actually has pages from after 2009 now, and while 42 per cent of results still get repeated from page to page, there are actually pages from the 2010s and the 2020s.
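The repeat percentage is simple to work out once the result URLs have been copied down in order. The sketch below is one way of counting it, not necessarily how the figure above was arrived at: `repeat_share` is a hypothetical helper, and it treats any URL that has already appeared earlier in the list as a repeat.

```python
def repeat_share(urls):
    """Return the fraction of results whose URL has already appeared earlier."""
    seen = set()
    repeats = 0
    for url in urls:
        if url in seen:
            repeats += 1
        else:
            seen.add(url)
    # Guard against an empty list of results
    return repeats / len(urls) if urls else 0.0
```

On that definition, 21 repeats among 50 results gives 0.42, or 42 per cent, leaving 29 distinct pages.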

There are still a few ancient pages that have not been linked for a long time. And while Bing claims it has 1,420 results now (considerably more than 10), it won’t show beyond the 56 mark, so some things haven’t changed much.

Still, it’s a positive development worth reporting. The new pages at Autocade also seem to have made it on to Bing, almost instantaneously, or at least within a couple of hours (although Bing claims it only has 22 results for site:autocade.net, a far cry from the 5,000-plus actually on there).

But for the sake of fairness, here’s how Bing’s looking in terms of year breakdowns among the top 50 results (with the repeats taken out). The pattern is beginning to resemble a real search engine’s.
 

 
Contents’ pages ★★★
1997
1998
1999 ★★
2000
2001
2002 ★
2003
2004
2005 ★
2006 ★
2007 ★
2008 ★★
2009
2010
2011
2012
2013
2014
2015 ★
2016
2017
2018
2019
2020
2021 ★★★★
2022 ★★★★★★★★
2023 ★★★★★
 
Static ★★★★★★★★★★★★★★★★★★★
Dynamic ★★★★★★★
 

Maybe that ChatGPT foray gave the search team more money so it can start plugging the servers back in.

Still, I won’t be returning to Duck Duck Go as a default. Bing’s 1,420 is still a fraction of what Mojeek has for Lucire, and who wants to expose their internal-search users to Microsoft?

I’ll see if I can update the spreadsheet soon as I wouldn’t want you to think I only did so when there was bad news.
 
PS.: Here’s the spreadsheet containing Bing’s claimed number of results from a selection of websites (random, in the sense that they were the ones I could think of when I first began this analysis). Not universally up at Bing, though Microsoft has more pages on itself than it has done for a while. Cf. the previous one here. Mojeek is the only one consistently adding pages to its record.
 


Posted in interests, internet, media, New Zealand, technology, USA | No Comments »


The IBM Selectric version of Univers revived

12.03.2023

This is one of the more fascinating type design stories I’ve come across in ages. Jens Kutilek has revived a very unlikely typeface: the IBM Selectric version of Univers in 11 pt.
 

 

A lot of us will have seen things set on a Selectric in the 1970s, especially in New Zealand. I’ve even seen professional advertisements set on a Selectric here. And because of all that exposure, it was pretty obvious to those of us with an interest in type that all the glyphs were designed to set widths regardless of family, and the only one that looked vaguely right was the Selectric version of Times.

Jens goes into a lot more detail but, sure enough, my hunch (from the 1980s and 1990s) was right: Times was indeed the starting-point, and the engineers refused to budge even when Adrian Frutiger worked out average widths and presented them.

It’s why this version of Univers, or Selectric UN, was so compromised.

What I didn’t know was that Frutiger was indeed hired for the gig, to adapt his designs to the machine. I had always believed, because of the compromised design, that IBM did it themselves or contracted it to a specialist, but not the man himself.

There’s plenty of maths involved, but the sort I actually would enjoy (having done one job many years ago to have numerous type families meet the New Zealand Standard for signage, and having to purposefully botch the original, superior kerning pairs in order to achieve it).

I think I kept our IBM golfballs, which carried the type designs on them, and hopefully one day they’ll resurface as they’re a great, nostalgic souvenir of these times.

What is really bizarre reading Jens’s recollection of his digital revival is that it’s set in Selectric UN 11 Medium (an excerpt is shown above). Here is type that was set on to paper, now re-created faithfully, with all of its compromises, for the screen. He’s done an amazing job and it was like reading a schoolbook from the 1970s (but with far more interesting subject-matter). Those Selectric types might not have been the best around, but the typographic world is richer for having them revived.
 
The hits per post here have fallen off a cliff. I imagine we can blame Google. Seven hundred was typical, but now I’m looking at dozens. I thought they’d be happy with my obsession over Bing being so crappy during 2022, but then, if they’re following Bing and not innovating, maybe they weren’t. Or that post about their advertising business being a negligence lawsuit waiting to happen (which, incidentally, was one of the most hit pieces over the last few months) might not have gone down well: it was a month after that when the incoming hits to this blog dropped like a stone. Maybe that confirms the veracity of my post.

I’m not terribly surprised. And before you think, ‘Why would Google care?’, ‘Would they bother targeting you?’ or ‘You are so paranoid,’ remember that Google suspended Vivaldi’s advertising account after its CEO criticized them, and in the days of Google Plus, they censored posts that I made that were critical of them. Are they after me? No, but you can bet there are algorithms that work to minimize or censor sites that expose Google’s misbehaviour, regardless of who makes the allegations, just as posts were censored on Google Plus.


Posted in design, interests, internet, New Zealand, technology, typography, UK | No Comments »


Of course Bing AI makes stuff up—Bing itself does

27.02.2023

Of course some of us expected Microsoft Bing and ChatGPT to be rubbish—and we knew ChatGPT would make stuff up. Because Bing makes stuff up.
 

 

If you have a normal, functioning web crawler (or spider), there’s no way you would ever wind up with pages that have never existed. Nothing about this is normal.

The latest contributions from Microsoft’s Wayback Machine for site:lucire.com are these. On my phone, I noticed it had ranked in third place, after two framesets from the early 2000s, a page we had for Plucker for the Palm Pilot! That gives you an idea of how old Bing’s index must be.
 

 

On the desktop, meanwhile, a site:lucire.com search now includes sites that aren’t lucire.com. I guess if your index is that small now, you need to pad it out not just with repetition, but other domains. One is related to us—it’s our Dailymotion channel—but the other is totally random with no connection whatsoever. Bit like ChatGPT.
 

 

My friend Robin Capper has discovered the same, when enquiring with the new Bing about himself. It claimed to have sourced from his Linkedin—but fed him back facts that are nowhere to be found. Here’s his blog post. I like how he put artificial intelligence in quotes, since there’s nothing intelligent about this. It’s a simple text processor, but it sure gets a lot of things wrong.


Posted in business, internet, New Zealand, technology, typography, USA | No Comments »


Being transparent: you might as well know what you’re going to click on

11.02.2023

This may be wise, or it may not be. I thought netizens would want to know if a page was ancient, and since Google keeps putting really old stuff that hasn’t been linked for nearly 20 years into the top 10 for a site:lucire.com search, I may as well let readers know what they’re going to click on by telling them in the meta descriptions of some of our 2000–5 pages.

I guess Bing led the way, showing that you should turn search engines into Wayback Machines, and Google has recently followed suit, putting these old framesets and pages up instead of the current indices that it used to show. Who knows what’s going on with these folks? Maybe they love nostalgia?
 

 

I know I’ve complained, too, when search engines offer up novelty over relevance, but here these pages aren’t even that relevant. Certainly not the ones you’d expect to see in the top 10 if search engines spidered like they used to (and Mojeek and Brave clearly still do).


Posted in internet, technology, USA | No Comments »


Google’s top 10 continue to bring up old pages; and it looks like Bing’s about to kick us off

07.02.2023

After years of using the web, I think I know a little about how web spidering works. The web spider hits your home page (provided it knows about it), then proceeds to follow the links on it. Precedence is given to the pages within your site that are linked most, or are top-level: in Lucire’s case, that would be the home page, and the HTML pages that come off it (the indices for fashion, beauty, travel, and lifestyle, among others). Weighting would be given to those linked more: with so many fashion stories on the site linking back to the fashion index, the one we’ve used since the mid-2000s, then the fashion index would rank highly. This, I thought, was conventional.
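To make that conventional model concrete, here is a toy sketch in Python. It is an assumption-laden illustration, not how Google, Bing or Mojeek actually work: the `links` dictionary stands in for a site's pages and the links on them, the function names are invented for this example, and the ranking is nothing more than home page first, then most-linked pages first.

```python
from collections import Counter, deque


def crawl(links, home):
    """Walk the site breadth-first from `home`, following links.

    `links` maps each page to the list of pages it links to.
    Returns the set of pages reached and a count of inbound links,
    counted only from pages the spider actually visited.
    """
    seen = {home}
    queue = deque([home])
    inbound = Counter()
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            inbound[target] += 1
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, inbound


def rank(links, home):
    """Order reached pages: home page first, then by inbound links, then by name."""
    seen, inbound = crawl(links, home)
    return sorted(seen, key=lambda page: (page != home, -inbound[page], page))
```

Under this model, an index page that every story links back to rises to the top, while an old frameset that nothing current links to is never reached by the spider at all, so it cannot appear in the results.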

With Bing becoming Microsoft’s Wayback Machine and generally failing to pick up anything after 2009, there must be something else going on in search engine-land. After reporting on Google’s failures in January, I see little has improved. I was even able to do a Google search where 10 per cent of the top 50 results were repeated—which beats Bing’s 40 per cent—though I wasn’t able to replicate that for this post. But the issue is that this shouldn’t be happening at all.
 

 

Here were Google’s top 10 yesterday for site:lucire.com, with my remarks next to the entries. Like Bing, there’s a page on there that’s never been referred to; if it ever were linked, it would have been accidental (it’s the subdirectory for 2002 articles; we used to put an index.html redirect in those directories in case the pages were accidentally hit due to manual coding). The number of times lucire.com/2002 would have been referenced would be fewer than ten, maybe even fewer than five. But there it is, in third.

There are three framesets from 20 years ago that have made it into the top 10. There is Devin Colvin’s entertainment page from 2004–5 that also has not been linked to in 17-plus years, except by Bing; and now Google has decided to make it prominent in the top 10.

I’ve no feelings either way for a 2011 and a 2022 article to appear in the top 10, though it’s very, very strange that the top-level pages—pages that are linked throughout the site from articles dating from 2005 and later—don’t appear. They used to in Google.

Google cannot hide behind the excuse that its service has worsened because the web’s content has worsened (a phenomenon, I might add, they created). Here is an existing site, one that has always been in their index (since Lucire pre-dates Google) and it’s doing a terrible job of indexing it and ranking the pages.

Brave, with its few pages, gets it right on a search just for Lucire (we can’t do site: there as that’s powered by Bing). It gives us the print ordering page, the beauty index, the news page, and the travel index (‘Volante’). Mojeek requires a search word so obviously that sways things, but even then it manages to come up with the current home page, ‘Volante’ index, and Lucire TV, which are acceptable. At least they’re current, and currently linked. Today, Bing has fallen to five results for site:lucire.com, its lowest ever, and four of those pages are framesets from 2002.
 

 

In fact, it might be time to see how it’s gone for our sample set.
 

 

Not great for us. There are some anomalies there, chiefly Google’s estimates of what it has for The Rake, and it seems Lucire’s on its way out of Bing altogether. Mojeek continues to be the most steady, stable and sensible of these three search engines.

If you’re relying on Google or Bing, you really need to think twice. Something has been wrong with Bing for some time, and it’s catching in Mountain View.


Posted in internet, media, publishing, technology, USA | No Comments »


Nice try, Marissa Mayer, but no conversion

30.01.2023

I had a chuckle at Marissa Mayer saying that Google results are worse because the web is worse.

As I’ve shown with a site:lucire.com search, which is a good one since our site pre-dates Google (just), Google is less capable of providing the relevant pages for a typical search.

I know how web spiders work in theory, and there’s no way that 2002 framesets are coming up in a 2023 crawl. We haven’t linked to those pages for a long, long time. But Google is throwing those into the top 10.

And we can extend this argument: Google, through its advertising, incentivized the creation of the very crap polluting the web.

Mayer said, ‘I think because there’s a lot of economic incentive for misinformation, for clicks, for purchases.

‘There’s a lot more fraud on the web today than there was 20 years ago.’

What’s the bet that these fraudulent pages are carrying Google ads?

As Don Marti, who knows a lot more about this than I do, said to me: ‘It’s all about moving traffic and ads away from sites that people want, and that advertisers want to sponsor, to places where Google gets a bigger % of the ad money (even if they’re on the sketchy side)’.

I think all this was foreseeable, and one could prove negligence on Google’s part. I still remember a time when established publishers like me wouldn’t join Google’s ad programmes because they were seen as an advertising service for second-rate (or worse) sites. They would appear on places like Blogger, which Google wound up buying.

Then the buggers wound up monopolizing the area, and things got worse for digital publishers as the ad rates got lower and lower—and, as Don notes, the money can find its way to the bottom feeders.

So Google does have a problem, and it is also the cause of a problem. Maybe breaking it up will solve some of them, and I’m glad the US Department of Justice is finally courageous enough to do something about it.
 
A spot-on insight from Brenda Wallace earlier today on Mastodon.
 

 

 
An irrelevant side note: it turns out the previous post was the 1,234th on this blog.


Posted in culture, internet, media, publishing, technology, USA | No Comments »


The emperor has no clothes, so Microsoft does what little it can do

19.01.2023

When I’ve been saying the emperor has no clothes for the last few months, and on the emperor’s forums, I shouldn’t be surprised we are at this point now.

Bing is virtually dead, and they don’t want me probing Bing Webmaster Tools (which are largely useless) about my own sites to show up even more of their BS.
 

 

As moves go, this is pretty daft. I mean, it was pretty useless before, so now I wonder how much more BS there is. I guess whoever is running Bing wants to confirm to me that Bing is dead along with the rest of it.


Posted in internet, technology, USA | No Comments »