Posts tagged ‘search engines’


We should challenge monopolists, not do business with them to the exclusion of ethical parties

17.10.2022

Search engine Mojeek is doing no wrong in my book. Here’s its CEO Colin Hayhurst being interviewed by The New Era’s Jeffrey Peel, making complete sense, which is not something I can say about anyone speaking for Big Tech. We should be shunning monopolists if we truly value progress and innovation, or even a proper, factual debate. We even have laws about it that few seem to wish to enforce when it comes to Big Tech players. It’s well worth a watch.
 
I was disappointed to see that the Warehouse, our big retailer, specifically blocks Mojeek from searching its site. Google is fine. Explanations vary—but they include the theory that the Warehouse wants to get data from its users and Google can provide them.

I’ve written to the Warehouse as an account holder and received no reply. I decided to take it higher, to its chief digital officer, on October 3. As far as I know this email has been delivered, but there’s always a possibility I have her address wrong. Regardless, I am yet to hear back on any front, including social media where I had asked the Warehouse why they would wish to block a legitimate and far more ethical search engine. What does it say about your company when you choose to do business with someone as questionable as Google, yet you go out of your way to block a fully ethical and privacy-respecting business?

Dear Sarah:
 
I contacted the Warehouse through the customer service channels at the beginning of September and have yet to hear back.

As CDO I think you’re the right person to raise this with, though please refer it to a colleague if you aren’t.

I run Lucire Ltd. and have been a Warehouse account holder for some time. Our own foundations are in the digital space, with my having been a digital publisher since 1989. We’re always mindful that our activities promote a healthy online space, which means we keep a watchful eye on the behaviour of US Big Tech. (For instance, we removed all Facebook gadgets from our sites in 2018, prior to the Cambridge Analytica exposé, as we became increasingly concerned about the tracking exposure our readers were getting.)

Our internal search is now run by Mojeek, a UK-based search engine that has the largest index in the west outside of Google. It is also my default, having lost faith in Duck Duck Go after 12 years.

Other than the Warehouse’s home page, none of the contents of your company’s site appear in Mojeek. When I raised this with them, they told me that Mojeek is very specifically blocked by the Warehouse. Neither they nor I can see any good reason a legitimate, independent search engine would be blocked.

I am told that inside your code is:
 
User-agent: MojeekBot
Disallow: /

 

As concerns over privacy grow, it seems a disservice that it’s blocked.

When I put this to other techs, they theorize that the Warehouse wants to track people via whatever data Google provides. I find this hard to believe. To what end? The amount of information that comes surely can’t outweigh overall accessibility to the website for those of us who have concerns over Google’s monopolistic behaviour and privacy intrusions.

Even if tracking were the reason, I would have thought there would be no great loss allowing a tiny percentage of people to come in via a Mojeek search result and browse the site—including customers like me who had the intent to see what you had in stock with a view to purchasing the item.

I genuinely hope this is something that will be looked into and that a New Zealand company I admire (one which is connected to me through a round-about way—I was educated by relatives of the Tindalls) isn’t party to upholding the Google monopoly.
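For anyone wanting to check a rule like the one quoted in the letter, here is a minimal sketch using Python’s standard `urllib.robotparser`. It only parses the two lines I was told about; the product URL is a hypothetical placeholder and the sketch doesn’t fetch anything from the Warehouse’s real site.

```python
from urllib.robotparser import RobotFileParser

# The two lines quoted in the letter above; nothing else is assumed
# about the rest of the retailer's robots.txt.
ROBOTS_TXT = """\
User-agent: MojeekBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under these robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page address, used purely for illustration.
PAGE = "https://www.example.com/some-product"

for bot in ("MojeekBot", "Googlebot"):
    print(bot, "allowed" if is_allowed(ROBOTS_TXT, bot, PAGE) else "blocked")
```

With only that rule present, MojeekBot is shut out of every path while a crawler that isn’t named falls through to the default of being allowed.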

Posted in business, internet, New Zealand, technology | No Comments »


Forget the 2010s and 2020s, Bing’s results are firmly in the 2000s now

09.10.2022

Immediately after I blogged about Bing being able to pick up an article from 2022, Microsoft’s collapsing search engine reverted to being the Wayback Machine. There was just over a week of it living in the 2020s, but it seems it’s too much for them.

It’s back to, well, Bing Vista, for want of a better term. Of the 50 results (out of a claimed 120!) that it’s capable of returning for site:lucire.com, here is how it breaks down based on the publication year of the article. Since my last test, Bing has eliminated the 2018 and 2019 results (one page per year). We wouldn’t want to think it could deliver anything from the last decade, would we?
 
Bing
Contents’ pages ★★
1997
1998
1999
2000
2001 ★★★★★
2002
2003 ★★★★
2004 ★★★★
2005 ★★
2006 ★
2007 ★★★★★★★
2008 ★★
2009 ★★
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
 

There were 29 unique results, which means 21 were repeats—42 per cent! Bing says it had 120 results but really only had 29. To fill up the 50 it had to show 21 results multiple times!
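The arithmetic is simple enough to script. This is a rough sketch, assuming you have copied the result URLs into a list yourself; the `sample` list is a made-up stand-in (50 entries, 29 of them distinct), not Bing’s actual output.

```python
def repeat_stats(urls):
    """Return (unique, repeats, repeat_share) for a list of result URLs."""
    total = len(urls)
    unique = len(set(urls))
    repeats = total - unique
    share = repeats / total if total else 0.0
    return unique, repeats, share


# Hypothetical stand-in: 50 results that cycle through 29 distinct pages.
sample = [f"https://example.com/page-{i % 29}" for i in range(50)]

unique, repeats, share = repeat_stats(sample)
print(f"{unique} unique, {repeats} repeats, {share:.0%} of the results shown")
```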

Let’s see how Google fared for the first 50 results.
 
Google
Contents’ pages ★★★★★★★★★★
1997
1998
1999
2000
2001
2002 ★★
2003
2004 ★★
2005 ★
2006
2007 ★
2008
2009 ★
2010 ★★
2011 ★★★
2012 ★★
2013 ★★
2014 ★★★
2015 ★★
2016 ★★
2017 ★★
2018 ★★
2019 ★★★
2020 ★★★
2021 ★★
2022 ★★★★★
 
Google has moved again since we began looking at things. In an earlier test tonight, Google had two repeat results, which was a surprise. But I wasn’t able to replicate it when I did the one for the blog post.

No such issues at Mojeek, where every entry is unique. It really is better than the other two at delivering results for site searches.
 
Mojeek
Contents’ pages ★★★★★★★★
1997
1998
1999
2000
2001
2002
2003
2004 ★
2005
2006
2007
2008
2009 ★
2010 ★★
2011 ★★
2012 ★★★
2013 ★★★★★
2014 ★★★
2015 ★★★★★
2016
2017
2018
2019 ★★★★
2020 ★★
2021 ★★★★★★★★★
2022 ★★★★★★
 
That’s an improvement on our September 21 test: Mojeek has managed to capture more 2020s pages in its top 50.

I won’t run the other search engines through this—I just wanted two points of comparison to highlight how ridiculous Bing remains, with the resultant effect on web traffic. It means Duck Duck Go, Qwant, Ecosia, Yahoo and others, which are also Bing, are just as compromised.

I might lay off them for a while as we know it’s crap and things aren’t going to change. Microsoft has firmly entrenched itself as a bunch of liars, like their other Big Tech counterparts.

Posted in internet, technology, USA | No Comments »


Startpage isn’t what I thought it was—but then Google does the opposite to what you think

05.10.2022

Startpage says it licenses Google’s results but gives us privacy. So, if you want Google-level, Google-biased results, but don’t want their tracking, you use Startpage.

Um, no. Let’s just take a random search for a screenwriter I once mentioned on this blog:
 


 

It’s quite a bit slower than Google, too. The results are usually geographically biased, even when you have the region switched off.

What’s curious is that, at the same location with the same IP address, I get six Google results on desktop and 16 on mobile. I’m not sure what the sense is in that.
 


 

I realize there are a lot of mobile users, but it seems strange to limit what can be found on the desktop version. Surely the opposite would make sense since not all sites are mobile-optimized?

It’s like Google Maps: for me, it’s not accessible on a cellphone any more (and hasn’t been for months; I discovered this when Amanda and I went on holiday at the end of August and there was no Google Maps anywhere in the country) but remains available on a desktop. The geniuses at Google do realize that people are more likely to be visiting Maps on a phone than sitting in their offices, right?

It doesn’t matter where I try, even from the office network: Google Maps is not available on my phone. The site is not just unavailable, it doesn’t even resolve (whether you use maps.google.com or google.com/maps).
 

 

Usually I find that expecting the opposite of what US Big Tech says is really useful.

Better use paper maps, because the satellites are often switched off and the map programs on your phone think you are nowhere!
 

 

Coming back to the original topic, Startpage says it pays Google for this.

Better ask for a refund, folks.

Posted in internet, New Zealand, technology | No Comments »


Testing the search engines: Bing likes antiquity; most favour HTML over PHP

21.09.2022

Bing is spidering new pages, as long as they’re very, very old.

Last week, we added a handful of Lucire pages from 1998 and 1999. An explanation is given here. And I’ve spotted at least two of those among Bing’s results when I do a site:lucire.com search.

As a couple of newer pages have also shown up, I doubt there’s any issue with the template; and the home page now appears, too. But, by and large, Bing is Microsoft’s own Wayback Machine, and most of the Lucire results are from the 1990s and early 2000s.

It got me thinking: do the other search engines do this, too? For years, Google grandfathered older pages and they came up earlier. (Meanwhile, searches for my own name still have this site, and the company site, down, having lost first and second place when we switched from HTTP to HTTPS in March. Contrary to expert opinion, you don’t recover, at least not quickly.)

As Lucire includes the date of the article in the URL, this should be an easy investigation. We’ll only do the first 50 results as that’s all Bing’s capable of. I’ll try not to include any repeat results out of fairness. ‘Contents’ pages’ include the home page, the Lucire TV and Lucire print shopping pages, and tag and category pages.
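I did the tally by hand, but it could be scripted along these lines. It’s a sketch only: it assumes the date shows up as a path segment beginning with a four-digit year (as in the `/insider/20211103/` style of URL), treats anything else as a contents’ page, and skips repeats. The function names are mine, for illustration.

```python
import re
from collections import Counter

# Assumption: the date is a path segment that starts with a four-digit
# year, optionally followed by month and day, e.g. /insider/20211103/.
YEAR_SEGMENT = re.compile(r"/((?:19|20)\d{2})(?:\d{4})?/")


def year_of(url):
    """Return the publication year found in the URL, or 'contents'."""
    match = YEAR_SEGMENT.search(url)
    return match.group(1) if match else "contents"


def tally(urls):
    """Count unique results per year, ignoring repeated URLs."""
    seen = set()
    counts = Counter()
    for url in urls:
        if url in seen:  # leave out repeat results, out of fairness
            continue
        seen.add(url)
        counts[year_of(url)] += 1
    return counts


results = [
    "https://lucire.com/insider/20211103/valentina-sampaio-named-armani-beautys-newest-ambassador/",
    "https://lucire.com/insider/20211103/valentina-sampaio-named-armani-beautys-newest-ambassador/",
    "https://lucire.com/insider/20190905/nicky-hilton-hosts-brunch-to-celebrate-her-collaboration-with-french-sole/",
    "https://lucire.com/about.shtml",
]

for label, count in sorted(tally(results).items()):
    print(label, "★" * count)
```

A pattern like this will misfile any URL that happens to carry a year-like number for other reasons, so the output would still want a quick check by eye.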
 
Bing
Contents’ pages ★★★
1997
1998
1999 ★★★★
2000 ★
2001 ★★★★★★★★
2002 ★★
2003 ★★★
2004 ★★★★
2005 ★★
2006
2007 ★★★
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018 ★
2019 ★
2020
2021
2022
 
Google
Contents’ pages ★★★★★★★★★★★★★
1997
1998
1999
2000
2001
2002 ★★
2003
2004 ★★
2005
2006
2007 ★
2008
2009
2010 ★
2011 ★★★
2012 ★
2013 ★★
2014 ★★★
2015 ★
2016 ★★
2017 ★
2018 ★★★
2019 ★★★
2020 ★★★★★★★
2021 ★
2022 ★★★★
 
Mojeek
Contents’ pages ★★★★★★
1997
1998
1999
2000
2001
2002
2003
2004 ★
2005
2006
2007
2008
2009 ★
2010 ★★
2011 ★★
2012 ★★★
2013 ★★★★
2014 ★★★
2015 ★★★★★
2016 ★★★★★★★
2017 ★★★★★★
2018 ★★★
2019 ★★★★
2020 ★★★
2021
2022
 
Baidu
Contents’ pages ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018 ★
2019 ★
2020
2021 ★★★
2022 ★
 
Yandex
Contents’ pages ★★★★★
1997
1998
1999 ★★★★★
2000 ★★★★★★
2001 ★★★
2002 ★★★
2003 ★★★
2004 ★
2005
2006
2007 ★★★★
2008 ★★
2009 ★★
2010 ★★★★
2011 ★★★
2012 ★★
2013 ★
2014 ★★
2015
2016
2017
2018
2019
2020 ★★★
2021 ★
2022
 

To me, that was fascinating. My instincts weren’t wrong with Bing: it’s old and it favours the old (two of the restored articles were indexed). From the first 50 results, 18 results were repeats—that’s 36 per cent. I’m of the mind that Bing is so shot that it can only index old pages that don’t take up much space. New ones have a lot more data to them, generally.

Google does a good job with the top-level and second-level contents’ pages, though there were a few strange tag indices. But the distribution is what you’d expect: people would search for more recent stories. I know we had some popular stories from 2002 that still get hit a lot.

Mojeek has a similar distribution, though it should be noted that you can’t do a blanket site: search. There must be a keyword, and in this case it’s Lucire. The 2016 pages form the mode, which I don’t have a huge problem with; it’s better than the 2001 pages, which Bing has over everything else.

Baidu’s one is crazy as individual stories are seldom spat out in the first five pages, the search engine preferring tag indices, though half a dozen later story pages do make it into its top 50.

Finally, Yandex leans toward older pages, too, including our most popular 2002 piece. It’s the 2000 stories it has the most of among the top 50, and there’s a strange empty period between 2015 and 2019. But at least there is a fairer distribution than Bing can muster.

The other query that I had was whether these search engines were biasing their results toward HTML pages, rather than PHP ones. If that’s the case, then it could explain Bing’s preference for the old stuff (Lucire didn’t have PHP pages till 2008; prior to that it was all laboriously hand-coded, albeit within templates).
 
Bing
★★★★★★★★★★★★★★★★★★★★★★★★★ HTML
★ PHP
 
Google
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★ HTML
★★★★★★★★★ PHP
 
Mojeek
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★ HTML
★★★★★★★★★★★★★★★★★ PHP
 
Baidu
★★★★★★★★★★ HTML
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★ PHP
 
Yandex
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★ HTML
★★★★★★ PHP
 

I think we can safely say there’s a preference for HTML over PHP. Mojeek brings up a lot of HTML pages after the top 50, even though this sample shows the split isn’t as severe.
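For what it’s worth, a split like this can be approximated from the URLs alone. The sketch below is an assumption-laden shortcut, not how I counted: it calls anything ending in `.html`, `.htm` or `.shtml` a static HTML page and lumps everything else, including extensionless permalinks, in with PHP. Directory-style addresses such as a home page are ambiguous under that rule and would need sorting by hand.

```python
from collections import Counter
from urllib.parse import urlparse

# File endings treated as hand-coded, static pages.
STATIC_SUFFIXES = (".html", ".htm", ".shtml")


def page_kind(url):
    """Classify a result as 'HTML' or 'PHP' from its path alone."""
    path = urlparse(url).path.lower()
    if path.endswith(STATIC_SUFFIXES):
        return "HTML"
    # Assumption: .php files and extensionless permalinks are PHP-generated.
    return "PHP"


def split_counts(urls):
    """Count how many results fall into each kind."""
    return Counter(page_kind(url) for url in urls)


sample = [
    "https://lucire.com/about.shtml",
    "https://lucire.com/insider/20211103/valentina-sampaio-named-armani-beautys-newest-ambassador/",
]

for kind, count in split_counts(sample).items():
    print("★" * count, kind)
```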

Our PHP pages are less significant though: they contain news stories, and these are often ones other media covered, too. But I would have thought some of the more popular stories would have made the cut, and here it’s Mojeek’s distribution that looks superior to the others’. It seems like it’s actually analysing the page content’s text, which is what you want a search engine to do.

Baidu’s PHP-heaviness is down to all the tag indices—rendering it not particularly helpful as a search engine.

On these two tests, Mojeek and Google rank best, and Yandex comes in third. Baidu and Bing are a distant fourth and fifth.

Posted in China, culture, internet, media, publishing, technology, UK, USA | No Comments »


Bing hates novelty—it’s really Microsoft’s Wayback Machine

27.08.2022

Bing is still very clearly near death, as this latest site: search shows.
 

 

It manages a grand total of 10 pages from Lucire, and as outlined before, some are pages that have not been linked to for 17 years.

I purposely updated some of the pages Bing had in its limited capacity, and strangely, those have disappeared! Bing doesn’t want anything new, as it appears to be Microsoft’s Wayback Machine.

The fifth result here is a case in point. Some of you may recall lucire.com/about.shtml appearing in all the search engines, including Bing. This is a page last updated in 2004, with some final tweaks in 2012 (I assume for ad code; I don’t recall). It was a page that I decided I would stick on to a new template, since the search engines loved it so much. I copied the text from our licensing site. And, for the sake of online archæology, I put the 2004 page exactly as it was into a file called about-2004.shtml.

Bing must still be alive enough to spider and index the renamed page, but it rejects the revised about.shtml!

It’s similar to what I wrote in mid-August when I updated other ancient pages from the early 2000s: Bing rejected them, including a frameset that now pointed at the latest page!

You may be thinking: obviously, you are doing something wrong with your newer code, Jack, for Bing to favour the old stuff. But look at the fourth result: it’s from 2020, the one “new” page that Bing has managed to index and show. I don’t think we have anything wrong with our code if this page has made it in.

Google happily included the new about.shtml.

A search for Lucire itself on Bing now does include the home page, which is a new development in a search engine that’s limping along. So much for the earlier claim that there were issues with the page that prevented it from appearing.

Posted in internet, media, publishing, technology, USA | No Comments »


Testing the seven search engines in the world

22.08.2022

After reading Mojeek’s blog post from last July, I learned there are only seven search engines in the world now. In other words, I was checking more search engines out in the 1990s. It’s rather depressing, especially as the search market is largely a monopoly with Google dominating it (and all the ills that brings), and Bing and its licensees (like Duck Duck Go) with their 6 per cent.

Knowing there are seven, I fed the site:lucire.com search into all of them to see where each stood.

The first figure is the claimed number of results, the second the actual number shown (without repeats removed, which Bing is guilty of).

I can’t use Brave here as its site search is Bing as well.

Yandex appears to be capped at 250 and Mojeek at 1,000, but at least they aren’t arbitrary like Google and Baidu. Baidu has a lot of category and tag pages from the WordPress section of our site to bump up the numbers.
 
Gigablast 0/0
Sogou 19/13
Bing 243/50
Baidu 13,700/213
Yandex 2,000/250
Google 6,280/315
Mojeek 3,654/1,000
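To put those pairs on a comparable footing, here is a small sketch that works out what share of the claimed results each engine actually showed. The figures are copied from the list above; the function name is just for illustration, and an engine claiming zero results gets no percentage.

```python
# (claimed, shown) pairs copied from the list above.
RESULTS = {
    "Gigablast": (0, 0),
    "Sogou": (19, 13),
    "Bing": (243, 50),
    "Baidu": (13_700, 213),
    "Yandex": (2_000, 250),
    "Google": (6_280, 315),
    "Mojeek": (3_654, 1_000),
}


def shown_share(claimed, shown):
    """Fraction of claimed results actually shown, or None if none claimed."""
    return shown / claimed if claimed else None


for engine, (claimed, shown) in RESULTS.items():
    share = shown_share(claimed, shown)
    label = "n/a" if share is None else f"{share:.1%}"
    print(f"{engine:<10}{claimed:>8,}{shown:>7,}  {label}")
```

Bear in mind the share is not the whole story: Sogou shows most of what it claims, but it only claims 19 pages in the first place.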
 

Frankly, more of us should go to Mojeek. It can only get better with a wider user base. Unlike Bing, it hasn’t collapsed. I know most of you will keep going to Google, but I just don’t like the look of those limits (not to mention the massive privacy issues).

Mojeek is now at 5,900 million pages, which must be the largest index in the west outside of Google.

Posted in China, internet, publishing, technology, UK, USA | No Comments »


False accusations from Red Points Solution SL

18.08.2022

Yesterday, I returned to find a DMCA claim filed against us by Red Points Solution SL, purporting to act for Harper’s Bazaar España publisher Hearst Magazines SL, falsely accusing us of breaching their copyright with this article. You can read the notice here.

Naturally, I filed a counter-claim because their accusation is baseless.

Our source was PR Newswire, and it’s not uncommon to find stories of interest through that platform. In fact, Armani Beauty was so keen to get this out there on November 3 that we received the release in four languages at 15.28, 15.30, 15.33, 15.36, 15.39, 15.46 and 16.03 UTC.

The quotations and images were supplied by Armani Beauty, which is part of L’Oréal. I’ve worked with people from L’Oréal for over two decades and know their systems well enough, including the money they have for licensing images for press usage.

Lucire has a lot of original articles, but some of our news is release-based, as it is for anyone in our industry.

Our rule is: even when it’s a release, you write it up individually in your own words. You may have something additional to bring to the story. And we aren’t a repository of releases.

The only time we would run a release mostly verbatim is if we issued it, something that might happen once every couple of years.

Naturally, Google has so far done nothing and our story remains absent from their index. Big Tech loves big firms like Hearst.

I’ve tagged Harper’s Bazaar España in social media demanding they front up with their evidence. I’ve also messaged Hearst’s Spanish office with the following.

Ladies and Gentlemen:
 
Yesterday, your firm lobbed a false accusation against us by deceptively claiming your copyright had been breached by one of our articles. I note that you filed this as a DMCA complaint with Google.

We have filed a counter-notice.

We find it appalling that you would claim an original work has breached your copyright.

The imagery and quotations in our article were sourced from L’Oréal, and we have informed them directly of your deceptive and misleading conduct.

I demand you furnish proof. As you will no doubt fail to, we demand you withdraw the complaint. We reserve the right to pursue our own legal remedies against you.
 
Yours faithfully,
 
Jack Yan
Publisher, Lucire

I basically thought they were being dicks and my friend Oliver Woods chimed in on Twitter about it. Oli’s very insightful and objective, and I respect his opinion.

They are being dicks, but there is a strategy behind it. Petty little minds wanting to look good on Google, not liking someone else ahead of them. (Not that I ever looked to see where our story ranked. I mean, seriously?)

It reminds me of a US designer’s rep who emailed me a while back wanting us to remove an article.

I asked: what’s wrong with it? Did we err in facts? Is it somehow defamatory?

When I probed a bit more deeply, it turned out that they were incensed it came up so highly in a Google image search.

I explained that that wasn’t a good enough reason, especially since the story had been provided to us by a PR firm.

They countered by saying that as they had not heard of us, it was highly unlikely that they would have released that news to us.

I thought it was a very strange strategy to accuse someone you wanted a favour from of lying.

I still have the email from their PR firm. Call me Lord of the Files.

I’m not going to reveal the identity of the designer. I asked one of my team to see if he would call me directly instead of having one of his rude staff insult me. He never did call. The image is still there, and I bet they’re seething each time they see it.

It’s not even a bad image. It just doesn’t happen to be hosted by them.

I don’t really know why search engine domination is so important. We all should have a fair crack at it, and let whoever has the most meritorious item on a particular topic come up top.

The American designer, and the Spanish outpost of this American media giant Hearst, are obviously not people who like freedom of the press, freedom of expression, or a meritorious web. American people might like this stuff but a lot of their corporations don’t.

Which is why Google is terrible: it doesn’t allow it. We know through numerous lawsuits it has biases toward its own properties, for a start. I’ve observed them favouring big media brands over independents, even when independents break a news story.

Mojeek is just so, so much better. No agenda. Just search the way it was and should have stayed. That’s the “next Google”, the one that could save the web, that I had asked for in 2010.

Except it shouldn’t be the next Google because we don’t want more surveillance and tribalism.

Fair, unbiased search is where Mojeek excels. I really hope it catches on more. God knows the world needs it.

I think the world needs Lucire, too, the title that Harper’s Bazaar Australia named as part of its ‘A-list of style’. The Aussies are just so much nicer.
 
PS.: Hearst uses a company called Red Points Solution SL to do its supposed copyright infringement detection. Based on this, they must be pretty shit at it. And remember, we don’t even publish in Spanish. Yet.

I see you have falsely accused us of copyright infringement with our article at https://lucire.com/insider/20211103/valentina-sampaio-named-armani-beautys-newest-ambassador/ when we have done nothing of the sort.

We demand that you withdraw your DMCA complaint to Google.
 
https://lumendatabase.org/notices/28469986#
 

Our story’s source is Armani Beauty through PR Newswire, to which we are signed up as a legitimate international media organization. The story is our work, using facts and quotations provided in the release.

PR Newswire provided us with this release on November 3, 2021, at 15.28, 15.30, 15.33, 15.36, 15.39, 15.46 and 16.03.

A counter-notice has been filed.

We require an explanation from you on why you have targeted a legitimate media organization with your deception. Clearly your detection systems are not very good and we would certainly be discouraged from using them.

 
P.PS.: One more email to Red Points Solution SL on August 19, 21.56 UTC after they doubled down with another notice removing two URLs from Google. Again, no proof of their original work was provided, and none can be seen in Lumen even when requested. It seems Google will lap anything up if it sees a big company behind it.

I have reached out to you through numerous means but have yet to hear back.

I publish Lucire, a magazine with a 25-year history and five editions worldwide. You might even say we’re the sort of business that would need Red Points Solution’s services.

However, we’ve found ourselves at the other end, with legitimate media stories from our website removed from Google with DMCA notices you’ve filed.

Your client is Hearst SL.

If your latest efforts are down to Hearst’s orders, then they are claiming ownership over material that is not theirs.

All our content is original, and where it is not, it is properly licensed.

In the first case:
 
https://lucire.com/insider/20211103/valentina-sampaio-named-armani-beautys-newest-ambassador/
 

Your client does not own this material at all. We own the story, and the quotations and images are owned by and licensed to us by L’Oréal. Hearst has no connection to it other than Harper’s Bazaar being mentioned in an editorial fashion.

In the second case:
 
https://lucire.com/insider/20190905/nicky-hilton-hosts-brunch-to-celebrate-her-collaboration-with-french-sole/
 

Your client does not own this material at all. We own the story, and the images are owned by and licensed to us by French Sole and BFA.com. Hearst has no connection to it other than Harper’s Bazaar being mentioned in an editorial fashion.

In the third case:
 
https://lucire.com/insider/page/164/?mobiinsider%2F20120130%2Felizabeth-olsen-models-asos-magazines-cover%2F%3Fwpmp_switcher=mobile
 

Your client does not own this material at all. In fact, we own this material fully. No Hearst properties are even mentioned.

Counter-notifications have been filed on the basis that it is our original content and that your client has no right to make the claim in the first place.

It would be far easier if you would review your systems as presently they are opening your client and yourselves up to a legal claim …

We think you need to go back to your client and have them show you just how they can legitimately claim ownership of material that is not theirs.

In the meantime, we insist you stop these notices as they are unwarranted and unfounded.

We look forward to hearing from you.

Posted in business, internet, media, New Zealand, publishing, USA | No Comments »


More of Bing’s follies (they just keep coming)

16.08.2022

I see WorldWideWebSize.com has wised up and figured out Bing was having them on about the number of results it had for their search terms.
 

 

When I noted that Bing claims 300-odd results for site:lucire.com yet doesn’t actually go beyond a limit of around 50 (where it has been stuck for many months), I was being generous. I never deducted the repeated results on the pages that it did show.

Here’s a case in point: an ego search for my own name. These are the first four pages. I realize I have the graphics a bit small, but you should be able to make out just how many pages have been repeated here. Regular search engines like Mojeek and Google show you different results on each page. Bing doesn’t.
 




 

More strange happenings: you’ll recall I noted that pages we haven’t linked to since the 2000s were up top in a site search on Bing for lucire.com. The very top one was lp.html, a frameset (yes, it’s that old). I did what I thought would be logical in such a circumstance: I pointed one of the frames to the current 2022 page (which is still regular HTML, but with Bootstrap).

Result in Bing: it’s vanished.

Did the same to news.html, not linked to since 2012.

Vanished.
 

 

The current news page is WordPress, but Bing still manages to index the occasional WordPress page on our site. The fact it’s PHP shouldn’t make a difference.

These pages are just too new for Bing, which is really Microsoft’s own Wayback Machine. And Duck Duck Go’s, and Qwant’s, and all manner of search engines’.
 
Meanwhile at Brave: it does have an independent spider but admits to using the Bing API for the image search, as does Mojeek. But what Brave doesn’t say is that it also taps into Bing for site: searches, rendering them largely useless, too. Brave does a far better job than Bing in its regular search though, picking up lucire.com for Lucire as well as some major index pages.
 

On a regular search, Brave does rather well—it’s picked up the top pages.
 


Bing and Brave compared, using site:lucire.com. Brave isn’t as independent as you might think with site: and image searches. These screenshots were taken on Sunday.
 

Still well short of Mojeek in terms of its index—but then so is everyone aside from Google.

The saga continues, with still no one talking about Bing’s collapse (though I know of one journalist working away behind the scenes).

Posted in branding, business, internet, technology, USA | No Comments »


Updating old pages since the experts are wrong

12.08.2022

With all the odd results coming up in site searches—it’s not restricted to Bing—I attended to some of the older pages on our websites.

Curiously, in a site:lucire.com search, even Google has our 2005 competition page up high, namely in fifth. There is only one link from our site internally to this page. I know of none externally. The idea about Backrub and “link juice” doesn’t ring true here as there is no way that page should be ranked so highly.
 


Top: Google has our 2005 competition page ranked very highly despite it being a redirect. Above: Internally, only one file refers to it, dating from the 2000s.
 

Not only that, it’s a page that refreshes to another on the site. So much for the claims that such pages rank poorly and that search engines don’t like them.

Nevertheless, as it’s not relevant or useful any more, I deleted it (though it remains in Google at the time of writing).

The ‘About’ page I’ve discussed before and it remains in fourth, despite not being linked from anywhere recent on our site. It was updated with text from our licensing website and now also follows the rest of the site—though we haven’t bothered making any new links to it. It’s really just for the search engines. (For nostalgia’s sake, it has a link to the 2004 page that the search engines love so much.)

We had so many frameset pages on the Lucire site that I updated a few of those, though—rightly or wrongly—I left the frames intact. Well, if they rank so highly, contrary to what the experts all say, then why not?

The one that had the most surgery, however, was jyanet.com/lucire, Lucire’s original URL in 1997. That still comes up in 23rd for me in Google (for the search Lucire), and 20th in Startpage. This hasn’t been linked to since 1998 by us, and I doubt very many outside of our company would. It was our home only for about six months after launch.

Given its enduring popularity, we’ve given it a Bootstrap template and it shares a stylesheet with the rest of the Lucire site, despite it being at another domain. It now contains links to other Lucire sites, which seems a fitting “gift” to the page as we celebrate our 25th anniversary.
 

Posted in business, internet, New Zealand, publishing, technology, USA | No Comments »


What search engines show in their top 10 isn’t always relevant

09.08.2022

The Bing collapse did lead me to look at some of the ancient pages on the Lucire site that the search engines were still very fond of. For instance, the ‘About’ page was still appearing up top, which is bizarre since we haven’t made any links to it for years—it reflected our history in 2004.

Naturally, once I updated it, it promptly disappeared from Bing! Too new for Microsoft’s own Wayback Machine!

I was always told that you shouldn’t delete old pages, and that 301s were the best solution. I’m enough of a computing neophyte to not know how to implement 301s (.htaccess doesn’t work, at least not on our set-up) and page refreshes are often frowned upon, which is why so many old pages are still there.

However, you would naturally expect that a web spider following links would not rank anything that hasn’t been linked to for over a decade very highly. If the spider comes in, picks up the latest stuff from your home page, possibly the latest stuff from individual topic pages, it would figure out what all of these were linking to, and conclude that something from 2000 that was buried deep within the site was no longer current, or of only passing interest to surfers.

I realize I’ve had a go at search engines for burying relevant things in favour of novel things, but we’re talking pages here that aren’t even relevant. ‘About’ I’ll let them have, but a 2000 book reviews’ page? A subject index page from 2005 that hasn’t been linked to since then, where the pages that do link to it are well outnumbered by newer ones? Because, the deletion of ‘About’ aside, here is what Bing thinks is the most important for site:lucire.com:
 

 

Google fares a little better. Our home page and current print edition ordering page are top, shopping is third, followed by the fashion contents’ page (makes sense). ‘About’ comes in fifth, for whatever reason, then a 2005 competition page that we should probably delete (it refreshes to another page from 2005—so much for refresh pages being bad for search engines).

Seventh is yet another ancient page from 2005, namely a frameset—which I’ve since updated so at least the main frame loads something current. The remainder are articles from 2011, 2022 and 2016. The next page comprises articles and tags, which seem to make sense.

Mojeek actually makes more sense than Google. Home page in first, the news page (the next most-updated) is second, followed by the travel contents’ page. Then there are two older print edition pages (2020 and 2012), followed by a bunch of articles (2013, 2014, 2013, 2013), and the directory page for Lucire TV. There’s nothing here that I find strange: everything is logically found by a spider going through the site, and maybe those four articles from the 2010s are relevant to the word Lucire (you can’t do site: searches on Mojeek without a keyword, so it repeats the word before the TLD)? The reference to the 2012 issue might be down to my having mentioned it recently during our 25th anniversary posts. But there are no refresh pages and no framesets.

Startpage, not Google, has a couple of frameset pages from 2000 and 2002 in its top 10, which again weren’t linked to, at least not purposefully (they were placed there to catch people trying to look at the directory index in the old days). There’s incredibly little “link juice” to these pages. However, ‘About’ (in 10th) and these two framesets aside, its Google-sourced results fare remarkably well. In order: home page, print edition ordering page, the two framesets, the news section, the shopping page (barely updated but I can see why it’s there), the community page, Lucire TV, the fashion contents, ‘About’.

Duck Duck Go is so compromised by Bing that it barely merits a mention here. Four pages from 2000 and 2005 that no current page links to, a 404 page that we’ve never even had on our site (!), articles from 2021, 2018, 2007 and 2000 (in that order), and a PDF (!) from 2004. Fancy having a 404 that never even existed in the top 10!

If I had my way, it’d be home page, followed by the different sections’ contents’ pages, then the most popular article—though if a couple of articles go (or went) viral, then I’d expect them sooner.

Both Mojeek and Google do well here, with four of these pages each in their top 10s. But it’s Startpage’s unfiltered Google results that do best, hitting linked, relevant pages in seven results out of the top 10. Bing and its licensees miss the mark completely. If you must have a Google bias, then Startpage is the way to go; for our purposes, Mojeek remains the better option.
 
★★★★★★★☆☆☆ Startpage
★★★★☆☆☆☆☆☆ Mojeek
★★★★☆☆☆☆☆☆ Google
★★☆☆☆☆☆☆☆☆ Virtual Mirage
★☆☆☆☆☆☆☆☆☆ Baidu
★☆☆☆☆☆☆☆☆☆ Yandex
☆☆☆☆☆☆☆☆☆☆ Bing
☆☆☆☆☆☆☆☆☆☆ Qwant
☆☆☆☆☆☆☆☆☆☆ Swisscows
☆☆☆☆☆☆☆☆☆☆ Brave
☆☆☆☆☆☆☆☆☆☆ Duck Duck Go (would give –1 for the 404 if I could)
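The star bars above are simply the number of linked, relevant pages in each top 10. A rough sketch of that tally in Python follows; the function names and the sample URLs are hypothetical, and deciding which pages count as relevant remains a judgement call that no script makes for you.

```python
def score(results, relevant):
    """Count how many of a search engine's results are in the relevant set."""
    return sum(1 for url in results if url in relevant)


def stars(hits, total=10):
    """Render a score out of `total` as filled and empty stars."""
    if not 0 <= hits <= total:
        raise ValueError("hits must be between 0 and total")
    return "★" * hits + "☆" * (total - hits)


# Hypothetical example: two of these three results are pages we would expect.
relevant_pages = {"/", "/news/", "/fashion/"}
sample_results = ["/", "/news/", "/2000/frameset.html"]

print(stars(score(sample_results, relevant_pages)))
print(stars(7), "Startpage")
```

With seven hits, `stars(7)` gives the same bar shown for Startpage above.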

Posted in France, internet, New Zealand, publishing, technology, UK, USA