Posts tagged ‘Baidu’


Testing the search engines: Bing likes antiquity; most favour HTML over PHP

21.09.2022

Bing is spidering new pages, as long as they’re very, very old.

Last week, we added a handful of Lucire pages from 1998 and 1999. An explanation is given here. And I’ve spotted at least two of those among Bing’s results when I do a site:lucire.com search.

As a couple of newer pages have also shown up, I doubt there’s any issue with the template; and the home page now appears, too. But, by and large, Bing is Microsoft’s own Wayback Machine, and most of the Lucire results are from the 1990s and early 2000s.

It got me thinking: do the other search engines do this, too? For years, Google grandfathered older pages and they came up earlier. (Meanwhile, searches for my own name still have this site, and the company site, down, having lost first and second when we switched from HTTP to HTTPS in March. Contrary to expert opinion, you don’t recover, at least not quickly.)

As Lucire includes the date of the article in the URL, this should be an easy investigation. We’ll only do the first 50 results as that’s all Bing’s capable of. I’ll try not to include any repeat results out of fairness. ‘Contents’ pages’ include the home page, the Lucire TV and Lucire print shopping pages, and tag and category pages.
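For anyone wanting to repeat the tally, here is a minimal sketch in Python. It assumes each article URL carries a four-digit year as its own path segment; that URL shape, the example.com addresses and the function names are illustrative assumptions, not the site's actual structure. Anything without a year is counted as a contents' page, and repeats are dropped before counting.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: the year appears as a whole path segment, e.g. /2001/ or /2022/09/.
YEAR_SEGMENT = re.compile(r"/((?:19|20)\d{2})(?=/|$)")


def year_of(url):
    """Return the first four-digit year found as a path segment, or None."""
    match = YEAR_SEGMENT.search(urlparse(url).path)
    return int(match.group(1)) if match else None


def tally_by_year(urls):
    """Count unique URLs per year; URLs with no year count as contents' pages."""
    counts = Counter()
    for url in dict.fromkeys(urls):  # drops repeats while keeping order
        year = year_of(url)
        counts[year if year is not None else "contents"] += 1
    return counts


def star_chart(counts, first=1997, last=2022):
    """Render the tally as one line of stars per year."""
    lines = ["Contents' pages " + "★" * counts["contents"]]
    for year in range(first, last + 1):
        lines.append(f"{year} {'★' * counts[year]}".rstrip())
    return "\n".join(lines)
```

Feeding the first 50 result URLs from an engine into `tally_by_year` and then `star_chart` gives a chart in the same shape as the ones in this post.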
 
Bing
Contents’ pages ★★★
1997
1998
1999 ★★★★
2000 ★
2001 ★★★★★★★★
2002 ★★
2003 ★★★
2004 ★★★★
2005 ★★
2006
2007 ★★★
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018 ★
2019 ★
2020
2021
2022
 
Google
Contents’ pages ★★★★★★★★★★★★★
1997
1998
1999
2000
2001
2002 ★★
2003
2004 ★★
2005
2006
2007 ★
2008
2009
2010 ★
2011 ★★★
2012 ★
2013 ★★
2014 ★★★
2015 ★
2016 ★★
2017 ★
2018 ★★★
2019 ★★★
2020 ★★★★★★★
2021 ★
2022 ★★★★
 
Mojeek
Contents’ pages ★★★★★★
1997
1998
1999
2000
2001
2002
2003
2004 ★
2005
2006
2007
2008
2009 ★
2010 ★★
2011 ★★
2012 ★★★
2013 ★★★★
2014 ★★★
2015 ★★★★★
2016 ★★★★★★★
2017 ★★★★★★
2018 ★★★
2019 ★★★★
2020 ★★★
2021
2022
 
Baidu
Contents’ pages ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018 ★
2019 ★
2020
2021 ★★★
2022 ★
 
Yandex
Contents’ pages ★★★★★
1997
1998
1999 ★★★★★
2000 ★★★★★★
2001 ★★★
2002 ★★★
2003 ★★★
2004 ★
2005
2006
2007 ★★★★
2008 ★★
2009 ★★
2010 ★★★★
2011 ★★★
2012 ★★
2013 ★
2014 ★★
2015
2016
2017
2018
2019
2020 ★★★
2021 ★
2022
 

To me, that was fascinating. My instincts weren’t wrong with Bing: it’s old and it favours the old (two of the restored articles were indexed). From the first 50 results, 18 results were repeats—that’s 36 per cent. I’m of the mind that Bing is so shot that it can only index old pages that don’t take up much space. New ones have a lot more data to them, generally.
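The repeat figure is simple arithmetic, sketched here in Python. The sample list is made up purely to reproduce the numbers quoted above (50 results, 18 of them repeats, which implies 32 distinct pages); the function name is hypothetical.

```python
def repeat_share(results):
    """Return (repeats, share): how many results duplicate an earlier one."""
    repeats = len(results) - len(set(results))
    share = repeats / len(results) if results else 0.0
    return repeats, share


# Made-up stand-in for Bing's first 50 results: 32 distinct pages, 18 repeats.
sample = [f"page-{i % 32}" for i in range(50)]
repeats, share = repeat_share(sample)
print(f"{repeats} repeats, {share:.0%}")  # prints "18 repeats, 36%"
```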

Google does a good job with the top-level and second-level contents’ pages, though there were a few strange tag indices. But the distribution is what you’d expect: people would search for more recent stories. I know we had some popular stories from 2002 that still get hit a lot.

Mojeek has a similar distribution, though it should be noted that you can’t do a blanket site: search. There must be a keyword, and in this case it’s Lucire. The 2016 pages form the mode, which I don’t have a huge problem with; it’s better than the 2001 pages, which Bing has over everything else.

Baidu’s one is crazy as individual stories are seldom spat out in the first five pages, the search engine preferring tag indices, though half a dozen later story pages do make it into its top 50.

Finally, Yandex leans toward older pages, too, including our most popular 2002 piece. It’s the 2000 stories it has the most of among the top 50, and there’s a strange empty period between 2015 and 2019. But at least there is a fairer distribution than Bing can muster.

The other query that I had was whether these search engines were biasing their results toward HTML pages, rather than PHP ones. If that’s the case, then it could explain Bing’s preference for the old stuff (Lucire didn’t have PHP pages till 2008; prior to that it was all laboriously hand-coded, albeit within templates).
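One rough way to split results into the two camps is by file extension, sketched below in Python. This is an assumption about method, not a record of how the counts here were made: a CMS often serves PHP-generated pages from URLs with no extension at all, so those land in an "other" bucket to be sorted by hand. The suffix list and function name are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: these suffixes mark hand-coded static pages.
STATIC_SUFFIXES = {".html", ".htm", ".shtml"}


def page_kind(url):
    """Classify a URL as 'HTML', 'PHP' or 'other' from its path suffix alone."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in STATIC_SUFFIXES:
        return "HTML"
    if suffix == ".php":
        return "PHP"
    return "other"  # extensionless paths such as tag indices need a manual call
```

Query strings are ignored because only the path is inspected, so `index.php?p=5` still counts as PHP.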
 
Bing
★★★★★★★★★★★★★★★★★★★★★★★★★ HTML
★ PHP
 
Google
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★ HTML
★★★★★★★★★ PHP
 
Mojeek
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★ HTML
★★★★★★★★★★★★★★★★★ PHP
 
Baidu
★★★★★★★★★★ HTML
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★ PHP
 
Yandex
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★ HTML
★★★★★★ PHP
 

I think we can safely say there’s a preference for HTML over PHP. Mojeek brings up a lot of HTML pages after the top 50, even though this sample shows the split isn’t as severe.

Our PHP pages are less significant though: they contain news stories, and these are often ones other media covered, too. But I would have thought some of the more popular stories would have made the cut, and here it’s Mojeek’s distribution that looks superior to the others’. It seems like it’s actually analysing the page content’s text, which is what you want a search engine to do.

Baidu’s PHP-heaviness is down to all the tag indices—rendering it not particularly helpful as a search engine.

On these two tests, Mojeek and Google rank best, and Yandex comes in third. Baidu and Bing are a distant fourth and fifth.

Posted in China, culture, internet, media, publishing, technology, UK, USA | No Comments »


Testing the seven search engines in the world

22.08.2022

After reading Mojeek’s blog post from last July, I learned there are only seven search engines in the world now. In other words, I was checking more search engines out in the 1990s. It’s rather depressing, especially as the search market is largely a monopoly with Google dominating it (and all the ills that brings), and Bing and its licensees (like Duck Duck Go) with their 6 per cent.

Knowing there are seven, I fed the site:lucire.com search into all of them to see where each stood.

The first figure is the claimed number of results, the second the actual number shown (without repeats removed, which Bing is guilty of).

I can’t use Brave here as its site search is Bing as well.

Yandex appears to be capped at 250 and Mojeek at 1,000, but at least they aren’t arbitrary like Google and Baidu. Baidu has a lot of category and tag pages from the Wordpress section of our site to bump up the numbers.
 
Gigablast 0/0
Sogou 19/13
Bing 243/50
Baidu 13,700/213
Yandex 2,000/250
Google 6,280/315
Mojeek 3,654/1,000
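To put the gap between claim and delivery on one scale, the share of claimed results that each engine actually shows can be worked out from the pairs above. A minimal Python sketch follows; the figures are copied from the list, while the function name and the choice to report nothing for a zero claim are my own assumptions.

```python
# (claimed, shown) pairs copied from the list above.
RESULTS = {
    "Gigablast": (0, 0),
    "Sogou": (19, 13),
    "Bing": (243, 50),
    "Baidu": (13_700, 213),
    "Yandex": (2_000, 250),
    "Google": (6_280, 315),
    "Mojeek": (3_654, 1_000),
}


def shown_share(claimed, shown):
    """Fraction of claimed results actually shown; None when nothing is claimed."""
    return shown / claimed if claimed else None


# List the engines from the largest share shown to the smallest.
ranked = sorted(
    RESULTS.items(),
    key=lambda item: shown_share(*item[1]) or 0.0,
    reverse=True,
)
for engine, (claimed, shown) in ranked:
    share = shown_share(claimed, shown)
    label = "n/a" if share is None else f"{share:.1%}"
    print(f"{engine}: {shown}/{claimed} shown ({label})")
```

A share says nothing about the quality of what is shown, and the caps mentioned earlier for Yandex and Mojeek limit how high their shares can go.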
 

Frankly, more of us should go to Mojeek. It can only get better with a wider user base. Unlike Bing, it hasn’t collapsed. I know most of you will keep going to Google, but I just don’t like the look of those limits (not to mention the massive privacy issues).

Mojeek is now at 5,900 million pages, which must be the largest index in the west outside of Google.

Posted in China, internet, publishing, technology, UK, USA | No Comments »


Putting the search engines through their paces

24.07.2022

One more, and I might give the subject a rest. Here I test the search engines for the term Lucire. This paints quite a different picture.

Lucire is an established site, dating from 1997, indexed by all major search engines from the start. The word did not exist online till the site began. It does exist in old Romanian. There is a (not oft-used) Spanish conjugated verb, I believe, spelt the same.

The original site is very well linked online, as you might expect after 25 years. You would normally expect, given its age and the inbound links, to see lucire.com at the top of any index.

There is a Dr Yolande Lucire in Australia whom I know, and whom I’m used to seeing in the search engine results.

The scores are simply for getting sites relevant to us into the top 10, and no judgement is made about their quality or relevance.
 
Google
lucire.com
twitter.com
lucire.net
instagram.com
wikipedia.org
linkedin.com
facebook.com
pinterest.nz
neighbourly.co.nz
—I hate to say it, as someone who dislikes Google, but all of the top 10 results are relevant. Fair play. Then again, with the milliards it has, and with this as its original product, it should do well. 10/10
 
Mojeek
scopalto.com
lucirerouge.com
lucire.net
lucire.com
mujerhoy.com
portalfeminino.com
paperblog.com
dailymotion.com
eldiablovistedezara.net
hispanaglobal.com
Mojeek might be flavour of the month for me, but these results are disappointing. Scopalto retails Lucire in France, so that’s fair enough, but disappointing to see the original lucire.com site in fourth. Fifth, sixth, seventh, ninth and tenth are irrelevant and relate to the Spanish word lucir. You’d have to get to no. 25 to see Lucire again, for Yola’s website. Then it’s more lucir results till no. 52, the personal website of one of our editors. 5/10
 
Swisscows
lucire.net
wikipedia.org
lucire.com
spanishdict.com
lucire.net
lucire.com
drlucire.com
facebook.com
spanishdict.com
viyeshierelucre.com
—Considering it sources from Bing, it makes the same mistakes by placing the rarely linked lucire.net up top, and lucire.com in third. Fourth, ninth and tenth are irrelevant, and the last two relate to different words. Yola’s site is seventh, which is fair enough. 6/10
 
Baidu
lucire.net
lucire.com
lucire.cc
lucire.com
kanguowai.com
hhlink.com
vocapp.com
forvo.com
kuwo.cn
lucirehome.com
—Interesting mixture here. Strange, too, that lucire.net comes up top. We own lucire.cc but it’s now a forwarding domain (it was once our link shortener, up to a decade ago). Seventh and ninth relate to the Romanian word strălucire and eighth to the Romanian word lucire. The tenth domain is an old one, succeeded a couple of years ago by lucirerouge.com. Not very current, then. 7/10
 
Startpage
lucire.com
lucire.com
lucire.net
instagram.com
wikipedia.org
linkedin.com
facebook.com
pinterest.nz
fashionmodeldirectory.com
twitter.com
—All relevant, as expected, since it’s all sourced from Google. 10/10
 
Virtual Mirage
lucire.com
instagram.com
wikipedia.org
lucire.net
facebook.com
linkedin.com
pinterest.nz
lucirerouge.com
nih.gov
twitter.com
—I don’t know much about this search engine, since I only heard about it from Holly Jahangiri earlier today. A very good effort, with only the ninth one being irrelevant to us: it’s a paper co-written by Yola. 9/10
 
Yandex
lucire.com
lucire.net
facebook.com
twitter.com
wikipedia.org
instagram.com
wikipedia.eu
pinterest.nz
en-academic.com
wikiru.wiki
—This is the Russian version. All are relevant, and they are fairly expected, other than the ninth result which I’ve not come across this high before, although it still relates to Lucire. 10/10
 
Bing
lucire.net
wikipedia.org
lucire.com
spanishdict.com
lucire.com
facebook.com
drlucire.com
spanishdict.com
twitter.com
lucirahealth.com
—How Bing has slipped. There are sites here relating to the Spanish word lucirse and to Lucira, who makes PCR tests for COVID-19. One is for Yola. 7/10
 
Qwant.com
lucire.net
wikipedia.org
spanishdict.com
drlucire.com
spanishdict.com
tumblr.com
lucirahealth.com
lacire.co
amazon.com
lucirahealth.com
—For a Bing-licensed site, this is even worse. No surprise to see lucire.com gone here, given how inconsistently Bing has treated it of late. But there are results here for Lucira and a company called La Cire. The Amazon link is also for Lucira. 3/10
 
Qwant.fr
lucire.net
wikipedia.org
reverso.net
luciremen.com
lucire.com
twitter.com
lacire.co
lucirahealth.com
viyeshierelucre.com
lucirahealth.com
—The sites change slightly if you use the search box at qwant.fr. The Reverso page is for the Spanish word luciré. Sixth through tenth are irrelevant and do not even relate to the search term. Eleventh and twelfth are for lucire.com and facebook.com, so there were more relevant pages to come. The ranking of relevant results, then, leaves something to be desired. 5/10
 
Duck Duck Go
lucire.com
lucire.net
wikipedia.org
spanishdict.com
drlucire.com
spanishdict.com
lucirahealth.com
amazon.com
lacire.co
luciremen.com
—Well, at least the Duck puts lucire.com up top, and the home page at that (even if Bing can’t). Only four relevant results, with Lucire Men coming in at tenth. 4/10
 
Brave
lucire.com
instagram.com
twitter.com
wikipedia.org
linkedin.com
lucire.net
facebook.com
fashion.net
wiktionary.org
nsw.gov.au
—For the new entrant, not a bad start. Shame about the smaller index size. All of these relate to us except the last two, one a dictionary and the other referring to Yolande Lucire. 8/10
 

The results from these first results’ pages are surprising.
 
★★★★★★★★★★ Google
★★★★★★★★★★ Yandex
★★★★★★★★★★ Startpage
★★★★★★★★★☆ Virtual Mirage
★★★★★★★★☆☆ Brave
★★★★★★★☆☆☆ Baidu
★★★★★★★☆☆☆ Bing
★★★★★★☆☆☆☆ Swisscows
★★★★★☆☆☆☆☆ Mojeek
★★★★★☆☆☆☆☆ Qwant.fr
★★★★☆☆☆☆☆☆ Duck Duck Go
★★★☆☆☆☆☆☆☆ Qwant.com
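The scoring can be sketched in a few lines of Python. The relevance calls in this post were made by hand, so the set of relevant domains below is only an assumption, chosen to match the four results counted as relevant in the Duck Duck Go list above; the function names are hypothetical as well.

```python
# Assumption: domains treated as relevant for this one illustration.
RELEVANT = {"lucire.com", "lucire.net", "wikipedia.org", "luciremen.com"}

# Duck Duck Go's top 10, in the order listed above.
DUCK_DUCK_GO = [
    "lucire.com",
    "lucire.net",
    "wikipedia.org",
    "spanishdict.com",
    "drlucire.com",
    "spanishdict.com",
    "lucirahealth.com",
    "amazon.com",
    "lacire.co",
    "luciremen.com",
]


def score(results, relevant, top=10):
    """Count how many of the first `top` results are in the relevant set."""
    return sum(1 for domain in results[:top] if domain in relevant)


def star_bar(points, out_of=10):
    """Render a score as filled and empty stars."""
    return "★" * points + "☆" * (out_of - points)


points = score(DUCK_DUCK_GO, RELEVANT)
print(f"{star_bar(points)} Duck Duck Go")  # prints "★★★★☆☆☆☆☆☆ Duck Duck Go"
```

A fixed set of domains cannot capture every judgement call made here, so treat this as a way to reproduce a bar once the relevant results have been picked by eye.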
 

It doesn’t change my mind about the suitability of Mojeek for internal searches though. It’s still the one with the largest index aside from Google, and it doesn’t track you.

Posted in China, France, internet, publishing, technology, UK, USA | 2 Comments »


Cellphone apps: InShot’s Music Player may finally be the one; Über remains a total waste of time

14.05.2021

Forgetful Muzio Player has been replaced by a program (or app) called Music Player, which isn’t the best brand name considering the many other apps out there with the same name. This one’s version 2.5.6.74 and its maker is InShot Inc., so if all goes well, this is the one Meizu users should go for.
   First, a good bit: it picks up the directories on the SD card, which, till Meizu upgraded its Music app, I thought I could take for granted.
   The not-so-good bits. It doesn’t pick up the album artwork, so you have to link each cover yourself. The disadvantage is that you have to search for the cover by image, and there’s no option to search by name. Mind you, it was the same story with Meizu Music, and provided you have a rough idea of when you downloaded the album (as it displays the covers in reverse chronological order), it isn’t impossible.
   It did, however, pick up the graphics from the songs where the cover image was embedded and used them for the album covers … at least it did till today, when it forgot all about those and I spent more time relinking the dozen or so that the app forgot.
   What is it about forgetful software, or at least software that operates differently every day? Do I need to invent the dot-ini file (since it doesn’t seem to exist in this universe) or radically suggest that software follows a set of instructions, line by line, that do not vary each time?
 

Above: InShot’s Music Player displayed an album cover for Gone with the Wave yesterday, but today it appears to have forgotten what it was.
 
   Nevertheless, Music Player does “share” the chosen album cover with the individual tracks, so when they’re played, the image appears on the player screen, something that Muzio was loath to do.
   In other words, Music Player does what Meizu Music used to do before it became a lemon and, providing it doesn’t forget all the linked album covers (all 280 of them), it’ll stay on my phone for the foreseeable future. Since it didn’t come from an app store, it won’t be “upgraded” to something inferior, either, which appears to be the path of a lot of cellphone software.
   It doesn’t look too bad, though admittedly Muzio Player’s interface remains superior.
   Linking 280 covers with each album over the course of a day and a bit sure beats linking over 1,000 of them with each song on Muzio Player, only to have three weeks’ worth of labour vanish despite the program saying, ‘Changes saved’.
   If InShot’s Music Player keeps things as they are, then it’s the replacement I’ve sought for some time. Since I didn’t hear back from Muzio Player, I’ve deleted the app.
 
One program I can say is a genuine waste of time is Über, if you happen to use a Meizu M6 Note like me. I’ve always resisted it, on principle. If they didn’t play silly buggers on tax, I might be more inclined to have supported them, but I’ve remained very faithful to public transport and taxis all these years.
   Because of timing and circumstances that I won’t go into here, and having had a virus all of last week that I haven’t fully shaken off (one symptom being shortness of breath), Über was suggested again today. My first choice was driving to the station, catching the train (being careful not to spread any of my germs about), then either a bus or cab, to pick up a press car from town. That would mean after returning home, I would have to walk to the station while not feeling 100 per cent to get my own car. I know first-hand that a cab from here in the northern suburbs can be pricey—and that’s when one even shows up, as my partner’s faced ridiculously long waits for them during the daytime. So Über was a realistic choice and I’d be suckered into helping to concentrate wealth in the hands of the few milliardaires high up at these tech firms at the expense of working people.
   Never fear, for Über is a half-baked app that cost me two missed trains and I could have been typing this an hour earlier than I am now.
   Thanks to the full factory reset that PB did last year on my phone, and my installation of Meizu’s far more advanced Chinese OS afterwards, I was able to create an account this time and log in. It didn’t keep returning the message that I had attempted too many log-ins, even after a single attempt.
   After that, it takes about half an hour to read the terms and conditions and the privacy policy on a cellphone. You can opt out of promo messages, or so they claim (to be on the safe side, I’ve done it thrice: once when reading the T&Cs before I accepted them, once after I read them, and once more from the desktop when an email with an unsubscribe link arrived).
   And that’s really about all it does. You can’t type in any destination; I later checked their instructions on a proper computer and I was doing exactly what was asked. I could feed in my home address (it came up after I began feeding in the basics), and I could feed in some favourites, but I can’t actually go to them.
   Naturally, it will take your credit card details: Über made sure that that part worked.
   Having saved the Railway Station as a destination, and attempted to order a ride to there, I got to a screen to tell me that Über isn’t available in my area. Whether that means Tawa, or Wellington, or New Zealand, I don’t know.
 



Above: It’s impossible to feed in a destination in Über, but it’s probably because it’s not available in Tawa.
 
   I have map software on my phone—both Here Maps and Baidu Maps. And my partner does successfully use Über from time to time, on a Huawei phone which, like my Meizu, is Google-free. She has no Google Maps, so I know that isn’t a prerequisite for Über. I also know Google Services aren’t, either. At least these are points in their favour. I can’t be bothered troubleshooting beyond that, since they’ll just deny everything and pass the buck.
   Eventually, when I realized Über is a monumental waste of time, I carried out plan A, and took a train an hour after the one I could have taken had I not attempted to get an Übercab. And walked in the wintry air to collect my car.
   It was an easy decision to delete my account and the app soon after. Just as well, really. Big Tech loses once again. To think, the little music player made by a small company is more reliable than the milliards behind Über.
 

Above: Relieved to be on a desktop computer—and hopefully I won’t need to have any connection with Über ever again.

Posted in design, technology | 1 Comment »


Civility is a good thing

23.12.2010

Baidu Talk, which launched in September, has netted 1 million users already, according to PC World. Michael Kan reports that thanks to the service’s insistence that no aliases are used (registered users’ identities are verified with the People’s Republic’s government) ‘this has led to more “civil” discussions between users on Baidu Talk.’
   It shows it can be possible. In the past I’ve lamented the decline of each medium as it’s spoiled by spam or splogs. YouTube has been ruined by extremist commenters. In most cases, these people hide behind the veil of anonymity.
   The city blog I proposed during my campaign would have required registration as well. The logistics were another matter but Baidu shows it can be done—and a more civilized discussion is just what we need to make some real progress in society. If dialogue and engagement solve problems, then the medium for both must be where someone wants to go—and not see a whole bunch of swearing going on.
   As I wrote some years back, what I miss about the internet, and this may be rose-coloured glasses, was the collegial feeling that was there in the early days. In the 1990s, we naïvely put our details into online email directories before we figured out that spammers could harvest them. But, importantly, we got quite a few things done. Some of my closest allies in business can be traced back to those early days, before we had to cut through more clutter to find good, trustworthy people.
   Providing a safe forum where the veil of anonymity is gone—where John Gabriel’s Greater Internet F***wad Theory does not apply—is perhaps one of the best things that can be done for so many services. A Small World is one where there’s some degree of safety and security; LinkedIn, by its nature, continues to feel collegial. Since we aren’t talking about sensitive information here, where aliases and anonymity might be key, an online John Hancock can be a good thing.
   The bigger picture is that if China is encouraging this sort of dialogue, I will have to say: watch out. And I did say four years ago that, with Google’s willingness to engage in self-censorship when it entered the Middle Kingdom in 2006, the Chinese people would only be more loyal to Baidu et al in the long term. That influence might yet grow beyond China’s borders.

Speaking of the decline of society, a few weeks ago, Dad and I had to go to the ANZ in Kilbirnie to re-sign some authorities we had on each other’s accounts. (We had to do this with American Express as well: what was it with these big institutions losing the original authorities that we did years ago, all in the same week?) Outside the sliding doors, I heard a very loud female voice. My initial thought was, ‘This is a very loud promotion someone is having on Bay Road.’
   When the doors slid open again, I heard a whole bunch of profanities. ‘You f***ing bitch, you whore …’—you get the idea. I got up, passed an elderly lady on her way in (this was Tuesday, 3.30 p.m., when a lot of elderly are walking along Bay Road), and said, ‘You don’t need to hear that sort of language, do you, dear?’ She said, ‘No.’
   A crowd, mostly of schoolchildren, had gathered round to watch these two young women at it outside the local Pricebusters. Or, should I say, there was one abuser and one standing there and taking it. Seeing as neither was armed (I may be stupid, but not that stupid), I stood between them and asked them to stop: that the OAPs walking along minding their own business don’t have to listen to their sort of language.
   ‘I don’t care. You don’t know this f***ing whore …’
   ‘I don’t know you, either. I’m asking you to stop.’
   Although this had gone on for some time, it was only then that someone from the Pricebusters store came out. I asked, ‘Would you like to do anything? It’s your shop, but there hasn’t been an assault.’
   Seeing as the abuse continued, I said, calmly: ‘Walk away. Turn around, walk in opposite directions, and walk away.’
   I have a feeling that ‘Walk away’ in these ladies’ mother tongue meant ‘Let’s start beating the crap out of each other and this dude in the middle can get caught in the crossfire.’
   Fists flew, hair was pulled, and I got a little scrape where my watch was and my glasses were knocked off. It was then that various adults—I assume the female staff of the Pricebusters store—restrained the two. I advised the store that they could call the police now. Dad had come out by then and I suggested we finish the transaction inside the bank. And he didn’t need to see his son lose a fight to two women.
   These Streets of San Kilbirnie are tough and even Karl Malden would be surprised.
   Maybe I was the only adult around over several minutes, but I’m surprised that no one else helped out. It reminds me of two other incidents in the last few years where I played “first responder” (with a much larger friend assisting!) to a homeless man getting bullied and to a teen who had fallen off her bike.
   This isn’t about being intolerant of bad language. Most of this junk is on telly now after a certain hour. It’s the idea, which we’ve chatted about at the Vista Group luncheons with Jim and Natalie, that once we tolerate one thing, a worse thing will emerge. Usually this comes up when we discuss public drunkenness, and how, over the last generation, less and less acceptable behaviour becomes the norm.
   The fear that getting involved would drag one into a court case as a witness—that is baseless, too. When the police came (and quite quickly, too), I had finished at the bank. I asked one constable if he needed me to be a witness, and he said that he already had a statement from someone else. So: I tried to do a good deed, and I didn’t get dragged into a prolonged assault case. It’s easier than we think.
   And maybe I did something for the little guy, to draw the line at something that shouldn’t be acceptable in what is usually a very pleasant neighbourhood.

Posted in business, China, culture, internet, New Zealand, technology, Wellington | 1 Comment »