The Persuader
My personal blog, started in 2006. No paid or guest posts, no link sales.
Posts tagged ‘Yandex’
16.01.2023
Time to analyse the age of the search results the search engines return for this site. I’m curious about the drop in hits. ‘Contents’ pages’ also include static pages and, in Bing’s case, PDFs. (PS.: For clarification, a contents’ page would include a Wordpress tag page, or a page for a given month containing all that month’s posts.)
Mojeek
Contents’ pages: 9
2002–2005: 0
2006: 2
2007: 1
2008: 2
2009: 6
2010: 1
2011–2019: 0
2020: 2
2021: 11
2022: 13
2023: 0
Interesting spread, and no problems indexing PHP pages (after 2010). Some repeat results, with Mojeek having both www.jackyan.com and jackyan.com versions of the same pages. I’m surprised at the gap between 2010 and 2020, though pages from those years do appear after the 50 mark.
Google
Contents’ pages: 51
2002–2023: 0
Now that was a surprise. Only the static, HTML pages, with a lot of ex-Blogger indices (which were also HTML). Talk about being a Wayback Machine. No individual blog posts at all and a lot of really old stuff that isn’t even linked any more. I expected Yandex to do something like this, not Google.
Bing
Contents’ pages: 9
2002–2022: 0
2023: 1
Still bizarre. Bing claimed it had six results and delivered 10 on the first page. One blog post from 2023 makes it in here: it’s one attacking Bing and calling it near death. (Of the ones after the 3rd, it’s done marginally better, though it’s still hundreds off the norm.) During the course of the day, the 50-something results Bing had for site:jackyan.com have fallen to 10. Talk about decaying.
Interestingly, Bing gives 50 or so results on mobile, something I discovered this morning after compiling the above and before I pressed ‘Publish’ in Wordpress.
Yandex
Contents’ pages: 7
2002–2005: 0
2006: 13
2007: 9
2008: 3
2009: 6
2010: 4
2011: 2
2012–2018: 0
2019: 2
2020: 1
2021–2023: 0
Some repeated results and definitely in favour of static HTML pages (pre-2010) over dynamic ones.
Baidu
Contents’ pages: 8
2002–2009: 0
2010: 1
2011: 1
2012–2013: 0
2014: 1
2015–2016: 0
2017: 4
2018: 2
2019: 1
2020: 9
2021: 15
2022: 6
2023: 0
Baidu gives the wrong date for a lot of results, and there was a repeated result, too. But it’s a pretty good site search and far closer to what I expected to see, since it’s the post-2010 blog posts that I thought were more significant. There were a few in 2006 that got me some international mainstream media coverage and appearances on Al Jazeera English’s Listening Post in those early days, but the most-read blog entries were from 2016.
Yep
Contents’ pages: 4
2002–2013: 0
2014: 2
2015–2016: 0
2017: 1
2018–2019: 0
2020: 2
2021: 0
2022: 1
2023: 0
Not bad for a newbie in beta, spidering both static and dynamic (PHP) pages. Better than Bing’s mix for the 10 each delivers.
Gigablast delivers none.
I can’t say for sure what caused the traffic drop based on the above, since I haven’t documented one of these searches before. So I’ve nothing to compare it to, though my vague memory is that Google would have had some of my actual posts among the top 50. A lot of the pages it does have there aren’t that highly trafficked. Could we blame Google?
Sadly, I don’t have enough data to know for sure, but on the face of it, Google’s top 50 are anomalous, while Bing continues to demonstrate that it’s largely useless.
PS.: Just tried site:bing.com. Bing’s results were terrible, with some real-estate searches for homes in France and lots of repeated results. Mojeek and Google delivered better results for site:bing.com than Bing did.
Tags: 2023, Baidu, Bing, blogosphere, computing, Google, Microsoft, Mojeek, search engines, technology, Yandex
Posted in business, China, internet, technology, UK, USA
21.09.2022
Bing is spidering new pages, as long as they’re very, very old.
Last week, we added a handful of Lucire pages from 1998 and 1999. An explanation is given here. And I’ve spotted at least two of those among Bing’s results when I do a site:lucire.com search.
As a couple of newer pages have also shown up, I doubt there’s any issue with the template, and the home page now appears, too. But, by and large, Bing is Microsoft’s own Wayback Machine, and most of the Lucire results are from the 1990s and early 2000s.
It got me thinking: do the other search engines do this, too? For years, Google grandfathered older pages and they came up earlier. (Meanwhile, searches for my own name still rank this site and the company site lower, having lost first and second place when we switched from HTTP to HTTPS in March. Contrary to expert opinion, you don’t recover, at least not quickly.)
As Lucire includes the date of the article in the URL, this should be an easy investigation. We’ll only do the first 50 results, as that’s all Bing’s capable of. Out of fairness, I’ll try not to include any repeat results. ‘Contents’ pages’ include the home page, the Lucire TV and Lucire print shopping pages, and tag and category pages.
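A tally like the ones below can be produced mechanically when the year sits in the URL. What follows is a minimal Python sketch under stated assumptions, not the method used for these counts: it assumes the year opens a path segment (as in the hypothetical `/2001/` or `/20160312/`), treats http/https and www/non-www copies of a page as one result so that repeats are skipped, and files anything undated under contents’ pages. The function names and example.com URLs are illustrative only.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumption: a year (1990-2029) opens a path segment, e.g. /2001/ or /20160312/.
YEAR_IN_PATH = re.compile(r"/(199\d|20[0-2]\d)")


def normalise(url: str) -> str:
    """Key a result so http/https and www/non-www copies count as one page.

    Lower-cases the whole URL, which assumes the site's paths are case-insensitive.
    """
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    key = host + parts.path.rstrip("/")
    # Keep the query string: on dynamic pages it often identifies the article.
    return f"{key}?{parts.query}" if parts.query else key


def year_tally(urls: list[str]) -> tuple[Counter, int]:
    """Count unique results per year; undated pages go under 'contents'.

    Returns the tally and the number of repeats that were skipped.
    """
    seen: set[str] = set()
    tally: Counter = Counter()
    repeats = 0
    for url in urls:
        key = normalise(url)
        if key in seen:
            repeats += 1
            continue
        seen.add(key)
        match = YEAR_IN_PATH.search(urlsplit(url).path)
        tally[match.group(1) if match else "contents"] += 1
    return tally, repeats


if __name__ == "__main__":
    # Hypothetical results, not real search output.
    sample = [
        "https://www.example.com/2001/story.html",
        "http://example.com/2001/story.html",  # same page, different host form
        "https://example.com/tag/fashion/",
        "https://example.com/20160312/news.php",
    ]
    tally, repeats = year_tally(sample)
    print(dict(tally), repeats)
    print(f"repeat rate: {repeats / len(sample):.0%}")
```

Dividing the skipped repeats by the number of results gives the repeat rate; 18 repeats in 50 results is the 36 per cent found for Bing.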
Bing
Contents’ pages: 3
1997–1998: 0
1999: 4
2000: 1
2001: 8
2002: 2
2003: 3
2004: 4
2005: 2
2006: 0
2007: 3
2008–2017: 0
2018: 1
2019: 1
2020–2022: 0
Google
Contents’ pages: 13
1997–2001: 0
2002: 2
2003: 0
2004: 2
2005–2006: 0
2007: 1
2008–2009: 0
2010: 1
2011: 3
2012: 1
2013: 2
2014: 3
2015: 1
2016: 2
2017: 1
2018: 3
2019: 3
2020: 7
2021: 1
2022: 4
Mojeek
Contents’ pages: 6
1997–2003: 0
2004: 1
2005–2008: 0
2009: 1
2010: 2
2011: 2
2012: 3
2013: 4
2014: 3
2015: 5
2016: 7
2017: 6
2018: 3
2019: 4
2020: 3
2021–2022: 0
Baidu
Contents’ pages: 44
1997–2017: 0
2018: 1
2019: 1
2020: 0
2021: 3
2022: 1
Yandex
Contents’ pages: 5
1997–1998: 0
1999: 5
2000: 6
2001: 3
2002: 3
2003: 3
2004: 1
2005–2006: 0
2007: 4
2008: 2
2009: 2
2010: 4
2011: 3
2012: 2
2013: 1
2014: 2
2015–2019: 0
2020: 3
2021: 1
2022: 0
To me, that was fascinating. My instincts weren’t wrong with Bing: it’s old and it favours the old (two of the restored articles were indexed). Of the first 50 results, 18 were repeats, or 36 per cent. I’m of the mind that Bing is so shot that it can only index old pages that don’t take up much space. New ones generally have a lot more data to them.
Google does a good job with the top-level and second-level contents’ pages, though there were a few strange tag indices. But the distribution is what you’d expect: people would search for more recent stories. I know we had some popular stories from 2002 that still get hit a lot.
Mojeek has a similar distribution, though it should be noted that you can’t do a blanket site: search. There must be a keyword, and in this case it’s Lucire. The 2016 pages form the mode, which I don’t have a huge problem with; it’s better than the 2001 pages, which Bing favours over everything else.
Baidu’s is crazy, as individual stories seldom appear in the first five pages, the search engine preferring tag indices, though half a dozen later story pages do make it into its top 50.
Finally, Yandex leans toward older pages, too, including our most popular 2002 piece. It’s the 2000 stories it has the most of among the top 50, and there’s a strange empty period between 2015 and 2019. But at least there is a fairer distribution than Bing can muster.
The other question I had was whether these search engines were biasing their results toward HTML pages rather than PHP ones. If that’s the case, it could explain Bing’s preference for the old stuff (Lucire didn’t have PHP pages till 2008; prior to that it was all laboriously hand-coded, albeit within templates).
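One rough way to reproduce an HTML-versus-PHP split is to classify each result by the extension at the end of its URL path. This Python sketch is an illustration under assumptions, not how the counts below were made: `.html`, `.htm` and `.shtml` count as static HTML, `.php` counts as PHP, and extensionless paths (such as Wordpress tag or category pages with pretty permalinks, or a bare home page) are assumed to be PHP-generated; anything else, such as a PDF, is "other". The function names and URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

STATIC_EXTENSIONS = (".html", ".htm", ".shtml")


def page_type(url: str) -> str:
    """Classify a result URL as 'HTML', 'PHP' or 'other' by its path extension."""
    path = urlsplit(url).path.lower().rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment.endswith(STATIC_EXTENSIONS):
        return "HTML"
    # Assumption: extensionless paths (pretty permalinks, tag pages) are served by PHP.
    if last_segment.endswith(".php") or "." not in last_segment:
        return "PHP"
    return "other"


def type_split(urls: list[str]) -> Counter:
    """Tally HTML, PHP and other results in a list of URLs."""
    return Counter(page_type(url) for url in urls)


if __name__ == "__main__":
    sample = [  # hypothetical URLs
        "https://example.com/2001/story.html",
        "https://example.com/news.php?id=3",
        "https://example.com/tag/fashion/",
        "https://example.com/media/kit.pdf",
    ]
    print(type_split(sample))
```

Classifying by extension only looks at the URL, so a site that serves static-looking `.html` addresses from PHP would be miscounted; it is a quick proxy, not a check of how a page is actually generated.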
Bing
HTML: 25
PHP: 1
Google
HTML: 42
PHP: 9
Mojeek
HTML: 33
PHP: 17
Baidu
HTML: 10
PHP: 40
Yandex
HTML: 34
PHP: 6
I think we can safely say there’s a preference for HTML over PHP. Mojeek brings up a lot of HTML pages after the top 50, even though this sample shows the split isn’t as severe.
Our PHP pages are less significant, though: they contain news stories, and these are often ones other media covered, too. But I would have thought some of the more popular stories would have made the cut, and here it’s Mojeek’s distribution that looks superior to the others’. It seems to be actually analysing the text of the page content, which is what you want a search engine to do.
Baidu’s PHP-heaviness is down to all the tag indices, rendering it not particularly helpful as a search engine.
On these two tests, Mojeek and Google rank best, and Yandex comes in third. Baidu and Bing are a distant fourth and fifth.
Tags: 2022, Baidu, Bing, Google, information, Lucire, Mojeek, publishing, search engine, search engines, technology, Yandex
Posted in China, culture, internet, media, publishing, technology, UK, USA
22.08.2022
After reading Mojeek’s blog post from last July, I learned there are only seven search engines in the world now. In other words, I was checking out more search engines in the 1990s. It’s rather depressing, especially as the search market is largely a monopoly, with Google dominating it (and all the ills that brings), and Bing and its licensees (like Duck Duck Go) with their 6 per cent.
Knowing there are seven, I fed the site:lucire.com search into all of them to see where each stood.
The first figure is the claimed number of results, the second the actual number shown (without repeats removed, which Bing is guilty of).
I can’t use Brave here, as its site search is Bing as well.
Yandex appears to be capped at 250 and Mojeek at 1,000, but at least they aren’t arbitrary like Google and Baidu. Baidu has a lot of category and tag pages from the Wordpress section of our site to bump up the numbers.
Gigablast 0/0
Sogou 19/13
Bing 243/50
Baidu 13,700/213
Yandex 2,000/250
Google 6,280/315
Mojeek 3,654/1,000
Frankly, more of us should go to Mojeek. It can only get better with a wider user base. Unlike Bing, it hasn’t collapsed. I know most of you will keep going to Google, but I just don’t like the look of those limits (not to mention the massive privacy issues).
Mojeek is now at 5,900 million pages, which must be the largest index in the West outside of Google.
Tags: 2022, Baidu, Bing, China, Google, internet, JY&A Media, Lucire, Mojeek, monopoly, publishing, Russia, search engines, technology, UK, USA, World Wide Web, Yandex
Posted in China, internet, publishing, technology, UK, USA
24.07.2022
One more, and I might give the subject a rest. Here I test the search engines for the term Lucire. This paints quite a different picture.
Lucire is an established site, dating from 1997, indexed by all major search engines from the start. The word did not exist online till the site began. It does exist in old Romanian. There is a (not oft-used) Spanish conjugated verb, I believe, spelt the same.
The original site is very well linked online, as you might expect after 25 years. You would normally expect, given its age and the inbound links, to see lucire.com at the top of any index.
There is a Dr Yolande Lucire in Australia, whom I know and whom I’m used to seeing in the search engine results.
The scores are simply for getting sites relevant to us into the top 10; no further judgement is made about their quality or relevance.
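Put mechanically, each score is a count of relevant results among the first ten. The Python sketch below assumes relevance can be decided per domain, which is a simplification: the scores here were judged result by result, so a domain such as wikipedia.org or twitter.com can go either way. Repeats of a relevant domain each count, since they occupy separate places in the top 10. The function name, relevant set and result list are hypothetical.

```python
def top10_score(result_domains: list[str], relevant: set[str]) -> int:
    """Score out of 10: how many of the first ten results are judged relevant.

    Assumes relevance is decided per domain; repeated relevant domains each count.
    """
    return sum(1 for domain in result_domains[:10] if domain in relevant)


if __name__ == "__main__":
    # Hypothetical judgement and result list, for illustration only.
    relevant = {"lucire.com", "lucire.net", "instagram.com"}
    results = ["lucire.com", "lucire.com", "spanishdict.com", "lucire.net",
               "instagram.com", "example.org", "example.net", "example.com",
               "example.edu", "example.info", "lucire.com"]
    print(f"{top10_score(results, relevant)}/10")  # the 11th result is ignored
```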
Google
lucire.com
twitter.com
lucire.net
instagram.com
wikipedia.org
linkedin.com
facebook.com
pinterest.nz
neighbourly.co.nz
I hate to say it, as someone who dislikes Google, but all of the top 10 results are relevant. Fair play. Then again, with the milliards it has, and with this as its original product, it should do well. 10/10
Mojeek
scopalto.com
lucirerouge.com
lucire.net
lucire.com
mujerhoy.com
portalfeminino.com
paperblog.com
dailymotion.com
eldiablovistedezara.net
hispanaglobal.com
Mojeek might be flavour of the month for me, but these results are disappointing. Scopalto retails Lucire in France, so that’s fair enough, but it’s disappointing to see the original lucire.com site in fourth. Fifth, sixth, seventh, ninth and tenth are irrelevant and relate to the Spanish word lucir. You’d have to get to no. 25 to see Lucire again, for Yola’s website. Then it’s more lucir results till no. 52, the personal website of one of our editors. 5/10
Swisscows
lucire.net
wikipedia.org
lucire.com
spanishdict.com
lucire.net
lucire.com
drlucire.com
facebook.com
spanishdict.com
viyeshierelucre.com
Considering it sources from Bing, it makes the same mistakes, placing the rarely linked lucire.net up top and lucire.com in third. Fourth, ninth and tenth are irrelevant, and the last two relate to different words. Yola’s site is seventh, which is fair enough. 6/10
Baidu
lucire.net
lucire.com
lucire.cc
lucire.com
kanguowai.com
hhlink.com
vocapp.com
forvo.com
kuwo.cn
lucirehome.com
Interesting mixture here. Strange, too, that lucire.net comes up top. We own lucire.cc but it’s now a forwarding domain (it was once our link shortener, up to a decade ago). Seventh and ninth relate to the Romanian word strălucire and eighth to the Romanian word lucire. The tenth domain is an old one, succeeded a couple of years ago by lucirerouge.com. Not very current, then. 7/10
Startpage
lucire.com
lucire.com
lucire.net
instagram.com
wikipedia.org
linkedin.com
facebook.com
pinterest.nz
fashionmodeldirectory.com
twitter.com
All relevant, as expected, since it’s all sourced from Google. 10/10
Virtual Mirage
lucire.com
instagram.com
wikipedia.org
lucire.net
facebook.com
linkedin.com
pinterest.nz
lucirerouge.com
nih.gov
twitter.com
I don’t know much about this search engine, since I only heard about it from Holly Jahangiri earlier today. A very good effort, with only the ninth one being irrelevant to us: it’s a paper co-written by Yola. 9/10
Yandex
lucire.com
lucire.net
facebook.com
twitter.com
wikipedia.org
instagram.com
wikipedia.eu
pinterest.nz
en-academic.com
wikiru.wiki
This is the Russian version. All are relevant, and they are fairly expected, other than the ninth result, which I’ve not come across this high before, although it still relates to Lucire. 10/10
Bing
lucire.net
wikipedia.org
lucire.com
spanishdict.com
lucire.com
facebook.com
drlucire.com
spanishdict.com
twitter.com
lucirahealth.com
How Bing has slipped. There are sites here relating to the Spanish word lucirse and to Lucira, which makes PCR tests for COVID-19. One is for Yola. 7/10
Qwant.com
lucire.net
wikipedia.org
spanishdict.com
drlucire.com
spanishdict.com
tumblr.com
lucirahealth.com
lacire.co
amazon.com
lucirahealth.com
For a Bing-licensed site, this is even worse. No surprise to see lucire.com gone here, given how inconsistently Bing has treated it of late. But there are results here for Lucira and a company called La Cire. The Amazon link is also for Lucira. 3/10
Qwant.fr
lucire.net
wikipedia.org
reverso.net
luciremen.com
lucire.com
twitter.com
lacire.co
lucirahealth.com
viyeshierelucre.com
lucirahealth.com
The sites change slightly if you use the search box at qwant.fr. The Reverso page is for the Spanish word luciré. Sixth through tenth are irrelevant and do not even relate to the search term. Eleventh and twelfth are for lucire.com and facebook.com, so there were more relevant pages to come. The ranking of relevant results, then, leaves something to be desired. 5/10
Duck Duck Go
lucire.com
lucire.net
wikipedia.org
spanishdict.com
drlucire.com
spanishdict.com
lucirahealth.com
amazon.com
lacire.co
luciremen.com
Well, at least the Duck puts lucire.com up top, and the home page at that (even if Bing can’t). Only four relevant results, with Lucire Men coming in at tenth. 4/10
Brave
lucire.com
instagram.com
twitter.com
wikipedia.org
linkedin.com
lucire.net
facebook.com
fashion.net
wiktionary.org
nsw.gov.au
For the new entrant, not a bad start. Shame about the smaller index size. All of these relate to us except the last two, one a dictionary and the other referring to Yolande Lucire. 8/10
The results from these first results’ pages are surprising.
Google: 10/10
Yandex: 10/10
Startpage: 10/10
Virtual Mirage: 9/10
Brave: 8/10
Baidu: 7/10
Bing: 7/10
Swisscows: 6/10
Mojeek: 5/10
Qwant.fr: 5/10
Duck Duck Go: 4/10
Qwant.com: 3/10
It doesn’t change my mind about the suitability of Mojeek for internal searches, though. It’s still the one with the largest index aside from Google, and it doesn’t track you.
Tags: 2020s, 2022, Baidu, Bing, China, Duck Duck Go, France, Google, language, Lucire, Microsoft, Mojeek, Qwant, research, Russia, search engines, technology, UK, USA, World Wide Web, Yandex
Posted in China, France, internet, publishing, technology, UK, USA