With my personal site and company site (both once numbers one and two in a search for my name) having disappeared from Bing and others since we switched to HTTPS, I decided I would relent and sign up to Bing Webmaster Tools. Surely, like Google Webmaster Tools, this would make sure that a site was spidered and we’d see some stats?
Once again, the opposite to conventional internet wisdom occurred. Both sites disappeared from Bing altogether.
I even went and shortened the titles in the meta tags, so that this site is now a boring (and a bit tossy) ‘Jack Yan – official site’, and the business is just ‘Jack Yan & Associates, Creating Harmony’.
Just as well hardly anyone uses Bing then.
Things have improved at Google after two months, with this personal site at number two, after Wikipedia (still disappointing, I must say), and the business at 15th (very disappointing, given that it’s been at that domain since 1995).
Surely my personal and work sites are what people are really looking for when they feed in my name?
The wisdom still seems to be to not adopt HTTPS if you want to retain your positions in the search engines. Do the opposite to what technologists tell you.
Meanwhile, Vivaldi seems to have overcome its bug where it shuts down the moment you click inside a form field. Version 5.3 has been quite stable so far, after a day, so I’ve relegated Opera GX to back-up again. I prefer Vivaldi’s screenshot process, and the fact it lets me choose from the correct directory (the last used) when I want to upload a file. Tiny, practical things.
Big thanks to the developers at Opera for a very robust browser, though it should be noted that both have problems accessing links at Paypal (below).
Weâll see how long I last back on Vivaldi, but good on them for listening to the community and getting rid of that serious bug.
One annoying thing about switching the majority of our sites to HTTPS is losing our positions in the search engines.
We were always told that HTTPS would lead to rises in search-engine ranking, and that staying on plain HTTP would lead to Google downgrading you.
The reality, as I’ve witnessed since we completed our server migration, is the opposite.
Take a search for my name. Since the 1990s, Jack Yan & Associates would wind up first or second, and when this website came online in the early 2000s, it tended to be first. Stands to reason: my name, followed by dot com, is most likely what a searcher is looking for. Both sites were regular HTTP.
We’ve lost first and second places. For my searches, Google puts this site at eighth, and Duck Duck Go doesn’t even have it in the top 10 (it’s 15th), info box aside. The company falls on the third page in Google and a shocking fifth page in Duck Duck Go.
I was told that eventually the search engines would sort things out, but it’s been two months, so you wonder just how slowly they act. If at all.
The business site has plenty of inbound links, and I imagine this site has a fair share.
I’ve fixed up some internal references to http:// after advice from some friends, but that hasn’t done the trick.
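For anyone needing to do the same, the change itself is trivial; below is a rough sketch of the sort of one-liner that would do it, with the domain and the file glob as placeholders rather than our actual set-up.

```perl
# Illustrative only: rewrite hard-coded http:// internal links in place, keeping .bak backups.
# The domain and the file glob are placeholders, not our actual configuration.
perl -pi.bak -e 's{http://(www\.)?example\.com}{https://${1}example.com}g' *.html *.shtml
```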
I find it pretty disheartening to find that, once again, in practice, the exact opposite to conventional wisdom happens. You would think this was a routine matter, and that search engines were programmed to accept such changes, understanding that, content-wise, the secure site is the same as the formerly insecure site. After decades of search engine development, it looks like, at least to this layman, that hasn’t happened. You have to start afresh even when you have the most relevant site to the search.
I was chatting to another Tweeter recently about the Ford I-Max, and decided I’d have a hunt for its brochure online. After all, this car was in production from 2007 to 2009, the World Wide Web was around, so surely it wouldn’t be hard to find something on it?
I found one image, at a very low resolution. The web’s not a repository of everything: stuff gets removed, sites go down, and search engines are not comprehensive. In fact, search engines favour the new over the old, so older posts that are still current (such as this post about the late George Kennedy) can’t even be found. This has been happening for over a decade, so it shouldn’t surprise us, but we should be concerned that we cannot get information based on merit or specificity, only on novelty. Not everything new is right, and if we’re only being exposed to what’s ‘in’, then we’re no better off in our knowledge than our forebears. The World Wide Web, at least the way it’s indexed, is not a giant encyclopædia which brings up the best at your fingertips, but often a reflection of our bubble or of whatever the prevailing orthodoxy is. More’s the pity.
I can’t let this post go without one gripe about Facebook. Good news: as far as I can tell, they fixed the bug about tagging another page on your own page, so you don’t have to start a new line in order to tag another party. Bad news, or maybe it’s to do with the way we’ve set up our own pages: the minute you do, the nice preview image that Facebook extracted vanishes in favour of something smaller. I’ll check out our code, but back when I was debugging Facebook pages, it was pretty good at finding the dominant image on a web page. Lesson: don’t tag anyone. It ruins the æsthetic on your page, and it increases everyone’s time on the site, and that can never be healthy. Time to fight the programming of Professor Fogg and his children (with apologies to Roger McNamee).
Top: The post Facebook picks up from an IFTTT script. Above: What happens to a post that once had a proper image preview, after it is edited and tags are added.
My friend Richard MacManus wrote a great blog post in February on the passing of Clive James, and made this poignant observation: ‘Because far from preserving our culture, the Web is at best forgetting it and at worst erasing it. As it turns out, a website is much more vulnerable than an Egyptian pyramid.’
The problem: search engines are biased to show us the latest stuff, so older items are being forgotten.
There are dead domains, of course: each time I pop by our links’ pages, I find I’m deleting more than I’m adding. I mean, who maintains links’ pages these days, anyway? (Ours look mega-dated.) But the items we added in the 1990s and 2000s are vanishing, and Richard notes the Internet Archive’s Wayback Machine is ‘increasingly the only method of accessing past websites that have otherwise disappeared into the ether. Many old websites are now either 404 errors, or the domains have been snapped up by spammers searching for Google juice.’
His fear is that sites like Clive James’s will be forgotten rather than preserved, and he has a point. As a collective, humanity seems to desire novelty: the newest car, the newest cellphone, and the newest news. Searching for a topic tends to bring up the newest references, since the modern web operates on the basis that history is bunk.
That’s a real shame, as it means we don’t get to understand our history as well as we should. Take this pandemic, for instance: are there lessons we could learn from MERS and SARS, or even the Great Plague of London in the 1660s? But a search is more likely to reveal stuff we already know or have recently come across in the media, like a sort of comfort blanket to assure us of our smartness. It’s not just political views and personal biases that are getting bubbled; it seems human knowledge is, too.
Even Duck Duck Go, my preferred search engine, can be guilty of this, though a search I just made for the word pandemic shows it is better at providing relevance over novelty.
Showing results founded on their novelty actually makes the web less interesting, because search engines fail to make it a place of discovery. If page after page reveals the latest, and the latest is often commodified news, then there is no point going to the second or third pages to find out more. Google takes great pride in detailing the date in the description, or ‘2 days ago’ or ‘1 day ago’. But if search engines remained focused on relevance, then we might stumble on something we didn’t know, and be better educated in the process.
Therefore, it’s possibly another area that Big Tech is getting wrong: it’s not just endangering democracy, but human intelligence. The biases I accused Google News and Facebook of (viz. their preference for corporate media) build on the dumbing-down of the masses.
I may well be wrong: maybe people don’t want to get smarter. Facebook tells us that folks just want a dopamine hit from approval, and maybe confirmation of our own limited knowledge gives us the same. ‘Look at how smart I am!’ Or how about this collection?
Any expert will tell you that the best way to keep your traffic up is to generate more and more new content, and it’s easy to understand why: like a physical library, the old stuff is getting forgotten, buried, or even, if they can’t sell or give it away, pulped.
Again, there’s a massive opportunity here. A hypothetical new news aggregator could outdo Google News by spidering and rewarding independent media that break news, giving them the best placement, as Google News used to do. That encourages independent media to do their job and opens the public up to new voices and viewpoints. And a hypothetical new search engine could outdo Google by providing relevance over novelty, or at least getting the balance of the two right.
Don’t believe everything you read on the internet, e.g.:
Anyone alive during this period will be wondering, ‘Where’s Altavista?’ Just on visitor numbers, as opposed to visits per month, they were doing 19 million daily in 1996 and 80 million daily in 1997. Goodness knows how many searches we were doing per day. Yet they are nowhere to be seen on this animation till December 1997, with 7 million monthly visits. If you were anything like me, you were using Altavista countless times a day. Even conservatively, say you went on four times daily and were one of the 80 million per day: that would put Altavista at 9,600 million visits per month, dwarfing AOL and Yahoo. By 1997–8, we weren’t using directories like Yahoo for search; we were using these newfangled search engines. Google only overtook Altavista in searches in 2001.
And I am old-fashioned enough to think this channel should be called Data Are Beautiful, not Data Is Beautiful.
I’d love to know the sources: not only do I clearly remember Altavista’s position (and Alexa had them at number one as well), it took me no time at all to confirm my suspicions through a web search.
Above: A re-created Altavista home page from 1999.
Keen to be seen as the establishment, which means working with the military–industrial complex, Google is making software to help the Pentagon analyse drone footage, and not everyone’s happy with this development.
The World Economic Forum’s ‘This is the future of the internet’ makes for interesting reading. It’s not so much about the future as about what has happened till now, with concerns about digital content (‘fake news’), privacy and antitrust.
Others have written a lot about search engines and social media keeping people in bubbles (or watch the video below, especially from 5′14″), but the solution isn’t actually that complex. It’s probably time for search engines to return to delivering what people request, rather than anticipating their political views and feeding them a hit of dopamine. They seem to have forgotten that they exist as tools, not websites that reinforce prejudices. Duck Duck Go has worked well for me because it has remained true to this; but others can do it, too.
However, there needs to be one more thing. Instead of Facebook’s botched suggestion of having everyday people rate news sources, which I believe will actually result in more ‘bubbling’, why not rank websites based on their longevity and consistency in delivering decent journalism? Yes, I realize both Fox News and MSNBC will pass this test. As will the BBC. But this weeds out splogs, content mills, and websites that steal content through RSS. It actually takes out the ‘fake news’ (and I mean this in the proper sense, not the way President Trump uses it). The websites set up by fly-by-nighters to make a quick buck, or by Macedonian teenagers to fool American voters, just disappear down the search-engine indices. Facebook can analyse the same data to check whether a source is credible and rank it the same way.
It could be done through an analysis of the age of the content, and whether the domain name had changed hands over the years. A website with a healthy archive going back many years would be ranked more highly; as would one where the domain had been owned by the same party for a long period.
Google’s PageRank used to look at incoming links, and maybe this can still be a factor, even if link-exchanging is no longer one of the basic tenets of the web.
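To make the idea concrete, here is a minimal sketch of how such a longevity-weighted score might be computed. To be clear, this is not any search engine’s actual algorithm: the weights, field names and sample figures are all made up for illustration.

```perl
#!/usr/bin/perl
# Hypothetical longevity-based ranking sketch. The weights and the sample data
# are illustrative only, not any search engine's real scoring formula.
use strict;
use warnings;

# Each candidate site: years of archived content, years under the same owner,
# and a count of inbound links (all assumed figures).
my @sites = (
    { url => 'https://longstanding-news.example', archive_years => 22,  owner_years => 22,  inbound_links => 5400 },
    { url => 'https://quick-buck-splog.example',  archive_years => 0.5, owner_years => 0.3, inbound_links => 12   },
);

sub trust_score {
    my ($site) = @_;
    # Longevity of content and of ownership dominate; links remain a lesser factor.
    return 0.5 * log(1 + $site->{archive_years})
         + 0.3 * log(1 + $site->{owner_years})
         + 0.2 * log(1 + $site->{inbound_links});
}

# Rank highest-scoring sites first.
for my $site (sort { trust_score($b) <=> trust_score($a) } @sites) {
    printf "%-40s %.3f\n", $site->{url}, trust_score($site);
}
```

The logarithms are just one way of stopping a huge link count from swamping the age factors; a real implementation would obviously need far more signals than these three.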
There’s so much good work being done by independent media all over the world, and they deserve to be promoted in a truly meritorious system, which the likes of Google used to deliver. Shame they do not today.
We do know that its claimed analysis of the content on a page to determine rank hasn’t worked, if some of the results that pop up are any indication. Instead, we see Google News permit the most ridiculous content-mill sites and treat them as legitimate sources; in 2005 such behaviour would have been unthinkable from the big G. As to Facebook, they’ll boost whoever gives them money, so ethics don’t really score big there.
Both these companies must realize they have a duty to do right by the public, but they should also know that it’s in their own interests to be honest to their users. If trust increases, so can usage. They might even ward off some of the antitrust forces looming on the horizon; fairness certainly will help Google’s future in Europe. But they seem to have forgotten they are providers of tools, perhaps reflecting their principals’ desires to be seen as tech celebrities or power-players.
Google already has the technology to deliver a fairer web, but I sense it doesn’t have the desire to. I miss the days when Google, in particular, was an enfant terrible, there to shake things up. Now it exists to boost its own properties or rub shoulders with the military–industrial complex. Everyone’s keeping an eye on Alphabet’s share price. Forget the people or ‘Don’t be evil.’
As I have said often on this blog, there lies a grand opportunity for others to fill the spaces that Google and Facebook have left. A new site can play a far more ethical game, maybe even combine what these two giants offer. If Altavista, once the world’s biggest website, and Myspace, once the king of social networks, can be toppled, then so can these two. Yet at their peak, neither appeared to be vulnerable. Who would have thought back in 1998 that Altavista would be toast? (The few that did, and you are out there, are visionaries.)
So who is best poised out there to deliver such tools? It would seem now is the time to start, and as people realize that this way is better, be prepared to scale, scale, scale. Remember, Google once did the same thing to oust Altavista, by figuratively building a better mousetrap. Someone just needs to take that first step.
I’m not enough of a bastard to only dis Google, because they have made a pretty good move today.
Google’s new algorithm, it is claimed, will weed out content farms, one type of site that has annoyed us here regularly.
These are sites that just pinch others’ content automatically. Because search engines pick them up, people visit their pages. Those pages are filled with ads, quite often supplied by Google. The content-pinchers make money, but the people who took the time to create the piece don’t.
I wrote, not a long time ago, that Google Blog Search had become entirely useless. That’s no exaggeration: head in there, and a lot of the blogs are scraped; they are duplicates of other sites.
In fact, when Vincent Wright’s blog was deleted, and I helped him to get it back, the Googlebot was trying to delete those scraped blogs. It’s just a shame that the Google machine was so damned useless at helping legitimate people get their blogs back, and intentionally stonewalled us to get some weird kick. If it were not for the Blogger product manager’s intervention, Vincent’s blog would still be in cyber-oblivion.
So the move, in principle, is a good one. Google claims, ‘If you take the top several dozen or so most-blocked domains from the Chrome [Personal Blocklist] extension, then this algorithmic change addresses 84 percent of them, which is strong independent confirmation of the user benefits.’
Let’s just hope that Google won’t mistakenly take out legit sites again (I have to ask what the other 16 per cent consists of!), though the fact that there has been some correlation with human editing (sites chosen by users for the Blocklist) helps.
Remember, too, how Google has stated on numerous occasions that it would not bias search results? Consider this: I wanted to search for an old post of mine so I could link it from the above text. The term: Google Buzz “de-Googling”.
On Duck Duck Go, I found the post immediately:
Out of interest, on Google, it cannot be seen: only positive things are mentioned and Google Buzz itself is the first result.
I know I have done more obscure tests to show that Google’s results are getting less precise. But the above is interesting.
It backs up an earlier article I read online about how Google treats search results, which said there is actually some bias in the system now.
I don’t begrudge Google for doing this, but it needs to stop saying that it doesn’t. We all know that it was quite happy to engage in censorship when it had Google China, already making its brand less idealistic than it once was.
Having set this precedent and created this brand association, it’s easy to believe that it now does this quite selectively for a lot more countries.
You might say that my one search is not a sign of bias, merely one where Google has a less than comprehensive search index and it could not find three old blog entries that have been around for a while. And which it used to be able to find.
It’s quite a coincidence that three negative posts about Google are no longer easily found with the relevant search terms.
That’s not great news for Google, either. Duck Duck Go is looking better by the day as the Google search engine, the one service to which its brand is tied, gets less precise.
Aside from writing a branding report today (which I will share with you once all contributors have OKed it), I received some wonderful news from Gabriel Weinberg of Duck Duck Go.
Those who are used to the Duck will know that you can search using what he calls bangs: the exclamation mark. On Chrome, which has a minimalist design, I have set my default search engine to Duck Duck Go. But what if I wanted to search on Google?
I can either do what is built in, and what Google suggests, by typing the word Google at the start of the search box. Or I can simply add !g to the end of the query, which, I might add, is something you can do from Duck Duck Go itself, too.
Of course, Google would prefer that I put all searches through them, but having Duck Duck Go as a default isn’t a bad idea. There’s a huge list of bangs that you can use at the Duck Duck Go website, which include !amazon (which will take you directly to an Amazon.com search), !gn (for Google News; this one is a godsend, especially for Chrome, which has made it much harder to search the news section, even if programmed into the search settings), !video (for YouTube), and !eb or !ebay (for Ebay).
I’m glad to announce that Gabriel has taken on board a few of my suggestions for motoring and fashion publications, such as !autocar, !vogueuk, !jalopnik, !randt (Road and Track) and !lucire (had to get that one in).
We’ll announce this on Lucire shortly, but readers already saw me announce, on Thursday, a nip–tuck of our Newsstand pages. That’s not really news, so I chose to complement it with a few other announcements.
Since we were already fidgeting with that part of the site, Gabriel’s announcement prompted me to do some changes to the search pages, which were woefully out of date. The community home page was last designed four years ago, and at that time, we just shifted the content over. Never mind that that content was also out of date, and included some letters to the editor that are no longer relevant. It had a link to the old forum, which only results in a PHP error. So today, I had all the old stuff stripped out, leaving us with a fairly minimalistic page.
I didn’t plan on making a blog post out of it, so I never took screen shots of the process. But at left is a shot of the old page from Snap Shots, and long-time readers will recognize this as the website design we had many years ago. When we facelifted other parts of the site, this was left as is: it’s old-fashioned dynamic HTML and hasn’t been moved over to a content-management system.
The new one may be a bit sterile (below), but it takes out all the extra bits that very few used: the Swicki, the Flickr gadget (we haven’t added anything to it since 2008), and a complex sign-up form for the Lucire Updates’ service. It’s been stripped to basics, but it now includes the obligatory links to Twitter, Facebook, Vkontakte and the RSS feed.
Which brings us on to the search pages. These also have the updated look, but, importantly, I fixed a bug in the Perl script that kept showing the wrong month. Regardless of whether it was March or December, the script would show January:
This is probably nothing to the actual computer hackers out there, but for a guy who has used an ATM only three times in his entire life (mainly because I lacked the local currency), this is a momentous occasion.
It was one of those evening-tweaking cases where it was simpler for me to do it myself than to ask one of the programmers, and I managed to remedy something that had plagued our website for 10 years.
The searches revealed a few strange links from long-expired pages, and here is where we might get into a bit of discussion about online publishing.
Once upon a time, it was considered bad form to have dead links, because people might point to them. Even more importantly, because a search engine might, and you could get penalized for having too many.
This is why we’ve kept some really odd filenames. The reason the Lucire Community page is at lucire.com/email is that a free email service we provided at the turn of the century was linked from there. Similarly, we still kept pages called content.htm, contents.shtml and editorial.shtml, even though these pages had not been updated for half a decade.
There are now redirects from these pages, which were once also bad form as far as the search engines were concerned.
But, given that search engines update so much more quickly in 2010, do we still need to bother about these outdated links? Will we still be penalized for having them? Should we not just simply delete them?
If you look to the right of this blog at the RSS feed links, you’ll see some dead ones; there were more, but I have been doing online weeding here, too. It almost seems to be a given that people can remove things without warning, and if you encounter a dead link, well, you know how to use a search engine.
To me, it still seems a little on the side of bad manners to do that to your readers, but one might theorize that few care about that any more as many sites revamp on to CMSs to make life easier for themselves.
A side note: earlier this week, when weeding through dead links at Lucire, we noticed that many people had moved to CMSs, with the result that their sites began to look the same. Some put in excellent customizations, but many didn’t. And what is it with all the big type on the news sites and blogs these days? Is this due to the higher resolutions of modern monitors, or do they represent a change in reading habits?
PS.: The search script bug was fixed by changing $month[$mon] to simply $mon. Told you it was nothing, though I noticed that the site that we got the script from still has the bug.–JY
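For the curious, here is my guess at what was going on; this is a reconstruction, not the actual Lucire script. If the month was already stored as a name, say ‘March’, and then used to index an array of month names, Perl would silently treat the string as the number 0 and always return the first element, January, which would explain the symptom above.

```perl
#!/usr/bin/perl
# Hypothetical reconstruction of the bug, not the actual Lucire search script.
use strict;
use warnings;

my @month = qw(January February March April May June
               July August September October November December);

my $mon = 'March';    # the month is already a name, not a 0-11 number

{
    no warnings 'numeric';
    # Buggy lookup: a non-numeric string numifies to 0, so this always prints January.
    print 'Buggy: ', $month[$mon], "\n";
}

# The fix described in the postscript: use the value as-is.
print 'Fixed: ', $mon, "\n";
```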