You can’t contract yourself out of breaking the law, Google

Google has updated its privacy policy, giving itself carte blanche to take publicly available data to use for its large language models and “AI”.

I don’t think whoever wrote the update has any comprehension of the law. Or perhaps they do, but think they can get away with it. Maybe in their own country they can, with all their lobbyists and weak politicians, but not everyone is a walkover.

Think about it: if you could write a privacy policy to cover law-breaking, then a drug dealer could say that by purchasing from them, the Crimes Act does not apply. I can think of even more heinous examples.

Our own T&Cs have changed with an extra sentence added:

• You agree not to take any intellectual property from this or affiliated sites without permission, for any purpose, including saving to your hard drive for personal use, modification, deconstruction, or republishing on another site. You may have a defence under fair dealing when intellectual property is taken for the purposes of news reporting, education and other situations under the Copyright Act 1994. For absolute clarity, fair dealing does not extend to scraping data for training large language models or “AI”, which we prohibit.

The difference is that when we do it, it’s in line with existing laws. You can’t grab people’s property and do as you please. Indigenous people who were subject to colonialism have first-hand experience of this, and over time things have finally shifted for most people to regard this theft as very wrong. Google is playing the colonizers’ game, just online.

Of course you can be inspired by something, and that inspiration can lead you to create another thing that’s new and original.

But there is a limit, and the more I read about how the technology has been implemented, the more it seems Big Tech has crossed the line between inspiration and infringement.

One lawsuit against OpenAI, since joined by Sarah Silverman, is backed up with evidence of piracy. In addition, OpenAI dobs itself in:

The suit shows that ChatGPT will summarise those authors’ books when prompted, infringing copyright and not giving any of the copyright information about the books, the lawyers claim. The authors “did not consent to the use of their copyrighted books as training material”, the lawsuit says.

In April, The Washington Post revealed that Google had indeed taken ‘tokens’ from people’s websites for its C4 dataset, used by Google and Facebook. Back then I was unsure where the duplicate was, but it appears the game has advanced since then, and the duplicates are appearing.

The plaintiffs also claim that Meta used an illegal ‘shadow library’, drawing copyrighted content via torrent sites, then republishing it.

Remember, when regular people use torrents to get IP, they can be punished or fined. In an extreme case, if the authorities suspect you hosted copyrighted material and you’re Kim Dotcom, they could have the cops raid your house, pretending to be Steve Forrest and the boys from SWAT. But when Big Tech does it and takes from you, it does so openly and dares you to take it on.

What’s the bet that the authorities would actually find evidence of infringement, money laundering and racketeering if they turned their attention to Big Tech? And how big, in this present lawsuit, will this class get?
