Where do we draw the line on LLM- or “AI”-generated content?

Contrary to my earlier post, I allowed the trackbacks from AI-Summary.com after its owner reached out to me. The fact he reached out does show he read the post, and there was some human agency involved. That very courteous email even offered to remove this blog from further mining.

When you know a human’s there, you form a very different opinion. Of course most splogs have a human spinning them up and letting them loose, but here was an in-depth and, as far as I could tell, human-authored note. And I had to think: how have I dealt with trackbacks in the past?

I’ve disallowed splogs and spun sites. These are the ones that either duplicate our content (and I’ll include Lucire here) in full, often including photos, or spin that material, which leaves enough of a trace of the original (notably the order of the words and overall structure).

I have allowed sites that excerpt, say, the first paragraph and link the rest, and I have allowed sites that do more natural links (they write about us, and reference us in that writing—how linking was meant to be used from day one).

Where does this fall? Given there’s human agency, maybe it belongs in the latter group. The site owner explained that his blog was his reading list, so the content was curated. Even though it’s LLM-written, surely there’s more “authorship” here than with those sites that merely excerpt—a group I allow?

It gets murky given that the law might not grant authorship to machines, but legally the site owner is probably the “author” of the output, given there was some “sweat of the brow” involved. Ownership should not fall to the developers of the LLM technology, just as outputs from a word processor aren’t owned by Alludo (WordPerfect), Microsoft (Word) or the Document Foundation (LibreOffice). Nor are outputs from Adobe Illustrator or FontLab owned by the developers of those programs. Somewhere along the line—in these limited examples, anyway—there is a human connection. We can probably borrow those tests of connecting factors from private international law and apply them here to figure out where authorship resides.

And since the summary output isn’t identical to mine, I probably don’t have any claim to it. It’s AI-Summary’s owner’s. That’s where it gets very juicy from a legal point of view, since the LLMs in the west—and their much bigger counterparts in China—have been trained on copyrighted data.

As some have pointed out on social media, it’s not a two-way street: everyday people can get done for copyright infringement, even if they took something for commentary (fair use or fair dealing is not a blanket exception; it’s a defence), while these big companies can mine the web for our work and incorporate it into theirs.

I stated earlier that if the output were not recognizable, then we might have to accept that the work had been transformed sufficiently to satisfy traditional copyright law. And LLM outputs often cannot be traced back to the original author (unlike artwork, which may borrow extensively from the works in its training set, where it could be argued that insufficient transformation had taken place; the area also gets murkier when one has to consider whether a new work was inspired by an old one, something extensively written about already). However, this must be a two-way street: whatever is granted to corporations needs to be granted to individual citizens under this scheme. And yet individuals do not have the means to mine. “AI” is yet another area where inequities show up in force. Might this then be a situation where there is some discretion through the courts?

This will have happened to us all: we’ll have downloaded, for our own use, a photo that was (wrongly) uploaded to a free photo site. The copyright owner gets the photo removed but goes after innocent users of that image. Limitation and commercial law over bona fide ownership aside, should we be more tolerant of the individual? We already are when it comes to student assignments that take from copyrighted work (pre-computer, we’d paste in cut-out photographs from newspapers and magazines). It surely serves society if we permit learning, which a lot of copyright legislation allows for under fair use. In the case above, it serves society not to be bogged down in vexatious claims.

Whatever scheme is arrived at in this new era of copyright, one that takes into account LLMs and computer-generated art, it needs to recognize not only traditional doctrines established in copyright law, but also the inequities that our society faces, more so than ever. Equality before the law must be maintained, but in practice it is not, as corporations derive dominance over so many citizens, even nations. And here we open up another issue, of corporate personhood, and the very related issue of granting personhood to natural phenomena like rivers and mountains as a counter to centuries of exploitation.

