This deal is getting worse all the time.

🔗 The Evolution of Privacy and Ownership in the Blogging World | Nerve Endings Firing Away:

People are now vehemently against AI companies hoovering up their writings in their large language models because…well, I’m not entirely sure. SEO was a thriving business, and we peppered our blogs with keywords, Google Adsense, Amazon affiliate links, etc., but now we don’t want most to read them. What changed?

What has changed is the scale of the thing.

Back in the Napster days, you could suddenly download a copy of pretty much any song or album you wanted for free in a few minutes. The record industry of course hated this idea and went on the warpath against their customers, insisting that the technology constituted both an existential threat to their business and a mass violation of copyright.

Meanwhile, the “Information wants to be free!!” crowd said that this was no different from the early days of audio and video home taping. The music industry made exactly the same apocalyptic claims about that technology that it was making about mp3 filesharing, and those claims were overwrought and ridiculous.

If I had to pick between the two positions, I am obviously going to fall in with the latter. A multi-billion dollar industry is just not going to get much sympathy from me. That said, I don’t think it is that binary. Regardless of whether or not you like the business practices of the record companies, downloading all your music for free cuts out any possibility for the artists who actually create the music to be compensated for their work.

More relevant to the question at hand, though, insisting that mp3 file sharing was no different than home taping is a major category error. With taping, the record industry made claims that it was going to wipe out their business. It obviously did not, whereas digital downloading kind of did. The difference was the scale that was possible.

What was dumb about the record companies' howling about home taping was that there was no way individual people making copies of albums for themselves could do it at a scale that could have any significant impact on sales. An actual pirate operation would need to have large banks of tape decks set up to produce many copies at once. They would need to buy mass quantities of blank tapes and then invest in distribution infrastructure. That was just not the sort of thing that was going to happen with someone buying a tape deck at Radio Shack.

Digital file sharing, on the other hand, eliminated basically all of those barriers to entry and reduced the operating cost for a piracy business almost to zero. Again, the record industry was run by douchebags who had gotten fat and lazy exploiting artists and customers for decades, but they weren’t wrong about the threat posed by file sharing.

Which brings me back to Pratik’s question about AI scraping. There is a similar difference of scale here. It seems entirely reasonable to me that someone with a website would be fine with having their stuff indexed by search engines and linked to by other sites (even if that is automated) but completely not okay with having their stuff ingested by large language models to be infinitely regurgitated as generated content slurry. The latter happens at much greater scale than the former and by its nature eliminates any possibility of maintaining any sort of control over your own stuff.

Furthermore, I don’t think it is fair to tell folks “You put your stuff online and made it public, so that means you should be fine with anyone taking it and doing whatever they want with it.” To be clear, I don’t think that Pratik is making this claim himself but rather just referencing it as he thinks through the issue. That said, that does seem to be the general approach of the LLM evangelists and it needs to be addressed.

My guess is that most of the people who have been putting their stuff online for the last few decades have been doing so under a well-understood set of rules, i.e., you put your stuff out there, people link to it, it gets indexed and is searchable/findable. Maybe it gets scraped and misused, but that should be the exception, not the rule. Now those rules—informal though they may have been—have been changed and we are all expected to just be fine with it.

I think it is entirely understandable that a bunch of people are not cool with that, feel like they’ve gotten pulled into something they never agreed to, and can’t see any way to get out of it.
