Meta Says It’s Okay to Feed Copyrighted Books Into Its AI Model Because They Have No “Economic Value”



Meta has been accused of illegally using copyrighted material to train its AI models — and the tech giant’s defense is pretty thin.

In the ongoing suit Richard Kadrey et al v. Meta Platforms, led by a group of authors including Pulitzer Prize winner Andrew Sean Greer and National Book Award winner Ta-Nehisi Coates, the Mark Zuckerberg-led company has argued that its alleged scraping of more than seven million books from the pirated library LibGen constituted “fair use” of the material, and was therefore not illegal.

The specious defenses don’t end there. As Vanity Fair spotlights in a new writeup, Meta’s attorneys are also arguing that the countless books that the company used to train its multibillion-dollar language models and springboard itself into the headspinningly buzzy AI race are actually worthless.

Meta cited an expert witness who downplayed the books’ individual importance, averring that a single book adjusted its LLM’s performance “by less than 0.06 percent on industry standard benchmarks, a meaningless change no different from noise.” 

Thus there’s no market in paying authors to use their copyrighted works, Meta says, because “for there to be a market, there must be something of value to exchange,” as quoted by Vanity Fair — “but none of [the authors’] works has economic value, individually, as training data.” Other communications showed that Meta employees stripped the copyright pages from the downloaded books.

This is emblematic of the chicanery and two-faced logic that Meta, and the AI industry at large, deploy when pressed about all the human-created content they devour.

Somehow, that stuff is simultaneously not that valuable, and we should all stop pearl-clutching about the sanctity of art, and anyway an AI writes creative prose just as well as a human now — but is also absolutely essential to building our new synthetic gods that will solve climate change, so please don’t make us pay for using any of it. That last bit is literally what OpenAI argued to the British Parliament last year — that there isn’t enough stuff in the public domain to beef up its AI models, so it must be allowed to plumb the bounties of contemporary copyrighted works without paying a penny.

Seemingly, this is an unspoken understanding at the top AI companies. When one Meta researcher inquired if the company’s legal team had okayed using LibGen, another responded: “I didn’t ask questions but this is what OpenAI does with GPT3, what Google does with PALM, and what Deepmind does with Chinchilla so we will do it to[o],” per Vanity Fair, from internal messages cited in the suit.

Tellingly, the unofficial policy seems to be not to speak about it at all.

“In no case would we disclose publicly that we had trained on LibGen, however there is practical risk external parties could deduce our use of this dataset,” an internal Meta slide deck read. The deck noted that “if there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.”

More on AI copyright: OpenAI Says It’s “Over” If It Can’t Steal All Your Copyrighted Work


