The New York Times article content itself could not be assessed due to paywall and JavaScript barriers. Structurally, the platform demonstrates mixed signals: it hosts journalism potentially relevant to human rights issues (technology/copyright), but restricts universal information access through subscription requirements, technical gatekeeping, and tracking systems. The architecture undermines UDHR Articles on information access, equality, and privacy.
To be very clear on this point: this is not about model training.
It's important to understand, in the fair use assessment, that the training itself was found to be fair use; the issue at hand is the pirating of the books, which is what Anthropic "whoopsied" into when acquiring its training data.
Buying used copies of books, scanning them, and training on them is fine.
This is sad for open source AI. Piracy for the purpose of model training should also be fair use, because otherwise only the big companies that can afford to pay off publishers, like Anthropic, will be able to do so. There is no way to buy billions of books just for model training; it simply can't happen.
Maybe I would think differently if I were a book author, but I can't help thinking that this is ugly yet, in some perverse sense, actually quite good for humanity. I will presumably never read 99.9% of these books, but I will use Claude.
I wonder which country will be the first to carve out an exception to copyright law for model-training libraries to attract tax revenue, the way Ireland did for tech companies in the EU. Japan is part of the way there, but you couldn't do a Common Crawl-type thing. You could even make it a Library of Congress-style setup.
1. A Settlement Fund of at least $1.5 Billion: Anthropic has agreed to pay a minimum of $1.5 billion into a non-reversionary fund for the class members. With an estimated 500,000 copyrighted works in the class, this would amount to an approximate gross payment of $3,000 per work. If the final list of works exceeds 500,000, Anthropic will add $3,000 for each additional work.
2. Destruction of Datasets: Anthropic has committed to destroying the datasets it acquired from LibGen and PiLiMi, subject to any legal preservation requirements.
3. Limited Release of Claims: The settlement releases Anthropic only from past claims of infringement related to the works on the official "Works List" up to August 25, 2025. It does not cover any potential future infringements or any claims, past or future, related to infringing outputs generated by Anthropic's AI models.
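The fund arithmetic in point 1 can be sanity-checked with a minimal sketch. The figures come from the settlement terms above; the scaling behavior for additional works follows the quoted clause:

```python
# Settlement arithmetic from the terms above: a $1.5B minimum fund
# covering an estimated 500,000 works, plus $3,000 for each work
# beyond that estimate.
MIN_FUND = 1_500_000_000   # non-reversionary minimum, USD
BASE_WORKS = 500_000       # estimated works in the class
PER_EXTRA_WORK = 3_000     # added per work past the estimate

def gross_fund(works: int) -> int:
    """Total fund size before fees, per the reported terms."""
    extra = max(0, works - BASE_WORKS)
    return MIN_FUND + extra * PER_EXTRA_WORK

def gross_per_work(works: int) -> float:
    """Approximate gross payment per work (before attorneys' fees)."""
    return gross_fund(works) / works

print(gross_per_work(500_000))  # 3000.0 at the estimated class size
print(gross_fund(600_000))      # 1800000000 if 100,000 more works qualify
```

Note that the per-work figure is gross; the net amount reaching each rights holder would be lower after fees and administration costs.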
I can't help but feel like this is a huge win for Chinese AI. Western companies are going to be limited in the amount of data they can collect and train on, and Chinese (or any foreign AI) is going to have access to much more and much better data.
How do legal penalties and settlements work internationally? Are entities in other countries somehow barred from filing similar suits with more penalties?
Everything talks about a settlement with the 'authors'; is that meant to be shorthand for copyright holders? Because there are a lot of academic works in that library where the publisher holds exclusive copyright and the author holds nothing.
By extension, if the big publishers are getting $3000 per article, that could be a fairly significant windfall.
I think one under-discussed effect of settlements like this is the additional tax on experimentation. The largest players can absorb a $1.5B hit or negotiate licensing at scale. Smaller labs and startups, which often drive breakthroughs, may not survive the compliance burden.
That could push the industry toward consolidation: fewer independent experiments, more centralized R&D inside big tech. I feel this might slow the pace of unexpected innovations and increase dependence on incumbents.
This definitely raises the question: how do we balance fair compensation for creators with keeping the door open for innovation?
After their recent change of tune, retaining data for longer and training on our data, I deleted my account.
Try doing that. There is no easy way to delete your account; you need to reach out to their support via email. An incredibly obnoxious dark pattern. I hate OpenAI, but everything with Anthropic smells fishy too.
We need more and better players. I hope xAI will give them all some good competition, but I have my doubts.
After the book publishers burned Google Books' Library of Alexandria, they are now making it impossible to train an LLM unless you engage in the medieval process of manually buying paper copies of works just to scan and destroy them...
Is this legal: scan billions of pirated books, train an LLM on them, and generate a billion public domain books with it, so that nobody ever needs copyrighted books anymore?
Also, if there is a software library with an annoying Stallman-style license, can one use an LLM to generate a compatible library in the public domain or under a commercial license? So that nobody needs to respect software licenses anymore? Can we also generate a free Photoshop, the Linux kernel, and Windows this way?
Maybe, though this lawsuit is different with respect to the piracy issue. Anthropic is paying the settlement because it pirated the books, not because training on copyrighted books isn't fair use, and that distinction doesn't necessarily hold in the other cases.
Anthropic certainly seems to be hoping that their competitors will have to face some consequences too:
>During a deposition, a founder of Anthropic, Ben Mann, testified that he also downloaded the Library Genesis data set when he was working for OpenAI in 2019 and assumed this was “fair use” of the material.
Per the NYT article, Anthropic started buying physical books in bulk and scanning them for their training data, and they assert that no pirated materials were ever used in public models. I wonder if OpenAI can say the same.
You're joking, but that's actually a good pitch. There was a significant legal issue hanging over their heads, with some risk of a potentially business-ending judgment down the line. This makes it go away, which makes the company a safer, more valuable investment. Both in absolute terms and compared to peers who didn't settle.
> Although the payment is enormous, it is small compared with the amount of money that Anthropic has raised in recent years. This month, the start-up announced that it had agreed to a deal that brings an additional $13 billion into Anthropic’s coffers. The start-up has raised a total of more than $27 billion since its founding in 2021.
I'm sure this will be misreported and wilfully misinterpreted given the current fractious state of the AI discourse. But the lawsuit was about piracy, not the copyright compliance of LLMs, and in any case they settled out of court, thus presumably admitting no wrongdoing; conveniently, no legal precedent is established either way.
I would not be surprised if investors made their last round of funding contingent on settling this matter out of court precisely to ensure no precedents are set.
Don't forget: NO LEGAL PRECEDENT! Which means anybody suing has to start all over. You only settle at this point if you think you'll lose.
Edit: I'll get ratio'd for this, but it's the exact same thing Google did in its lawsuit with Epic. They delayed while the public and the courts focused on Apple (oohh, EVIL Apple); Apple lost, and Google settled at a disadvantage before they had a legal judgment that couldn't be challenged later.
As long as you're not distributing, it's legal in Switzerland to download copyrighted material. (Switzerland was on the naughty US/MPAA list for a while, might still be)
Thank you. I assumed it would be quicker to find the link to the case PDF here, but your summary is appreciated!
Indeed, it is not only the payout but also the destruction of the datasets. Although the article does quote:
> “Anthropic says it did not even use these pirated works,” he said. “If some other generative A.I. company took data from pirated source and used it to train on and commercialized it, the potential liability is enormous. It will shake the industry — no doubt in my mind.”
Even if true, I wonder how many cases we will see in the near future.
That metaphor doesn't really work. It's a settlement, not a punishment, and this is payment, not a fine. Legally it's more like "The store wasn't open, so I took the items from the lot and paid them later".
It's not the way we expect people to do business under normal circumstances, but in new markets with new products? I guess I don't see much actually wrong with this. Authors still get paid a price they were willing to accept, and Anthropic didn't need to wait years to come to an agreement (again, publishers weren't actually selling what AI companies needed to buy!) before training their LLMs.
Fair use isn't about how you access the material; it's about what you can do with it after you legally access it. If you don't legally access it, the question of fair use is moot.
I'm sure one can try, but copyright has all kinds of oddities and carve-outs that make this complicated. IANAL, but I'm fairly certain that, for example, if you tried putting in your content license "Free for all uses public and private, except academia, screw that ivory tower..." that's a sentiment you can express but universities are under no obligation legally to respect your wish to not have your work included in a course presentation on "wild things people put in licenses." Similarly, since the court has found that training an LLM on works is transformative, a license that says "You may use this for other things but not to train an LLM" couldn't be any more enforceable than a musician saying "You may listen to my work as a whole unit but God help you if I find out you sampled it into any of that awful 'rap music' I keep hearing about..."
The purpose of the copyright protections is to promote "science and useful arts," and the public utility of allowing academia to investigate all works (1) exceeds the benefits of letting authors declare their works unponderable to the academic community.
(1) And yet, textbooks are copyrighted and the copyright is honored; I'm not sure why the academic fair-use exception doesn't allow scholars to just copy around textbooks without paying their authors.
Everyone has more than a right to freely read everything stored in a library.
(Edit: in fact, I initially wrote 'is supposed to' in place of 'has more than a right to', meaning that "the knowledge is there, we made it available: you are supposed to access it, with the fullest encouragement").
Maybe some kind of captcha-like system could be devised that would be considered a security measure under the DMCA and thus not allowed to be circumvented. Then make the same content available for a licence fee through an API.
Yes to the first part. Put your site behind a login wall that requires users to sign a contract to that effect before serving them the content... get a lawyer to write that contract. Don't rely on copyright.
I'm not sure to what extent you can specify damages like these in a contract, ask the lawyer who is writing it.
Isn't that how the whole system operates? Everyone is a conduit to allow rich people to enrich themselves further. The amount and quality of opportunities any individual receives are proportional to how well it serves existing capital.
So long as there is an excuse to justify the money flows, that's fine; big capital doesn't really care what the excuse is, so long as it is just persuasive enough to satisfy the regulators and the judges.
Money flows happen independently, then later, people try to come up with good narratives. This is exactly what happened in this case. They paid the authors a lot of money as a settlement and agreed on a narrative which works for both sets of people; that training was fine, it's the pirating which was a problem...
It's likely why they settled; they preferred to pay a lot of money and agree on some false narrative which works for both groups rather than setting a precedent that AI training on copyrighted material is illegal; that would be the biggest loss for them.
> Buying used copies of books, scanning them, and training on it is fine.
But nobody was ever going to do that, not when there are billions in VC dollars at stake for whoever moves fastest. Everybody will simply risk the fine, which tends to be nowhere near large enough to have a deterrent effect in the future.
That is like saying Uber would have not had any problems if they just entered into a licensing contract with taxi medallion holders. It was faster to just put unlicensed taxis on the streets and use investor money to pay fines and lobby for favorable legislation. In the same way, it was faster for Anthropic to load up their models with un-DRM'd PDFs and ePUBs from wherever instead of licensing them publisher by publisher.
To be even more clear - this is a settlement, it does not establish precedent, nor admit wrongdoing. This does not establish that training is fair use, nor that scanning books is fine. That's somebody else's battle.
The West could end the endless pain and legal hurdles to innovation by limiting copyright terms. It can be done if there is the will to open the gates of information to everyone. The duration of 70 years after the death of the author, or 90 years for companies, is excessively long. It should be ~25 years; for software, 10 years.
And if AI companies want recent stuff, they need to pay the owners.
However, the West wants to infinitely enrich the lucky old people and companies who benefited from the lax regulations of the early 20th century. Its people chose not to let current generations acquire equivalent wealth, at least not without the old hags getting their cut too.
I think the jury is still out on how fair use applies to AI. Fair use was not designed for what we have now.
I could read a book, but it's highly unlikely I could regurgitate it, much less months or years later. An LLM, however, can. While we can say "training is like reading", it's also not like reading at all due to permanent perfect recall.
Not only does an LLM have perfect recall, it also has the ability to distribute plagiarized ideas at a scale no human can. There's a lot of questions to be answered about where fair use starts/ends for these LLM products.
Platform detects ad blockers, indicating privacy-related technical measures. Inference: tracking implementation and ad detection suggest systematic collection of user behavior data without explicit consent visible on the page. (ND)
- Article 13 (Freedom of Movement): Low Practice. No article content available. Observable facts: content access restricted by subscription status; technical requirements (JavaScript enabled) create barriers to information access. Inference: platform architecture creates structural barriers to the free movement of and access to information. (ND)
- Article 14 (Asylum): ND. No article content available.
- Article 15 (Nationality): ND. No article content available.
- Article 16 (Marriage & Family): ND. No article content available.
- Article 17 (Property): ND. No article content available.
- Article 18 (Freedom of Thought): ND. No article content available.
- Article 19 (Freedom of Expression): Low Practice Coverage. No article content available for assessment of free expression advocacy. Observable facts: article URL suggests journalism about technology/copyright issues (the Anthropic settlement); content gated behind subscription/unlock codes and JavaScript requirements; platform employs ad blocker detection, limiting access for privacy-conscious users. Inferences: platform operates a journalism business but restricts information distribution through access controls; the structural model undermines the universal right to receive information regardless of economic status or technical configuration. (ND)
- Article 20 (Assembly & Association): ND. No article content available.
- Article 21 (Political Participation): ND. No article content available.
- Article 22 (Social Security): ND. No article content available.
- Article 23 (Work & Equal Pay): ND. No article content available.
- Article 24 (Rest & Leisure): ND. No article content available.
- Article 25 (Standard of Living): ND. No article content available.
- Article 26 (Education): Low Practice. No article content available. Observable facts: content requires a paid subscription or unlock code; platform restricts free access to educational/informational journalism content. Inference: the subscription-based model undermines the universal right to information and education for those unable to pay. (ND)
- Article 27 (Cultural Participation): ND. No article content available.
- Article 28 (Social & International Order): ND. No article content available.
- Article 29 (Duties to Community): ND. No article content available.
- Article 30 (No Destruction of Rights): ND. No article content available.
Structural Channel (what the site does):
- Article 3 (Life, Liberty, Security): Low Practice; Structural -0.15; Context Modifier 0.00; SETL: ND. Tracking systems and data collection for subscription/ad targeting purposes.
- Article 2 (Non-Discrimination): Low Practice; Structural -0.20; Context Modifier 0.00; SETL: ND. Subscription paywall creates unequal access based on economic status; certain users are excluded from content access.
- Article 12 (Privacy): Low Practice; Structural -0.20; Context Modifier 0.00; SETL: ND. Tracking parameters and ad blocker detection indicate privacy monitoring and behavioral data collection.
- Preamble: Low Practice; Structural -0.25; Context Modifier 0.00; SETL: ND. Page requires JavaScript and detects ad blockers, creating access barriers; the subscription-gated model restricts universal access to information.
- Article 26 (Education): Low Practice; Structural -0.25; Context Modifier 0.00; SETL: ND. Paywall model restricts universal access to information and content; the subscription requirement creates economic barriers to education and information access.
- Article 13 (Freedom of Movement): Low Practice; Structural -0.30; Context Modifier 0.00; SETL: ND. Paywall and JavaScript requirements restrict the movement of information and content access across user groups.
- Article 19 (Freedom of Expression): Low Practice Coverage; Structural -0.35; Context Modifier 0.00; SETL: ND. Despite hosting technology journalism, the platform uses a commercial paywall model that restricts universal access to information; the JavaScript requirement and ad blocker detection further limit information distribution.
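The per-article rows above (rating, structural score, context modifier) can be sketched as data. This is purely illustrative: the dump does not state how the structural score and context modifier combine, so simple addition is used here as an assumption.

```python
# Illustrative representation of the structural-channel rows above.
# ASSUMPTION: net score = structural + context modifier; the source
# dump lists both values but gives no aggregation rule.
from dataclasses import dataclass

@dataclass
class ArticleScore:
    article: str
    rating: str
    structural: float
    context_modifier: float = 0.0  # 0.00 in every row of the dump

    def net(self) -> float:
        # Assumed combination rule, not specified in the source.
        return self.structural + self.context_modifier

rows = [
    ArticleScore("Article 3 (Life, Liberty, Security)", "Low Practice", -0.15),
    ArticleScore("Article 19 (Freedom of Expression)", "Low Practice Coverage", -0.35),
]
for r in rows:
    print(f"{r.article}: {r.net():+.2f}")
```

With every context modifier at 0.00, the net score simply equals the structural score, which matches the duplicated leading figure in each row of the dump.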
No structural signals observable (ND): Article 1 (Freedom, Equality, Brotherhood), Article 4 (No Slavery), Article 5 (No Torture), Article 6 (Legal Personhood), Article 7 (Equality Before Law), Article 8 (Right to Remedy), Article 9 (No Arbitrary Detention), Article 10 (Fair Hearing), Article 11 (Presumption of Innocence), Article 14 (Asylum), Article 15 (Nationality), Article 16 (Marriage & Family), Article 17 (Property), Article 18 (Freedom of Thought), Article 20 (Assembly & Association), Article 21 (Political Participation), Article 22 (Social Security), Article 23 (Work & Equal Pay), Article 24 (Rest & Leisure), Article 25 (Standard of Living), Article 27 (Cultural Participation), Article 28 (Social & International Order), Article 29 (Duties to Community), Article 30 (No Destruction of Rights).
Event Timeline (20 events, 2026-02-26):
- 12:20, dlq: "Dead-lettered after 1 attempts: Anthropic agrees to pay $1.5B to settle lawsuit with book authors"
- 12:16 to 12:18, rate_limit (x3): OpenRouter rate limited (429), model=llama-3.3-70b
- 10:23 to 10:34, dlq (x14): "Dead-lettered after 1 attempts: Anthropic agrees to pay $1.5B to settle lawsuit with book authors"
- 10:24, credit_exhausted (x2): Credit balance too low, retrying in 299s / 254s