The New York Times article content itself could not be assessed due to paywall and JavaScript barriers. Structurally, the platform demonstrates mixed signals: it hosts journalism potentially relevant to human rights issues (technology/copyright), but restricts universal information access through subscription requirements, technical gatekeeping, and tracking systems. The architecture undermines UDHR Articles on information access, equality, and privacy.
To be very clear on this point: this is not about model training.
It's important to understand, in the fair use assessment, that the training itself was found to be fair use; the issue at hand is the pirating of the books, which is what Anthropic "whoopsied" into when acquiring its training data.
Buying used copies of books, scanning them, and training on them is fine.
This is sad for open source AI. Piracy for the purpose of model training should also be fair use, because otherwise only the big companies that can afford to pay off publishers, like Anthropic, will be able to do so. There is no way to buy billions of books just for model training; it simply can't happen.
Maybe I would think differently if I were a book author, but I can't help thinking that this is ugly yet, in some perverse sense, actually quite good for humanity. I will presumably never read 99.9% of these books, but I will use Claude.
I wonder which country will be the first to carve out an exception to copyright law for model-training libraries to attract tax revenue, the way Ireland did for tech companies in the EU. Japan is part of the way there, but you couldn't do a Common Crawl-type thing. You could even make it a Library of Congress-style setup.
1. A Settlement Fund of at least $1.5 Billion: Anthropic has agreed to pay a minimum of $1.5 billion into a non-reversionary fund for the class members. With an estimated 500,000 copyrighted works in the class, this would amount to an approximate gross payment of $3,000 per work. If the final list of works exceeds 500,000, Anthropic will add $3,000 for each additional work.
2. Destruction of Datasets: Anthropic has committed to destroying the datasets it acquired from LibGen and PiLiMi, subject to any legal preservation requirements.
3. Limited Release of Claims: The settlement releases Anthropic only from past claims of infringement related to the works on the official "Works List" up to August 25, 2025. It does not cover any potential future infringements or any claims, past or future, related to infringing outputs generated by Anthropic's AI models.
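The fund arithmetic in point 1 can be sanity-checked with a minimal sketch. The figures come from the settlement terms above; the scaling behavior for additional works follows the quoted clause:

```python
# Settlement arithmetic from the terms above: a $1.5B minimum fund
# covering an estimated 500,000 works, plus $3,000 for each work
# beyond that estimate.
MIN_FUND = 1_500_000_000   # non-reversionary minimum, USD
BASE_WORKS = 500_000       # estimated works in the class
PER_EXTRA_WORK = 3_000     # added per work past the estimate

def gross_fund(works: int) -> int:
    """Total fund size before fees, per the reported terms."""
    extra = max(0, works - BASE_WORKS)
    return MIN_FUND + extra * PER_EXTRA_WORK

def gross_per_work(works: int) -> float:
    """Approximate gross payment per work (before attorneys' fees)."""
    return gross_fund(works) / works

print(gross_per_work(500_000))  # 3000.0 at the estimated class size
print(gross_fund(600_000))      # 1800000000 if 100,000 more works qualify
```

Note that the per-work figure is gross; the net amount reaching each rights holder would be lower after fees and administration costs.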
I can't help but feel like this is a huge win for Chinese AI. Western companies are going to be limited in the amount of data they can collect and train on, and Chinese (or any foreign AI) is going to have access to much more and much better data.
How do legal penalties and settlements work internationally? Are entities in other countries somehow barred from filing similar suits with more penalties?
Everything talks about a settlement with the 'authors'; is that meant to be shorthand for copyright holders? Because there are a lot of academic works in that library where the publisher holds exclusive copyright and the author holds nothing.
By extension, if the big publishers are getting $3000 per article, that could be a fairly significant windfall.
I think one under-discussed effect of settlements like this is the additional tax on experimentation. The largest players can absorb a $1.5B hit or negotiate licensing at scale. Smaller labs and startups, which often drive breakthroughs, may not survive the compliance burden.
That could push the industry toward consolidation: fewer independent experiments, more centralized R&D inside big tech. I feel this might slow the pace of unexpected innovations and increase dependence on incumbents.
This definitely raises the question: how do we balance fair compensation for creators with keeping the door open for innovation?
After their recent change of tune, retaining data for longer and training on our data, I deleted my account.
Try doing that. There is no easy way to delete your account; you need to reach out to their support via email. An incredibly obnoxious dark pattern. I hate OpenAI, but everything with Anthropic smells fishy too.
We need more and better players. I hope xAI will give them all some good competition, but I have my doubts.
After the book publishers burned Google Books' Library of Alexandria, they are now making it impossible to train an LLM unless you engage in the medieval process of manually buying paper copies of works just to scan and destroy them...
Is this legal: scan billions of pirated books, train an LLM on them, and generate a billion public domain books with it, so that nobody ever needs copyrighted books anymore?
Also, if there is a software library with an annoying Stallman-style license, can one use an LLM to generate a compatible library in the public domain or under a commercial license? So that nobody needs to respect software licenses anymore? Can we also generate a free Photoshop, the Linux kernel, and Windows this way?
Maybe, though this lawsuit is different with respect to the piracy issue. Anthropic is paying the settlement because it pirated the books, not because training on copyrighted books isn't fair use, and that distinction doesn't necessarily hold in the other cases.
Anthropic certainly seems to be hoping that their competitors will have to face some consequences too:
>During a deposition, a founder of Anthropic, Ben Mann, testified that he also downloaded the Library Genesis data set when he was working for OpenAI in 2019 and assumed this was “fair use” of the material.
Per the NYT article, Anthropic started buying physical books in bulk and scanning them for their training data, and they assert that no pirated materials were ever used in public models. I wonder if OpenAI can say the same.
You're joking, but that's actually a good pitch. There was a significant legal issue hanging over their heads, with some risk of a potentially business-ending judgment down the line. This makes it go away, which makes the company a safer, more valuable investment. Both in absolute terms and compared to peers who didn't settle.
> Although the payment is enormous, it is small compared with the amount of money that Anthropic has raised in recent years. This month, the start-up announced that it had agreed to a deal that brings an additional $13 billion into Anthropic’s coffers. The start-up has raised a total of more than $27 billion since its founding in 2021.
I'm sure this will be misreported and wilfully misinterpreted given the current fractious state of the AI discourse. But the lawsuit was about piracy, not the copyright compliance of LLMs, and in any case they settled out of court, thus presumably admitting no wrongdoing; conveniently, no legal precedent is established either way.
I would not be surprised if investors made their last round of funding contingent on settling this matter out of court precisely to ensure no precedents are set.
Don't forget: NO LEGAL PRECEDENT! Which means anybody suing has to start all over. You only settle at this point if you think you'll lose.
Edit: I'll get ratio'd for this, but it's the exact same thing Google did in its lawsuit with Epic. They delayed while the public and the courts focused on Apple (oohh, EVIL Apple); Apple lost, and Google settled at a disadvantage before they had a legal judgment that couldn't be challenged later.
As long as you're not distributing, it's legal in Switzerland to download copyrighted material. (Switzerland was on the naughty US/MPAA list for a while, might still be)
Thank you. I assumed it would be quicker to find the link to the case PDF here, but your summary is appreciated!
Indeed, it is not only the payout but also the destruction of the datasets. Although the article does quote:
> “Anthropic says it did not even use these pirated works,” he said. “If some other generative A.I. company took data from pirated source and used it to train on and commercialized it, the potential liability is enormous. It will shake the industry — no doubt in my mind.”
Even if true, I wonder how many cases we will see in the near future.
That metaphor doesn't really work. It's a settlement, not a punishment, and this is payment, not a fine. Legally it's more like "The store wasn't open, so I took the items from the lot and paid them later".
It's not the way we expect people to do business under normal circumstances, but in new markets with new products? I guess I don't see much actually wrong with this. Authors still get paid a price they were willing to accept, and Anthropic didn't need to wait years to come to an agreement (again, publishers weren't actually selling what AI companies needed to buy!) before training their LLMs.
Fair use isn't about how you access the material; it's about what you can do with it after you legally access it. If you don't legally access it, the question of fair use is moot.
I'm sure one can try, but copyright has all kinds of oddities and carve-outs that make this complicated. IANAL, but I'm fairly certain that, for example, if you tried putting in your content license "Free for all uses public and private, except academia, screw that ivory tower..." that's a sentiment you can express but universities are under no obligation legally to respect your wish to not have your work included in a course presentation on "wild things people put in licenses." Similarly, since the court has found that training an LLM on works is transformative, a license that says "You may use this for other things but not to train an LLM" couldn't be any more enforceable than a musician saying "You may listen to my work as a whole unit but God help you if I find out you sampled it into any of that awful 'rap music' I keep hearing about..."
The purpose of the copyright protections is to promote "science and useful arts," and the public utility of allowing academia to investigate all works (1) exceeds the benefits of letting authors declare their works unponderable to the academic community.
(1) And yet, textbooks are copyrighted and the copyright is honored; I'm not sure why the academic fair-use exception doesn't allow scholars to just copy around textbooks without paying their authors.
Everyone has more than a right to freely read everything stored in a library.
(Edit: in fact, I initially wrote 'is supposed to' in place of 'has more than a right to', meaning that "the knowledge is there, we made it available: you are supposed to access it, with the fullest encouragement").
Maybe some kind of captcha-like system could be devised that would be considered a security measure under the DMCA and thus not allowed to be circumvented. Then make the same content available for a licence fee through an API.
Yes to the first part. Put your site behind a login wall that requires users to sign a contract to that effect before serving them the content... get a lawyer to write that contract. Don't rely on copyright.
I'm not sure to what extent you can specify damages like these in a contract, ask the lawyer who is writing it.
Isn't that how the whole system operates? Everyone is a conduit to allow rich people to enrich themselves further. The amount and quality of opportunities any individual receives are proportional to how well it serves existing capital.
So long as there is an excuse to justify the money flows, that's fine; big capital doesn't really care what the excuse is, so long as it is just persuasive enough to satisfy the regulators and the judges.
Money flows happen independently, then later, people try to come up with good narratives. This is exactly what happened in this case. They paid the authors a lot of money as a settlement and agreed on a narrative which works for both sets of people; that training was fine, it's the pirating which was a problem...
It's likely why they settled; they preferred to pay a lot of money and agree on some false narrative which works for both groups rather than setting a precedent that AI training on copyrighted material is illegal; that would be the biggest loss for them.
> Buying used copies of books, scanning them, and training on it is fine.
But nobody was ever going to do that, not when there are billions in VC dollars at stake for whoever moves fastest. Everybody will simply risk the fine, which tends to be nowhere near large enough to have a deterrent effect in the future.
That is like saying Uber would have not had any problems if they just entered into a licensing contract with taxi medallion holders. It was faster to just put unlicensed taxis on the streets and use investor money to pay fines and lobby for favorable legislation. In the same way, it was faster for Anthropic to load up their models with un-DRM'd PDFs and ePUBs from wherever instead of licensing them publisher by publisher.
To be even more clear - this is a settlement, it does not establish precedent, nor admit wrongdoing. This does not establish that training is fair use, nor that scanning books is fine. That's somebody else's battle.
The West could end the endless pain and legal hurdles to innovation by limiting copyright terms. It can be done if there is the will to open the gates of information to everyone. The duration of 70 years after the death of the author, or 90 years for companies, is excessively long. It should be ~25 years; for software, 10 years.
And if AI companies want recent stuff, they need to pay the owners.
However, the West wants to infinitely enrich the lucky old people and companies who benefited from the lax regulations of the early 20th century. Its people chose not to let current generations acquire equivalent wealth, at least not without the old hags getting their cut too.
I think the jury is still out on how fair use applies to AI. Fair use was not designed for what we have now.
I could read a book, but it's highly unlikely I could regurgitate it, much less months or years later. An LLM, however, can. While we can say "training is like reading", it's also not like reading at all due to permanent perfect recall.
Not only does an LLM have perfect recall, it also has the ability to distribute plagiarized ideas at a scale no human can. There's a lot of questions to be answered about where fair use starts/ends for these LLM products.
Platform detects ad blockers, indicating privacy-related technical measures. Inference: tracking implementation and ad detection suggest systematic collection of user behavior data without explicit consent visible on the page. (ND)
- Article 13 (Freedom of Movement): Low Practice. No article content available. Observable facts: content access restricted by subscription status; technical requirements (JavaScript enabled) create barriers to information access. Inference: platform architecture creates structural barriers to the free movement of and access to information. (ND)
- Article 14 (Asylum): ND. No article content available.
- Article 15 (Nationality): ND. No article content available.
- Article 16 (Marriage & Family): ND. No article content available.
- Article 17 (Property): ND. No article content available.
- Article 18 (Freedom of Thought): ND. No article content available.
- Article 19 (Freedom of Expression): Low Practice Coverage. No article content available for assessment of free expression advocacy. Observable facts: article URL suggests journalism about technology/copyright issues (the Anthropic settlement); content gated behind subscription/unlock codes and JavaScript requirements; platform employs ad blocker detection, limiting access for privacy-conscious users. Inferences: platform operates a journalism business but restricts information distribution through access controls; the structural model undermines the universal right to receive information regardless of economic status or technical configuration. (ND)
- Article 20 (Assembly & Association): ND. No article content available.
- Article 21 (Political Participation): ND. No article content available.
- Article 22 (Social Security): ND. No article content available.
- Article 23 (Work & Equal Pay): ND. No article content available.
- Article 24 (Rest & Leisure): ND. No article content available.
- Article 25 (Standard of Living): ND. No article content available.
- Article 26 (Education): Low Practice. No article content available. Observable facts: content requires a paid subscription or unlock code; platform restricts free access to educational/informational journalism content. Inference: the subscription-based model undermines the universal right to information and education for those unable to pay. (ND)
- Article 27 (Cultural Participation): ND. No article content available.
- Article 28 (Social & International Order): ND. No article content available.
- Article 29 (Duties to Community): ND. No article content available.
- Article 30 (No Destruction of Rights): ND. No article content available.
Structural Channel (what the site does):
- Article 3 (Life, Liberty, Security): Low Practice; Structural -0.15; Context Modifier 0.00; SETL: ND. Tracking systems and data collection for subscription/ad targeting purposes.
- Article 2 (Non-Discrimination): Low Practice; Structural -0.20; Context Modifier 0.00; SETL: ND. Subscription paywall creates unequal access based on economic status; certain users are excluded from content access.
- Article 12 (Privacy): Low Practice; Structural -0.20; Context Modifier 0.00; SETL: ND. Tracking parameters and ad blocker detection indicate privacy monitoring and behavioral data collection.
- Preamble: Low Practice; Structural -0.25; Context Modifier 0.00; SETL: ND. Page requires JavaScript and detects ad blockers, creating access barriers; the subscription-gated model restricts universal access to information.
- Article 26 (Education): Low Practice; Structural -0.25; Context Modifier 0.00; SETL: ND. Paywall model restricts universal access to information and content; the subscription requirement creates economic barriers to education and information access.
- Article 13 (Freedom of Movement): Low Practice; Structural -0.30; Context Modifier 0.00; SETL: ND. Paywall and JavaScript requirements restrict the movement of information and content access across user groups.
- Article 19 (Freedom of Expression): Low Practice Coverage; Structural -0.35; Context Modifier 0.00; SETL: ND. Despite hosting technology journalism, the platform uses a commercial paywall model that restricts universal access to information; the JavaScript requirement and ad blocker detection further limit information distribution.
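The per-article rows above (rating, structural score, context modifier) can be sketched as data. This is purely illustrative: the dump does not state how the structural score and context modifier combine, so simple addition is used here as an assumption.

```python
# Illustrative representation of the structural-channel rows above.
# ASSUMPTION: net score = structural + context modifier; the source
# dump lists both values but gives no aggregation rule.
from dataclasses import dataclass

@dataclass
class ArticleScore:
    article: str
    rating: str
    structural: float
    context_modifier: float = 0.0  # 0.00 in every row of the dump

    def net(self) -> float:
        # Assumed combination rule, not specified in the source.
        return self.structural + self.context_modifier

rows = [
    ArticleScore("Article 3 (Life, Liberty, Security)", "Low Practice", -0.15),
    ArticleScore("Article 19 (Freedom of Expression)", "Low Practice Coverage", -0.35),
]
for r in rows:
    print(f"{r.article}: {r.net():+.2f}")
```

With every context modifier at 0.00, the net score simply equals the structural score, which matches the duplicated leading figure in each row of the dump.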
No structural signals observable (ND): Article 1 (Freedom, Equality, Brotherhood), Article 4 (No Slavery), Article 5 (No Torture), Article 6 (Legal Personhood), Article 7 (Equality Before Law), Article 8 (Right to Remedy), Article 9 (No Arbitrary Detention), Article 10 (Fair Hearing), Article 11 (Presumption of Innocence), Article 14 (Asylum), Article 15 (Nationality), Article 16 (Marriage & Family), Article 17 (Property), Article 18 (Freedom of Thought), Article 20 (Assembly & Association), Article 21 (Political Participation), Article 22 (Social Security), Article 23 (Work & Equal Pay), Article 24 (Rest & Leisure), Article 25 (Standard of Living), Article 27 (Cultural Participation), Article 28 (Social & International Order), Article 29 (Duties to Community), Article 30 (No Destruction of Rights).
Event Timeline (20 events, 2026-02-26):
- 12:20, dlq: "Dead-lettered after 1 attempts: Anthropic agrees to pay $1.5B to settle lawsuit with book authors"
- 12:16 to 12:18, rate_limit (x3): OpenRouter rate limited (429), model=llama-3.3-70b
- 10:23 to 10:34, dlq (x14): "Dead-lettered after 1 attempts: Anthropic agrees to pay $1.5B to settle lawsuit with book authors"
- 10:24, credit_exhausted (x2): Credit balance too low, retrying in 299s / 254s