+0.41 We Are Changing Our Developer Productivity Experiment Design

Name: HRCB Evaluation: We Are Changing Our Developer Productivity Experiment Design
Rating: 0.415
Author: HN HRCB

+0.41	We Are Changing Our Developer Productivity Experiment Design (metr.org)
	51 points by ej88 7 hours ago | 32 comments on HN | Moderate positive Editorial · vv3.4 · 2026-02-25

[Figure: Article Heatmap. Legend: Negative, Neutral, Positive, No Data]

Aggregates

Weighted Mean	+0.41	Unweighted Mean	+0.35
Max	+0.86 Article 19	Min	+0.10 Article 4
Signal	31	No Data	0
Confidence	37%	Volatility	0.18 (Medium)
Negative	0	Channels	E: 0.6 S: 0.4
SETL	+0.11	Editorial-dominant

Evidence: High: 3 Medium: 7 Low: 21 No Data: 0
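
As a quick check on the aggregates above, the unweighted mean, maximum, and minimum can be recomputed from the 31 per-section scores listed under Score Breakdown. The sketch below is only illustrative: the names `scores` and `unweighted_mean` are hypothetical, and the weighted mean is left out because the page does not state which weights it uses.

```python
# Per-section scores transcribed from the Score Breakdown
# (Preamble first, then Articles 1-30).
scores = [
    ("Preamble", 0.57), ("Article 1", 0.46), ("Article 2", 0.26),
    ("Article 3", 0.20), ("Article 4", 0.10), ("Article 5", 0.20),
    ("Article 6", 0.26), ("Article 7", 0.46), ("Article 8", 0.30),
    ("Article 9", 0.10), ("Article 10", 0.46), ("Article 11", 0.20),
    ("Article 12", 0.26), ("Article 13", 0.56), ("Article 14", 0.26),
    ("Article 15", 0.26), ("Article 16", 0.20), ("Article 17", 0.30),
    ("Article 18", 0.26), ("Article 19", 0.86), ("Article 20", 0.36),
    ("Article 21", 0.26), ("Article 22", 0.55), ("Article 23", 0.46),
    ("Article 24", 0.26), ("Article 25", 0.20), ("Article 26", 0.35),
    ("Article 27", 0.81), ("Article 28", 0.61), ("Article 29", 0.30),
    ("Article 30", 0.20),
]


def unweighted_mean(rows):
    """Plain arithmetic mean of the score column."""
    return sum(value for _, value in rows) / len(rows)


mean = unweighted_mean(scores)
top = max(scores, key=lambda row: row[1])
# Articles 4 and 9 tie at 0.10; min() returns the first one it meets.
bottom = min(scores, key=lambda row: row[1])

print(f"Unweighted mean: {mean:+.2f}")
print(f"Max: {top[1]:+.2f} {top[0]}")
print(f"Min: {bottom[1]:+.2f} {bottom[0]}")
```

Run as written, this gives an unweighted mean of +0.35, with Article 19 as the maximum and Article 4 as the minimum, in line with the Aggregates table.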

Domain Context Profile

Element	Modifier	Affects	Note
Privacy	—		No privacy policy or data handling practices visible on URL; domain context insufficient for assessment
Terms of Service	—		No ToS visible on URL
Accessibility	+0.05	Article 26	Basic semantic HTML structure with navigation menus present; no explicit accessibility features documented on page
Mission	+0.15	Article 27 Article 28	Mission statement describes scientific measurement of AI catastrophic harm; implies commitment to research integrity and public interest
Editorial Code	+0.10	Article 19	Publication demonstrates transparent reporting of methodological flaws, data limitations, and selection bias; exhibits editorial integrity in correcting previous findings
Ownership	—		Nonprofit research organization; no conflicts of interest evident on page
Access Model	+0.10	Article 19 Article 27	Content freely accessible; datasets made publicly available; supports open access to research
Ad/Tracking	—		No advertising or tracking detected on page

HN Discussion 10 top-level · 0 replies

ej88 2026-02-24 20:07 UTC

Really interesting updates to their 2025 experiment.

Repeat devs from the original experiment went from a 0-40% slowdown to a -10% to 40% speedup, and METR estimates this as a 'lower bound'.

More devs are saying they don't even want to do 50% of their work without AI, even for $50/hr.

30-50% of devs decided not to submit certain tasks without AI, missing the tasks with the highest uplift

it also seems like there is a skill gap - repeat devs from the first study are more productive with ai tools than newly recruited ones with variable experience

overall it seems like the high preference for devs to use AI is actually hurting METR's ability to judge their speedup, due to a refusal to do tasks without it. imo this is indirectly quite supportive for ai coding's productivity claims.

softwaredoug 2026-02-24 20:21 UTC

I'm a bit perplexed by the developer selection effects.

I get that developers want to use AI. But are they also claiming there's not still a no/low-AI population of developers? Or that their means of selection don't find these developers?

Are they worried that by splitting devs into groups of AI experience they might be measuring some confounder that causes people to choose AI / not AI in their careers?

sgillen 2026-02-24 22:12 UTC

This is very interesting because I see a lot of AI detractors point to the original study as proof that AI is overhyped and nothing to worry about. In this new study the findings are essentially reversed (20% slowdown to 20% speedup).

arctic-true 2026-02-24 22:57 UTC

Those developer quotes are tough to read. Rate limits are going to hit like a truck when the labs eventually need to make a profit.

camgunz 2026-02-24 23:00 UTC

Unless this measures the entire SDLC longitudinally (like say, over a year) I'm not interested. I too can tell Claude Code to do things all day every day, but unless we have data on the defect rate it doesn't matter at all.

Bnjoroge 2026-02-24 23:12 UTC

Never been a better time to be a SWE who doesn't use, or significantly limits the use of, AI agents.

atleastoptimal 2026-02-24 23:17 UTC

It's kind of funny that METR is known primarily for both the most bearish study on AI progress (the original 20% slowdown one), and the most bullish one on AI progress (the long-task horizon study showing exponential increase in duration of tasks AI models can accomplish with respect to date of release).

In either case, it seems people ended up bolstering their preexisting views on AI based on whichever study most affirmed them (for the former, that AI coding models didn't actually help and created a mirage of productivity that required more work to fix than it was worth; for the latter, that AI models were improving at an exponential rate and will invariably eclipse SWEs in all tasks in a deterministic amount of time).

I think the truth is somewhere in the middle. Just anecdotally we've seen multi-million dollar fortunes being minted by small teams developing using 90% AI-assisted coding. Anthropic claims they solely use agents to code and don't modify any code manually.

daxfohl 2026-02-24 23:41 UTC

"I don't want to do this without AI" sounds like we're already well into the brain atrophy stage of this. Now what? (I'd think about it myself but....)

keeda 2026-02-25 00:26 UTC

> When surveyed, 30% to 50% of developers told us that they were choosing not to submit some tasks because they did not want to do them without AI. This implies we are systematically missing tasks which have high expected uplift from AI.

In fact, one of the developers in the original study later revealed on Twitter that he had already done exactly that during the study, i.e. filtered out tasks he preferred not to do without AI: https://xcancel.com/ruben_bloom/status/1943536052037390531

While this was only one developer (that we know of), given the N was 16 and he seems to have been one of the more AI-experienced devs, this could have had a non-trivial effect on the results.

The original study gets a lot of air-time from AI naysayers, let's see how much this follow-up gets ;-)

tonymet 2026-02-25 01:37 UTC

> "AI tools lead to worse productivity"

> The subjects are using ChatGPT 2.5 and copy-pasting code.

The reason AI hype seems to be so bipolar is that "AI" isn't one thing. Hundreds of models, dozens of tools. And to get something done well, a seasoned engineer needs to master half a dozen at a time.

Score Breakdown

+0.57	Preamble
High A: commitment to human dignity through AI safety research F: framing research as essential public good
Editorial	+0.50	Structural	+0.30	SETL	+0.32
Editorial emphasis on scientific rigor and public benefit aligns with Preamble values of equality and dignity. Structural openness via dataset publication supports collaborative advancement of human welfare.

+0.46	Article 1 Freedom, Equality, Brotherhood
Medium A: research on AI impact on human autonomy and capability F: treating developers as research subjects with agency
Editorial	+0.40	Structural	+0.30	SETL	+0.20
Content demonstrates recognition of inherent equality by documenting participant consent issues and selection bias. Developer autonomy respected through voluntary participation framework.

+0.26	Article 2 Non-Discrimination
Low F: no discrimination in study design; diverse developer recruitment
Editorial	+0.30	Structural	+0.20	SETL	+0.17
Study recruited from diverse set of open-source projects. Limited editorial discussion of non-discrimination principles; primarily descriptive of methodology.

+0.20	Article 3 Life, Liberty, Security
Low
Editorial	+0.20	Structural	+0.20	SETL	0.00
No explicit discussion of right to life or security. Research focuses on productivity, not security or survival concerns.

+0.10	Article 4 No Slavery
Low
Editorial	+0.10	Structural	+0.10	SETL	0.00
No discussion of slavery or servitude. Topic not directly relevant to content.

+0.20	Article 5 No Torture
Low
Editorial	+0.20	Structural	+0.20	SETL	0.00
No discussion of torture or cruel treatment. Not addressed in content.

+0.26	Article 6 Legal Personhood
Low P: participants engaged in voluntary research
Editorial	+0.30	Structural	+0.20	SETL	+0.17
Participant recruitment suggests recognition of legal personality. Limited substantive discussion of principle.

+0.46	Article 7 Equality Before Law
Medium P: equal treatment of developers in study design A: recognition of fairness issues in study methodology
Editorial	+0.40	Structural	+0.30	SETL	+0.20
Editorial discussion of selection bias and unequal treatment. Acknowledgment that differential pay ($150 vs $50/hr) created fairness problems. Demonstrates equality principle awareness.

+0.30	Article 8 Right to Remedy
Low P: data publication provides recourse mechanism
Editorial	+0.30	Structural	+0.30	SETL	0.00
Dataset availability provides some transparency for remedy. Limited discussion of legal recourse or justice mechanisms.

+0.10	Article 9 No Arbitrary Detention
Low
Editorial	+0.10	Structural	+0.10	SETL	0.00
No discussion of arbitrary arrest or detention. Not relevant to content.

+0.46	Article 10 Fair Hearing
Medium A: transparent public reporting of study methodology and limitations P: open dataset enables independent review
Editorial	+0.40	Structural	+0.30	SETL	+0.20
Publication demonstrates commitment to fair hearing through methodology transparency. Data release enables independent verification and contestation of findings.

+0.20	Article 11 Presumption of Innocence
Low
Editorial	+0.20	Structural	+0.20	SETL	0.00
No discussion of presumption of innocence. Not applicable to research context.

+0.26	Article 12 Privacy
Low P: participant data collected under study protocol
Editorial	+0.20	Structural	+0.30	SETL	-0.17
No explicit discussion of privacy protections. Content does not address arbitrary interference with privacy, family, or correspondence.

+0.56	Article 13 Freedom of Movement
Medium P: developers free to participate in open-source projects studied A: research conducted on existing developer communities
Editorial	+0.50	Structural	+0.40	SETL	+0.22
Study conducted in context of voluntary open-source participation. Dataset publicly released supports freedom of movement within research community.

+0.26	Article 14 Asylum
Low
Editorial	+0.30	Structural	+0.20	SETL	+0.17
No discussion of asylum or refuge. Not applicable to content.

+0.26	Article 15 Nationality
Low
Editorial	+0.30	Structural	+0.20	SETL	+0.17
No discussion of nationality. Not addressed in research context.

+0.20	Article 16 Marriage & Family
Low
Editorial	+0.20	Structural	+0.20	SETL	0.00
No discussion of marriage or family rights. Not relevant to content.

+0.30	Article 17 Property
Low P: researchers own intellectual work through open publication
Editorial	+0.30	Structural	+0.30	SETL	0.00
Public dataset and publication supports ownership of research findings. Limited substantive engagement with property rights principles.

+0.26	Article 18 Freedom of Thought
Low
Editorial	+0.30	Structural	+0.20	SETL	+0.17
No discussion of freedom of thought, conscience, or religion. Not addressed in research content.

+0.86	Article 19 Freedom of Expression
High A: transparent reporting of methodological flaws and limitations F: editorial integrity in correcting prior findings P: open access to raw datasets and methodology P: public discussion of research limitations
Editorial	+0.70	Structural	+0.60	SETL	+0.26
Exemplary execution of freedom to seek and impart information. Editorial demonstrates commitment to transparency, candor about limitations, and public accountability. Dataset release and detailed methodology documentation support Article 19 principles. Context modifiers: editorial_code +0.1, access_model +0.1.

+0.36	Article 20 Assembly & Association
Low P: researchers organized in nonprofit research entity
Editorial	+0.40	Structural	+0.30	SETL	+0.20
Organization structured as nonprofit enables collective action on research mission. Limited explicit discussion of freedom of assembly.

+0.26	Article 21 Political Participation
Low
Editorial	+0.30	Structural	+0.20	SETL	+0.17
No discussion of democratic participation or voting. Not addressed in research context.

+0.55	Article 22 Social Security
Medium A: research on AI impact to social and economic welfare P: nonprofit mission directed at public benefit
Editorial	+0.40	Structural	+0.40	SETL	0.00
Research on AI productivity impacts directly concerns social welfare and economic well-being. Nonprofit structure and public dataset release support realization of social rights. Context modifier: mission +0.15.

+0.46	Article 23 Work & Equal Pay
Medium P: participants paid for labor participation F: study examines work and employment conditions with AI
Editorial	+0.50	Structural	+0.40	SETL	+0.22
Participants compensated for labor contribution ($50/hr). Editorial notes problems with pay disparity between studies and selection effects from pay reduction, demonstrating engagement with work rights principles.

+0.26	Article 24 Rest & Leisure
Low
Editorial	+0.30	Structural	+0.20	SETL	+0.17
No discussion of rest, leisure, or holidays. Not addressed in content.

+0.20	Article 25 Standard of Living
Low
Editorial	+0.20	Structural	+0.20	SETL	0.00
No discussion of healthcare, food, or basic welfare. Not central to research focus.

+0.35	Article 26 Education
Low P: content freely accessible online P: basic semantic structure enables broader access
Editorial	+0.30	Structural	+0.30	SETL	0.00
Open access to published research and datasets supports education principle. Minimal accessibility features documented. Context modifier: accessibility +0.05.

+0.81	Article 27 Cultural Participation
High A: research on AI impact to human scientific advancement P: open dataset and publication enable shared scientific progress F: mission oriented to measurement of AI capabilities
Editorial	+0.60	Structural	+0.50	SETL	+0.24
Research directly contributes to scientific and cultural advancement of humanity. Public dataset and open methodology support participation in scientific community. Context modifiers: mission +0.15, access_model +0.1.

+0.61	Article 28 Social & International Order
Medium A: research on social order supporting human rights P: nonprofit structure enables social order advancement
Editorial	+0.50	Structural	+0.40	SETL	+0.22
AI safety research contributes to social order required for rights realization. Nonprofit mission aligned with public benefit. Context modifier: mission +0.15.

+0.30	Article 29 Duties to Community
Low P: research framed within bounds of scientific investigation
Editorial	+0.30	Structural	+0.30	SETL	0.00
Research conducted within ethical bounds of voluntary participation. Limited explicit discussion of duties and responsibilities framework.

+0.20	Article 30 No Destruction of Rights
Low
Editorial	+0.20	Structural	+0.20	SETL	0.00
No discussion of interpretation or application of UDHR. Not addressed in content.
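
For the five entries whose notes name explicit context modifiers (Articles 19, 22, 26, 27, and 28), the headline score appears to equal a channel-weighted blend of the Editorial and Structural values plus the listed modifiers. The sketch below is a guess at that arithmetic, not the tool's documented formula: the 0.6/0.4 weights are taken from the "Channels E: 0.6 S: 0.4" aggregate, the function name `combined` is hypothetical, and the rule does not reproduce every entry (Article 1, for example, shows +0.46 where this rule yields 0.36).

```python
# Assumed channel weights, read off the "Channels E: 0.6 S: 0.4" aggregate.
E_WEIGHT, S_WEIGHT = 0.6, 0.4


def combined(editorial, structural, modifiers=()):
    """Assumed rule: weighted blend of the two channels plus context modifiers."""
    base = E_WEIGHT * editorial + S_WEIGHT * structural
    return round(base + sum(modifiers), 2)


# (editorial, structural, context modifiers, score shown on the page)
rows = {
    "Article 19": (0.70, 0.60, (0.10, 0.10), 0.86),  # editorial_code, access_model
    "Article 22": (0.40, 0.40, (0.15,), 0.55),       # mission
    "Article 26": (0.30, 0.30, (0.05,), 0.35),       # accessibility
    "Article 27": (0.60, 0.50, (0.15, 0.10), 0.81),  # mission, access_model
    "Article 28": (0.50, 0.40, (0.15,), 0.61),       # mission
}

for name, (editorial, structural, mods, shown) in rows.items():
    print(f"{name}: computed {combined(editorial, structural, mods):+.2f}, shown {shown:+.2f}")
```

Under these assumptions the computed and shown values agree for all five entries, which suggests the modifiers in the Domain Context Profile are added after the channels are blended.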

build f581ea9+b3nz · 2026-02-25 03:04 UTC