+0.07 LLMs can be absolutely exhausting

Name: HRCB Evaluation: LLMs can be absolutely exhausting
Item: LLMs can be absolutely exhausting
Rating: 0.033
Author: Human Rights Observatory

Alpha This system is experimental. Scores and classifications are early-stage research and may be unreliable. Methodology →

Model: @cf/meta/llama-4-scout-17b-16e-instruct lite ND @cf/meta/llama-4-scout-17b-16e-instruct lite 0.00 claude-haiku-4-5-20251001 +0.07 @cf/meta/llama-3.3-70b-instruct-fp8-fast lite ND @cf/meta/llama-3.3-70b-instruct-fp8-fast lite 0.00 Compare

+0.07	LLMs can be absolutely exhausting (tomjohnell.com S:+0.07 )
	348 points by tjohnell 8 days ago \| 213 comments on HN \| Neutral High agreement (3 models) Editorial · v3.7 · 2026-03-16 00:04:43 0

Summary Labor Rights & Digital Wellness Advocates

This technical blog post advocates for sustainable work practices with AI tools, emphasizing worker agency, rest, and mental health awareness. The author frames technical skill development through ethics of self-care, fatigue recognition, and deliberate problem-solving—implicitly advancing rights to fair labor conditions (Article 23), rest and leisure (Article 24), health and wellbeing (Article 25), and education (Article 26). However, the site implements privacy-invasive analytics tracking without visible consent mechanisms, creating a structural tension between the content's wellness advocacy and platform practices that monitor user behavior.

Rights Tensions 2 pairs

Art 12 ↔ Art 19 — Content advocates free expression while site infrastructure implements tracking that collects user behavior data without visible consent, subordinating privacy rights to engagement monitoring practices.

Art 23 ↔ Art 12 — Content advocates fair labor and rest rights while site analytics track user engagement patterns, potentially enabling exploitative labor surveillance practices the article argues against.

Article Heatmap

Negative Neutral Positive No Data

Aggregates

+0.07

Weighted Mean	+0.03	Unweighted Mean	+0.05
Max	+0.28 Article 19	Min	-0.28 Article 12
Signal	3	No Data	28
Volatility	0.24 (Medium)
Negative	1	Channels	E: 0.6 S: 0.4
SETL ℹ	+0.02	Editorial-dominant
FW Ratio ℹ	57%	51 facts · 39 inferences
Agreement	High	3 models · spread ±0.054

Evidence 59% coverage ℹ

 4H  10M  3L  1 ND 

Theme Radar

HN Discussion 20 top-level · 18 replies

chalupa-supreme 2026-03-15 23:35 UTC link

I wanna say that it is indeed a “skill issue” when it comes to debugging and getting the agent in your editor of choice to move forward. Sometimes it takes an instruction to step back and evaluate the current state and others it’s about establishing the test cases.

I think the exhausting part is more probably more tied to the evaluation of the work the agent is doing, understanding its thought process and catching the hang up can be tedious in the current state of AI reasoning.

cglan 2026-03-15 23:37 UTC link

I find LLMs so much more exhausting than manual coding. It’s interesting. I think you quickly bump into how much a single human can feasibly keep track of pretty fast with modern LLMs.

I assume until LLMs are 100% better than humans in all cases, as long as I have to be in the loop there will be a pretty hard upper bound on what I can do and it seems like we’ve roughly hit that limit.

Funny enough, I get this feeling with a lot of modern technology. iPhones, all the modern messaging apps, etc make it much too easy to fragment your attention across a million different things. It’s draining. Much more draining than the old days

simonw 2026-03-15 23:40 UTC link

I wonder if it's more or less tiring to work with LLMs in YOLO/--dangerously-skip-permissions mode.

I mostly use YOLO mode which means I'm not constantly watching them and approving things they want to do... but also means I'm much more likely to have 2-3 agent sessions running in parallel, resulting in constant switching which is very mentally taxing.

veryfancy 2026-03-15 23:45 UTC link

In agent-mode mode, IMO, the sweet spot is 2-3 concurrent tasks/sessions. You don’t want to sit waiting for it, but you don’t want to context-switch across more than a couple contexts yourself.

anthonySs 2026-03-15 23:53 UTC link

llms aren’t exhausting it’s the hype and all the people around it

same thing happened with crypto - the underlying technology is cool but the community is what makes it so hated

rednafi 2026-03-15 23:59 UTC link

I have always enjoyed the feeling of aporia during coding. Learning to embrace the confusion and the eventual frustration is part of the job. So I don’t mind running in a loop alongside an agent.

But I absolutely loathe reviewing these generated PRs - more so when I know the submitter themselves has barely looked at the code. Now corporate has mandated AI usage and is asking people to do 10k LOC PRs every day. Reviewing this junk has become exhausting.

I don’t want to read your code if you haven’t bothered to read it yourselves. My stance is: reviewing this junk is far more exhausting. Coding is actually the fun part.

siliconc0w 2026-03-16 00:02 UTC link

I mostly do 2-3 agents yoloing with self "fresh eyes" review

jeremyjh 2026-03-16 00:11 UTC link

Most people reading this have probably had the experience of wasting hours debugging when exhausted, only to find it was a silly issue you’ve seen multiple times, or maybe you solve it in a few minutes the next morning.

Working with an agent coding all day can be exhilarating but also exhausting - maybe it’s because consequential decisions are packed more tightly together. And yes cognition still matters for now.

193572 2026-03-16 00:15 UTC link

It looks like Stockholm syndrome or a traditional abusive relationship 100 years ago where the woman tries to figure out how to best prompt her husband to do something.

You know you can leave abusive relationships. Ditch the clanker and free your mind.

somewhereoutth 2026-03-16 00:23 UTC link

Of course. Any scenario where you are expected to deliver results using non-deterministic tooling is going to be painful and exhausting. Imagine driving a car that might dive one way or the other of its own accord, with controls that frequently changed how they worked. At the end of any decently sized journey you would be an emotional wreck - perhaps even an actual wreck.

razorbeamz 2026-03-16 00:24 UTC link

LLMs do not actually make anything better for anyone. You have to constantly correct them. It's like having a junior coder under your wing that never learns from its mistakes. I can't imagine anyone actually feeling productive using one to work.

olejorgenb 2026-03-16 00:57 UTC link

I find working more asynchronous with the agents help. I've disabled the in-your-face agent-is-done/need-input notifications [1]. I work across a few different tasks at my own pace. It works quite well, and when/if I find a rhythm to it, it's absolutely less intense than normal programming.

You might think that the "constant" task switching is draining, but I don't switch that frequently. Often I keep the main focus on one task and use the waiting time to draft some related ideas/thoughts/next prompt. Or browse through the code for light review/understanding. It also helps to have one big/complex task and a few simpler things concurrently. And since the number of details required to keep "loaded" in your head per task is fewer, switching has less cost I think. You can also "reload" much quicker by simply chatting with the agent for a minute or two, if some detail have faded.

I think a key thing is to NOT chase after keeping the agents running at max efficiency. It's ok to let them be idle while you finish up what your doing. (perhaps bad of KV cache efficiency though - I'm not sure how long they keep the cache)

(And obviously you should run the agent in a sandbox to limit how many approvals you need to consider)

[1] I use the urgent-window hint to get a subtle hint of which workspace contain an agent ready for input.

EDIT: disclaimer - I'm relative new to using them, and have so far not used them for super complex tasks.

quantum_state 2026-03-16 01:03 UTC link

It seems to me that LLM is a tool after all. One needs to learn to use it effectively.

sigbottle 2026-03-16 01:11 UTC link

I am rewriting an agent framework from scratch because another agent framework, combined with my prompting, led to 2023-level regressions in alignment (completely faking tests, echoing "completed" then validating the test by grepping for the string "completed", when it was supposed to bootstrap a udp tunnel over ssh for that test...).

Many top labs [1] [2] already have heavily automated code review already and it's not slowing down. That doesn't mean I'm trusting everything blindly, but yes, over time, it should handle less and less "lower level" tasks and it's a good thing if it can.

[1] https://openai.com/index/harness-engineering/ [2] https://claude.com/blog/code-review

Further I want to vent about two things:

- Things can be improved.

- You are allowed to complain about anything, while not improving things yourself.

I think the mid 2010s really popularized self improvement in a way that you can't really argue with (if you disagree with "put in more effort and be more focused", you're obviously just lazy!). It's funny because the point of engineering is to find better solutions, but technically yes, an always valid solution is just "suck it up".

But moreover, if you do not allow these two premises, what ends up happening in practice for a lot of people, is that basically you can just interpret any slightly pushback as "oh they're just a whiner", and if they're not doing something to fix their problem this instant, that "obviously" validates your claim (and even if they are, it doesn't count, they should still not be a "debbie downer", etc.).

Sometimes a premise can sound extreme, but people forget that premises are not in a complete logical vaccuum, you actually live out and believe said premises, and by taking on a certain position, it's often more about what follows downstream from the behavior than the actual words themselves.

otterley 2026-03-16 01:16 UTC link

One way to help, I think, is to take advantage of prompt libraries. Claude makes this easy via Skills (which can be augmented via Plugins). Since skills themselves are just plain text with some front matter, they're easy to update and improve, and you can reuse them as much as you like.

There's probably a Codex equivalent, but I don't know what it is.

stainlu 2026-03-16 01:55 UTC link

The article frames this as a skill issue, but there's a more structural problem underneath: LLMs fundamentally shift work from generation to verification, and those two activities have very different cognitive profiles.

Writing code is flow-state compatible. You build up a mental model, hold it in working memory, and produce output in a continuous stream. Reviewing LLM output requires constant context-switching between "what does this code do" and "is this what I actually wanted." That's cognitively expensive in a way that no amount of prompt engineering fixes.

The exhaustion isn't primarily from bad prompts or slow feedback loops. It's that the LLM turned you from a writer into a full-time reviewer, and reviewing is harder work per unit of output than writing. The productivity gain is real, but the effort doesn't feel lower because you traded one kind of cognitive effort for a more draining one. The task-switching cost between "understanding intent" and "verifying implementation" is well-studied in cognitive science and it's not something you can optimize away with better technique.

nanobuilds 2026-03-16 02:15 UTC link

Your human context also needs compacting at some point. After hours of working with an LLM, your prompts tend to become less detailed, you tend to trust the LLM more, and it's easier to go down a solution that is not necessarily the best one. It becomes more of a brute forcing LLM assisted "solve this issue flow". What's funny is that it sometimes feels that the LLM itself is exhausted as well as the human and then the context compacting makes it even worse.

It's like with regular non-llm assisted coding. Sometimes you gotta sleep on it and make a new /plan with a fresh direction.

babas03 2026-03-16 02:28 UTC link

This is exactly what was needed. Seamlessly transitioning from manual inspection in the Elements/Network panels to agent-led investigation is going to save so much 'context-setting' time.

owentbrown 2026-03-16 02:36 UTC link

I really appreciate the author for writing this.

I learned years ago that I when I write code after 10 PM, I'm go backward instead of forward. It was easy to see, because the test just wouldn't pass, or I'd introduce several bugs that each took 30 minutes to fix.

I'm learning now that it's no different, working with agents.

codance 2026-03-16 03:26 UTC link

The shift from creation to verification is real, but I think the bigger issue is people treating LLM output as a black box to review. What works better: write specs and tests first, then let the LLM implement against them. You're no longer "reviewing code" — you're checking if tests pass. The cognitive load drops dramatically when verification is automated rather than manual.

hombre_fatal 2026-03-15 23:45 UTC link

I think the upper limit is your ability to decide what to build among infinite possibilities. How should it work, what should it be like to use it, what makes the most sense, etc.

The code part is trivial and a waste of time in some ways compared to time spent making decisions about what to build. And sometimes even a procrastination to avoid thinking about what to build, like how people who polish their game engine (easy) to avoid putting in the work to plan a fun game (hard).

The more clarity you have about what you’re building, then the larger blocks of work you can delegate / outsource.

So I think one overwhelming part of LLMs is that you don’t get the downtime of working on implementation since that’s now trivial; you are stuck doing the hard part of steering and planning. But that’s also a good thing.

SchemaLoad 2026-03-15 23:49 UTC link

That sounds exhausting having to non stop prompt and review without a second to stop and think.

bluebarbet 2026-03-16 00:10 UTC link

How is "llms" pronounced?

senectus1 2026-03-16 00:12 UTC link

I imagine code reviewing is a very different sort of skill than coding. When you vibe code (assuming you're reading teh code that is written for you) you become a coder reviewer... I suspect you're learning a new skill.

raincole 2026-03-16 00:21 UTC link

If you care at code quality of course it is exhausting. It's supposed to be. Now there is more code for you to assure quality in the same length of time.

anonzzzies 2026-03-16 00:24 UTC link

I always wonder where HNers worked or work; we do ERP and troubleshooting on legacy systems for medium to large corps; PRs by humans were always pretty random and barely looked at as well, even though the human wrote it (copy/pasted from SO and changed it somewhat); if you ask what it does they cannot tell you. This is not an exception, this is the norm as far as I can see outside HN. People who talk a lot, don't understand anything and write code that is almost alien. LLMs, for us, are a huge step up. There is a 40 nested if with a loop to prevent it from failing on a missing case in a critical Shell (the company) ERP system. LLMs would not do that. It is a nightmare but makes us a lot of money for keeping things like that running.

jatora 2026-03-16 00:32 UTC link

You need to learn to use the tool better, clearly, if you have such an unhinged take as this.

shiandow 2026-03-16 00:50 UTC link

The one thing I don't quite get is how running a loop alongside an agent is any different from reviewing those PRs.

bmurphy1976 2026-03-16 00:53 UTC link

I don't know what to think about comments like this. So many of them come from accounts that are days or at most weeks old. I don't know if this is astroturfing, or you really are just a new account and this is your experience.

As somebody who has been coding for just shy of 40 years and has gone through the actual pain on learning to run a high level and productive dev team, your experience does not match mine. Even great devs will forget some of the basics and make mistakes and I wish every junior (hell even seniors) were as effective as the LLMs are turning out to be. Put the LLM in the hands of a seasoned engineer who also has the skills to manage projects and mentor junior devs and you have a powerful accelerator. I'm seeing the outcome of that every day on my team. The velocity is up AND the quality is up.

skybrian 2026-03-16 01:28 UTC link

Yes, I briefly felt like I needed to keep agents busy but got over it. The point of having multiple things going on is so you have a another task to work on.

akomtu 2026-03-16 01:31 UTC link

You used to be a Formula 1 driver. Now you are an instructor for a Formula 1 autopilot. You have to watch it at all times with full attention for it's a fast and reckless driver.

bsjshshsb 2026-03-16 01:39 UTC link

Use AI to review.

bmurphy1976 2026-03-16 02:01 UTC link

> Now corporate has mandated AI usage and is asking people to do 10k LOC PRs every day.

That's a big red flag if I ever saw one. Corporate should be empowering the engineering team to use AI tooling to improve their own process organically. Is this true or exaggeration? If it's true I'd start looking for a more balanced position at more disciplined org.

kbmr 2026-03-16 02:16 UTC link

>Reviewing LLM output requires constant context-switching between "what does this code do" and "is this what I actually wanted."

Best way I've seen it framed

j3k3 2026-03-16 02:23 UTC link

Great post.

So the people who are claiming huge jumps in productivity in the workplace, how are they dealing with this 'review fatigue'?

flir 2026-03-16 02:36 UTC link

I've always preferred brownfield work. In the past I've said "it's easier to be an editor than an author" to describe why. I think you're on to something. For me the new structure's cognitively easier, but it's not faster. Might even be slightly slower.

anilgulecha 2026-03-16 03:22 UTC link

It's orthogonal IMO. YOLO or not is simply a sign of trust for the harness or not. Trust slightly affects cognition, but not much. My working hypothesis: exhaustion is the residue of use of cognition.

What impacts cognition for me, and IMO for a lot of folks, is how well we end up defining our outcomes. Agents are tremendous at working towards the outcome (hence by TDD red-green works wonderfully), but if you point them to a goal slightly off, then you'll have to do the work of getting them on track, demanding cognition.

So the better you're at your initial research/plan phase, where you document all of your direction and constraints, the lesser effort is needed in the review.

The other thing impacting cognition is how many parallel threads you're running. I have defaulted to major/minor system - at any time I have 1 major project (higher cognition) and 1 minor agent (lower cognition) going. It's where managing this is comfortable.

orbital-decay 2026-03-16 03:23 UTC link

LLM spam, ironically

Editorial Channel

What the content says

+0.25

Article 19 Freedom of Expression

Medium Practice

Editorial

+0.25

SETL

+0.11

Article expresses opinion and reflective commentary on LLM use without fear of interference. Author freely shares personal technical perspectives and problem-solving approaches.

+0.15

Article 23 Work & Equal Pay

Low Framing

Editorial

+0.15

SETL

Author reflects on work practices and feedback cycles in AI-assisted development. Discusses craft pride and the value of thoughtful problem-solving, implicitly affirming the dignity of meaningful work.

-0.20

Article 12 Privacy

Medium Practice

Editorial

-0.20

SETL

-0.10

Content does not explicitly discuss privacy, but the author reflects on personal fatigue and mental processes in technical work, which implicitly touches on the integrity of personal thought and rest.

Preamble Preamble

Content does not directly engage with UDHR preamble themes of dignity, freedom, or inherent rights.

Article 1 Freedom, Equality, Brotherhood

No discussion of equality or inherent dignity in the context of human beings.

Article 2 Non-Discrimination

Content does not address discrimination or entitlement without distinction.

Article 3 Life, Liberty, Security

No engagement with right to life, liberty, or personal security.

Article 4 No Slavery

Content does not address slavery or servitude.

Article 5 No Torture

No discussion of torture or cruel treatment.

Article 6 Legal Personhood

No engagement with right to recognition as a person before law.

Article 7 Equality Before Law

Content does not address equal protection before law.

Article 8 Right to Remedy

No discussion of remedy for rights violations.

Article 9 No Arbitrary Detention

Content does not address arbitrary arrest or detention.

Article 10 Fair Hearing

No engagement with fair and public hearing rights.

Article 11 Presumption of Innocence

Content does not address presumption of innocence or retroactive criminal law.

Article 13 Freedom of Movement

No discussion of freedom of movement or residence.

Article 14 Asylum

Content does not address asylum or refuge rights.

Article 15 Nationality

No engagement with nationality rights.

Article 16 Marriage & Family

Content does not address marriage or family rights.

Article 17 Property

No discussion of property ownership rights.

Article 18 Freedom of Thought

Content does not address freedom of thought, conscience, or religion.

Article 20 Assembly & Association

No discussion of freedom of assembly or association.

Article 21 Political Participation

Content does not address democratic participation or governance rights.

Article 22 Social Security

No engagement with social security or economic rights.

Article 24 Rest & Leisure

No discussion of rest, leisure, or work limits.

Article 25 Standard of Living

Medium Practice

Content does not directly address health, food, housing, or medical care rights.

Article 26 Education

Medium Practice

Content addresses technical skill development and problem-solving methodology, which touches obliquely on the value of education and self-improvement.

Article 27 Cultural Participation

Content does not address participation in cultural or scientific life.

Article 28 Social & International Order

No discussion of social and international order.

Article 29 Duties to Community

Content does not address duties to community or limitations on rights.

Article 30 No Destruction of Rights

No engagement with prevention of rights destruction or abuse.

Structural Channel

What the site does

Domain Context Profile

Element	Modifier	Affects	Note
Legal & Terms
Privacy	-0.05	Article 12	Site implements tracking via image beacon for analytics and upvote/engagement monitoring without explicit privacy notice visible in content. Tracking occurs on page load via fetch and pixel tracking.
Terms of Service	—		No Terms of Service document found on-domain or linked from visible content.
Identity & Mission
Mission	—		No explicit mission statement found. Site appears to be a personal technical blog focused on software engineering practices and AI development.
Editorial Code	—		No editorial standards document or code of conduct visible on-domain.
Ownership	—		Single author (Tom Johnell) identified; appears to be independent personal blog.
Access & Distribution
Access Model	+0.05	Article 19 Article 26	Content is freely accessible with no paywall. Email and RSS subscription options provided, expanding access methods.
Ad/Tracking	-0.05	Article 12	Analytics tracking via custom pixel endpoint (/hit/) collects referrer data and engagement metrics without explicit opt-in consent visible.
Accessibility	+0.10	Article 25 Article 26	Site includes semantic HTML (article schema.org markup), ARIA labels on interactive elements, supports dark mode preference detection, and responsive design for mobile access.
br_tracking	+0.05	Preamble ¶5 Article 12 Article 19	No third-party trackers detected
br_security	-0.05	Article 3 Article 12	Security headers: HTTPS
br_accessibility	-0.05	Article 26 Article 27 ¶1	No accessibility features detected
br_consent	0.00	Article 12 Article 19 Article 20 ¶2	No cookie consent banner detected

+0.20

Article 19 Freedom of Expression

Medium Practice

Structural

+0.20

Context Modifier

+0.05

SETL

+0.11

Site provides free, unrestricted access to published content with no paywall or registration requirement blocking readership. RSS and email subscription options expand access methods.

-0.15

Article 12 Privacy

Medium Practice

Structural

-0.15

Context Modifier

-0.10

SETL

-0.10

Site implements analytics pixel tracking (/hit/) and upvote engagement monitoring that collects referrer data without explicit opt-in consent or prominent privacy notice.

Preamble Preamble

Site structure does not implement features tied to universal rights recognition.

Article 1 Freedom, Equality, Brotherhood

No structural features addressing equal rights.

Article 2 Non-Discrimination

No structural signals regarding non-discrimination.

Article 3 Life, Liberty, Security

No relevant structural elements.

Article 4 No Slavery

No relevant structural signals.

Article 5 No Torture

No relevant structural elements.

Article 6 Legal Personhood

No relevant structural signals.

Article 7 Equality Before Law

No relevant structural elements.

Article 8 Right to Remedy

No relevant structural signals.

Article 9 No Arbitrary Detention

No relevant structural elements.

Article 10 Fair Hearing

No relevant structural signals.

Article 11 Presumption of Innocence

No relevant structural elements.

Article 13 Freedom of Movement

No relevant structural signals.

Article 14 Asylum

No relevant structural elements.

Article 15 Nationality

No relevant structural signals.

Article 16 Marriage & Family

No relevant structural elements.

Article 17 Property

No relevant structural signals.

Article 18 Freedom of Thought

No relevant structural elements.

Article 20 Assembly & Association

No relevant structural signals.

Article 21 Political Participation

No relevant structural elements.

Article 22 Social Security

No relevant structural signals.

Article 23 Work & Equal Pay

Low Framing

Article 24 Rest & Leisure

No relevant structural elements.

Article 25 Standard of Living

Medium Practice

Site implements responsive design, dark mode preference detection, semantic HTML, and ARIA labels that expand access for users with varying abilities and device preferences.

Article 26 Education

Medium Practice

Free access model with multiple subscription options (email, RSS) supports knowledge access. Semantic HTML and accessible design enhance usability for learners with diverse needs.

Article 27 Cultural Participation

No relevant structural signals.

Article 28 Social & International Order

No relevant structural elements.

Article 29 Duties to Community

No relevant structural signals.

Article 30 No Destruction of Rights

No relevant structural elements.

Supplementary Signals

How this content communicates, beyond directional lean. Learn more

Epistemic Quality ℹ

How well-sourced and evidence-based is this content?

0.66 medium claims

Sources		0.6
Evidence		0.7
Uncertainty		0.7
Purpose		0.8

Propaganda Flags ℹ

No manipulative rhetoric detected

0 techniques detected

Emotional Tone ℹ

Emotional character: positive/negative, intensity, authority

measured

Valence		+0.3
Arousal		0.4
Dominance		0.6

Transparency ℹ

Does the content identify its author and disclose interests?

0.50

✓ Author ✗ Conflicts

More signals: context, framing & audience

Solution Orientation ℹ

Does this content offer solutions or only describe problems?

0.70 solution oriented

Reader Agency

0.8

Stakeholder Voice ℹ

Whose perspectives are represented in this content?

0.50 2 perspectives

Speaks: individuals

About: workersengineerscommunity

Temporal Framing ℹ

Is this content looking backward, at the present, or forward?

present short term

Geographic Scope ℹ

What geographic area does this content cover?

global

Complexity ℹ

How accessible is this content to a general audience?

moderate medium jargon domain specific

Longitudinal 836 HN snapshots · 6 evals

Audit Trail 15 entries

2026-03-16 02:28	eval_success	PSQ evaluated: g-PSQ=0.280 (3 dims)	- -
2026-03-16 02:28	eval	Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive)
2026-03-16 02:26	eval_success	Lite evaluated: Neutral (-0.07)	- -
2026-03-16 02:26	eval	Evaluated by llama-4-scout-wai: -0.07 (Neutral)
	reasoning Personal blog post about challenges with LLMs, no explicit human rights discussion
2026-03-16 02:26	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-16 00:08	eval_success	Evaluated: Neutral (0.03)	- -
2026-03-16 00:08	eval	Evaluated by claude-haiku-4-5-20251001: +0.03 (Neutral) 14,961 tokens -0.08
2026-03-16 00:08	rater_validation_warn	Validation warnings for model claude-haiku-4-5-20251001: 0W 2R	- -
2026-03-16 00:05	eval_success	PSQ evaluated: g-PSQ=0.243 (3 dims)	- -
2026-03-16 00:05	eval	Evaluated by llama-3.3-70b-wai-psq: +0.24 (Mild positive)
2026-03-16 00:04	eval_success	Evaluated: Mild positive (0.12)	- -
2026-03-16 00:04	eval	Evaluated by claude-haiku-4-5-20251001: +0.12 (Mild positive) 18,136 tokens
2026-03-16 00:01	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-16 00:01	eval	Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
	reasoning Technical blog post on LLMs
2026-03-16 00:01	rater_validation_warn	Lite validation warnings for model llama-3.3-70b-wai: 1W 0R	- -

build ee2b489+gzrb · deployed 2026-03-10 22:52 UTC · evaluated 2026-03-16 02:03:38 UTC