AGENTS.md outperforms skills in our agent evals

Beta: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

ND	AGENTS.md outperforms skills in our agent evals (vercel.com)
	524 points by maximedupre 35 days ago | 199 comments on HN ~lite vlite-2.0

Summary ~lite

Technical blog post discussing agent evals with no apparent threats or manipulative language.

Lite evaluation by llama-4-scout-wai-psq · editorial channel only · no per-section breakdown available

Longitudinal · 4 evals

Audit Trail 8 entries

2026-03-05 10:25	eval_success	PSQ evaluated: g-PSQ=-0.040 (3 dims)	- -
2026-03-05 10:25	eval	Evaluated by llama-4-scout-wai-psq: -0.04 (Neutral)
2026-03-05 10:19	eval_success	PSQ evaluated: g-PSQ=0.000 (3 dims)	- -
2026-03-05 10:19	eval	Evaluated by llama-3.3-70b-wai-psq: 0.00 (Neutral)
2026-03-01 00:48	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-01 00:48	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral)
	reasoning ED neutral tech blog post
2026-03-01 00:47	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-01 00:47	eval	Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
	reasoning Tech blog no rights stance

build bab9649+v7kr · deployed 2026-03-05 19:25 UTC · evaluated 2026-03-03 07:16:53 UTC