dial481 — HRO

Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

dial481 · 3 karma · 35d on HN

Coverage

We've seen 1 of ~3 submissions

Full eval: 0 Lite-only: 0 Unevaluated: 1

1 story

1.		LoCoMo AI Benchmark: 6.4% of answer key wrong, judge accepts 63% of fake answers (github.com)
		3 points by dial481 10 days ago | 2 comments | skipped

build ee2b489+gzrb · deployed 2026-03-10 22:52 UTC · evaluated 2026-03-16 02:03:38 UTC