Show HN: CivBench a long-horizon AI benchmark for multi-agent games

Show HN: CivBench a long-horizon AI benchmark for multi-agent games (clashai.live)
12 points by mbh159 2 days ago | 25 comments on HN

Audit Trail (6 entries)

2026-02-28 05:40	eval_success	Light evaluated: Neutral (0.00)
2026-02-28 05:40	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral)
2026-02-28 05:40	rater_validation_warn	Light validation warnings for model llama-4-scout-wai: 0W 1R
2026-02-28 05:22	rater_validation_warn	Light validation warnings for model llama-3.3-70b-wai: 0W 1R
2026-02-28 05:22	eval_success	Light evaluated: Neutral (0.00)
2026-02-28 05:22	eval	Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)

build 4488008+jgil · deployed 2026-02-28 07:14 UTC · evaluated 2026-02-28 07:31:13 UTC