Show HN: CivBench, a long-horizon AI benchmark for multi-agent games (clashai.live)
12 points by mbh159 2 days ago | 25 comments on HN
Pending Evaluation
This story is queued for evaluation. It will be processed in an upcoming batch.
Queued: 2026-02-27 00:34:51
Audit Trail (6 entries)
2026-02-28 05:40 eval_success Light evaluated: Neutral (0.00)
2026-02-28 05:40 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
2026-02-28 05:40 rater_validation_warn Light validation warnings for model llama-4-scout-wai: 0W 1R
2026-02-28 05:22 rater_validation_warn Light validation warnings for model llama-3.3-70b-wai: 0W 1R
2026-02-28 05:22 eval_success Light evaluated: Neutral (0.00)
2026-02-28 05:22 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)