0.00 Study identifies weaknesses in how AI systems are evaluated (www.oii.ox.ac.uk)
416 points by pseudolus 111 days ago | 192 comments on HN | Neutral ~lite vlite-1.4
Summary ~lite AI evaluation Neutral
Technical study on weaknesses in how AI systems are evaluated
EQ 0.50
SO 0.50
TD 0.50
Lite evaluation by llama-3.3-70b-wai · editorial channel only · no per-section breakdown available
Audit Trail 6 entries
2026-02-28 08:03 eval_success Light evaluated: Neutral (0.00) - -
2026-02-28 08:03 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
2026-02-28 08:03 rater_validation_warn Light validation warnings for model llama-4-scout-wai: 0W 1R - -
2026-02-28 07:52 eval_success Light evaluated: Neutral (0.00) - -
2026-02-28 07:52 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
2026-02-28 07:52 rater_validation_warn Light validation warnings for model llama-3.3-70b-wai: 0W 1R - -