0.00 How are people doing AI evals these days?
30 points by yelmahallawy 3 days ago | 40 comments on HN | Mild negative ~lite vlite-1.6
Summary ~lite AI Technology Neutral
Discussion on AI model evaluations
EQ 0.00
SO 0.00
TD 0.00
Lite evaluation by llama-4-scout-wai · editorial channel only · no per-section breakdown available
Longitudinal 583 HN snapshots · 51 evals
Audit Trail 71 entries
2026-03-11 22:02 eval_success Lite evaluated: Mild negative (-0.24) - -
2026-03-11 22:02 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 22:02 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-11 21:54 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-11 21:54 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 20:47 eval_success Lite evaluated: Mild negative (-0.24) - -
2026-03-11 20:47 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 20:47 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-11 20:36 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-11 20:36 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 19:21 eval_success Lite evaluated: Mild negative (-0.24) - -
2026-03-11 19:21 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-11 19:21 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 19:12 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-11 19:12 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 18:09 eval_success Lite evaluated: Mild negative (-0.24) - -
2026-03-11 18:09 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 18:09 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-11 17:59 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-11 17:59 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 16:56 eval_success Lite evaluated: Mild negative (-0.24) - -
2026-03-11 16:56 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 16:56 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-11 16:46 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-11 16:46 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 15:43 eval_success Lite evaluated: Mild negative (-0.24) - -
2026-03-11 15:43 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 15:43 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-11 15:19 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-11 15:19 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 14:17 eval_success Lite evaluated: Mild negative (-0.24) - -
2026-03-11 14:17 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 14:17 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-11 14:05 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 13:09 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 12:50 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 12:29 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 12:09 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 11:52 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 11:32 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 11:17 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 10:55 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 10:44 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 10:17 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 10:07 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 09:37 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 09:27 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 09:00 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 08:50 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 08:24 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 08:13 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 07:47 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 07:38 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 07:11 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 07:03 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 06:34 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 06:28 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 05:57 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 05:53 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 05:22 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 05:17 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 04:42 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 04:37 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 03:27 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 03:21 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 02:06 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 02:06 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 01:06 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive)
2026-03-11 01:05 eval Evaluated by llama-4-scout-wai: -0.24 (Mild negative)
reasoning
Technical discussion on AI evaluations, no human rights discussion
2026-03-11 00:12 eval Evaluated by llama-3.3-70b-wai-psq: +0.30 (Moderate positive)
2026-03-11 00:09 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
reasoning
Neutral AI evaluation discussion