ND Golden Sets: Regression Engineering for Probabilistic Systems (heavythoughtcloud.com)
12 points by ryan-s 6 days ago | 6 comments on HN ~lite vlite-2.0
Summary ~lite
Promotes safe AI development
Lite evaluation by llama-3.3-70b-wai-psq · editorial channel only · no per-section breakdown available
Longitudinal 175 HN snapshots · 49 evals
Audit Trail 69 entries
2026-03-14 00:29 eval_success PSQ evaluated: g-PSQ=0.201 (3 dims)
2026-03-14 00:29 eval Evaluated by llama-3.3-70b-wai-psq: +0.20 (Mild positive)
2026-03-14 00:28 eval_success Lite evaluated: Neutral (0.00)
2026-03-14 00:28 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
reasoning
Technical content with no rights discussion
2026-03-14 00:28 rater_validation_warn Lite validation warnings for model llama-3.3-70b-wai: 1W 0R
2026-03-13 22:56 eval_success PSQ evaluated: g-PSQ=0.280 (3 dims)
2026-03-13 22:56 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 22:50 eval_success Lite evaluated: Neutral (0.00)
2026-03-13 22:50 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 22:50 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-13 21:16 eval_success PSQ evaluated: g-PSQ=0.280 (3 dims)
2026-03-13 21:16 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 21:10 eval_success Lite evaluated: Neutral (0.00)
2026-03-13 21:10 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 21:10 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-13 20:01 eval_success Lite evaluated: Neutral (0.00)
2026-03-13 20:01 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 20:01 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-13 19:23 eval_success PSQ evaluated: g-PSQ=0.280 (3 dims)
2026-03-13 19:23 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 18:38 eval_success Lite evaluated: Neutral (0.00)
2026-03-13 18:38 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 18:38 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-13 18:09 eval_success PSQ evaluated: g-PSQ=0.280 (3 dims)
2026-03-13 18:09 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 17:23 eval_success Lite evaluated: Neutral (0.00)
2026-03-13 17:23 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 17:23 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-13 16:47 eval_success PSQ evaluated: g-PSQ=0.280 (3 dims)
2026-03-13 16:47 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 15:55 eval_success Lite evaluated: Neutral (0.00)
2026-03-13 15:55 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 15:55 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-13 15:39 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 15:17 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 15:00 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 14:38 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 14:13 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 13:55 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 13:35 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 13:20 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 12:57 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 12:42 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 12:20 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 12:05 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 11:42 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) -0.04
2026-03-13 11:30 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 11:03 eval Evaluated by llama-4-scout-wai-psq: +0.32 (Moderate positive) +0.04
2026-03-13 10:52 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 10:25 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 10:13 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 09:46 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 09:35 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 09:07 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 08:58 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 08:30 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 08:19 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 07:47 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 07:39 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 07:06 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 07:00 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 06:27 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 06:22 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 05:50 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 05:46 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 05:14 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive) 0.00
2026-03-13 05:11 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical article on AI system regression testing, no explicit human rights discussion
2026-03-13 04:36 eval Evaluated by llama-4-scout-wai-psq: +0.28 (Mild positive)
2026-03-13 04:34 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
reasoning
Technical article on AI system regression testing, no explicit human rights discussion