How are people doing AI evals these days?

Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable. See the methodology.

Models: @cf/meta/llama-4-scout-17b-16e-instruct lite 0.00 · @cf/meta/llama-4-scout-17b-16e-instruct lite ND · @cf/meta/llama-3.3-70b-instruct-fp8-fast lite ND · @cf/meta/llama-3.3-70b-instruct-fp8-fast lite 0.00

0.00	How are people doing AI evals these days?
	30 points by yelmahallawy 3 days ago | 40 comments on HN | Mild negative ~lite vlite-1.6

Summary ~lite · AI Technology · Neutral

Discussion on AI model evaluations

EQ 0.00

SO 0.00

TD 0.00

Lite evaluation by llama-4-scout-wai · editorial channel only · no per-section breakdown available

Longitudinal: 583 HN snapshots · 51 evals

Audit trail: 71 entries

2026-03-11 22:02	eval_success	Lite evaluated: Mild negative (-0.24)	- -
2026-03-11 22:02	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 22:02	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-11 21:54	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-11 21:54	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 20:47	eval_success	Lite evaluated: Mild negative (-0.24)	- -
2026-03-11 20:47	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 20:47	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-11 20:36	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-11 20:36	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 19:21	eval_success	Lite evaluated: Mild negative (-0.24)	- -
2026-03-11 19:21	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-11 19:21	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 19:12	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-11 19:12	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 18:09	eval_success	Lite evaluated: Mild negative (-0.24)	- -
2026-03-11 18:09	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 18:09	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-11 17:59	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-11 17:59	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 16:56	eval_success	Lite evaluated: Mild negative (-0.24)	- -
2026-03-11 16:56	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 16:56	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-11 16:46	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-11 16:46	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 15:43	eval_success	Lite evaluated: Mild negative (-0.24)	- -
2026-03-11 15:43	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 15:43	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-11 15:19	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-11 15:19	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 14:17	eval_success	Lite evaluated: Mild negative (-0.24)	- -
2026-03-11 14:17	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 14:17	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-11 14:05	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 13:09	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 12:50	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 12:29	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 12:09	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 11:52	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 11:32	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 11:17	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 10:55	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 10:44	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 10:17	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 10:07	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 09:37	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 09:27	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 09:00	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 08:50	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 08:24	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 08:13	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 07:47	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 07:38	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 07:11	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 07:03	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 06:34	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 06:28	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 05:57	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 05:53	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 05:22	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 05:17	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 04:42	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 04:37	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 03:27	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 03:21	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 02:06	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-11 02:06	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative) 0.00
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 01:06	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive)
2026-03-11 01:05	eval	Evaluated by llama-4-scout-wai: -0.24 (Mild negative)
	reasoning Technical discussion on AI evaluations, no human rights discussion
2026-03-11 00:12	eval	Evaluated by llama-3.3-70b-wai-psq: +0.30 (Moderate positive)
2026-03-11 00:09	eval	Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
	reasoning Neutral AI evaluation discussion

build ee2b489+gzrb · deployed 2026-03-10 22:52 UTC · evaluated 2026-03-08 02:36:46 UTC