EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages (esolang-bench.vercel.app)
98 points by matt_d 6 days ago | 58 comments on HN ~lite vlite-2.0
Summary ~lite
Evaluating LLMs via esoteric programming languages reveals limitations in genuine programming reasoning.
Lite evaluation by llama-4-scout-wai-psq · editorial channel only · no per-section breakdown available
Longitudinal 416 HN snapshots · 83 evals
Audit Trail 103 entries
2026-03-22 01:04 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-22 01:04 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-22 01:01 eval_success Lite evaluated: Neutral (0.00) - -
2026-03-22 01:01 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-22 01:01 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-21 23:46 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-21 23:46 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-21 23:32 eval_success Lite evaluated: Neutral (0.00) - -
2026-03-21 23:32 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-21 23:32 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-21 22:31 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-21 22:31 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-21 22:11 eval_success Lite evaluated: Neutral (0.00) - -
2026-03-21 22:11 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-21 22:11 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-21 20:43 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-21 20:43 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-21 20:27 eval_success Lite evaluated: Neutral (0.00) - -
2026-03-21 20:27 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-21 20:27 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 23:08 eval_success Lite evaluated: Neutral (0.00) - -
2026-03-20 23:08 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 23:08 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-20 23:04 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-20 23:04 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 21:56 eval_success Lite evaluated: Neutral (0.00) - -
2026-03-20 21:56 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 21:56 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-20 21:47 eval_success PSQ evaluated: g-PSQ=0.600 (3 dims) - -
2026-03-20 21:47 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 21:22 eval_success Lite evaluated: Neutral (0.00) - -
2026-03-20 21:22 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 21:21 rater_validation_warn Lite validation warnings for model llama-4-scout-wai: 1W 0R - -
2026-03-20 21:07 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 20:46 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 20:32 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 20:11 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 19:56 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 19:33 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 19:18 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) +0.16
2026-03-20 18:23 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 18:08 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) -0.16
2026-03-20 17:35 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 17:16 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 16:19 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 16:00 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) +0.16
2026-03-20 15:41 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 15:21 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) -0.16
2026-03-20 15:01 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 14:39 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 14:25 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 14:03 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 13:50 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 13:27 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 13:14 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 12:52 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) +0.16
2026-03-20 12:34 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 12:12 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) -0.16
2026-03-20 11:55 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 11:32 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 11:16 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 10:52 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 10:38 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 10:16 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 09:58 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 09:36 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 09:21 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 08:57 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 08:42 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 08:17 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 08:04 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 07:39 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 07:21 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 06:59 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) +0.16
2026-03-20 06:41 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 06:17 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) -0.16
2026-03-20 06:04 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 05:42 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-20 05:29 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 05:06 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) +0.16
2026-03-20 04:54 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 04:32 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) -0.16
2026-03-20 04:19 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 03:57 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) +0.16
2026-03-20 03:44 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 03:21 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) 0.00
2026-03-20 03:09 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 02:40 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) 0.00
2026-03-20 02:27 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 02:05 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) 0.00
2026-03-20 01:56 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 01:40 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) 0.00
2026-03-20 01:37 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 00:57 eval Evaluated by llama-4-scout-wai-psq: +0.44 (Moderate positive) -0.16
2026-03-20 00:54 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-20 00:14 eval Evaluated by llama-3.3-70b-wai-psq: +0.32 (Moderate positive)
2026-03-20 00:02 eval Evaluated by llama-3.3-70b-wai: -0.08 (Neutral)
reasoning
Technical content, zero rights discussion
2026-03-19 23:46 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-19 23:41 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-19 23:06 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-19 23:04 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.
2026-03-19 22:17 eval Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive)
2026-03-19 22:15 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
reasoning
Technical content evaluating LLMs via esoteric programming languages, no explicit human rights discussion.