Claude Code's binary reveals silent A/B tests on core features

Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

Claude Code's binary reveals silent A/B tests on core features (backnotprop.com)
168 points by ramoz 6 days ago | 211 comments on HN

Pending Evaluation

This story is queued for evaluation. It will be processed in an upcoming batch.

Queued: 2026-03-14 11:48:51

Longitudinal: 203 HN snapshots · 93 evals

Audit Trail: 113 entries

2026-03-17 01:08	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-17 01:07	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-17 01:06	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-17 01:06	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 23:32	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 23:32	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 23:31	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 23:31	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 22:11	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 22:11	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 22:08	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 22:08	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 20:53	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 20:53	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 20:37	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 20:37	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 19:01	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 19:01	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 18:43	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 18:43	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 17:52	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 17:52	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 17:19	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 17:19	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 16:40	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 16:40	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 16:29	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 16:29	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 16:04	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 16:04	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 15:51	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 15:51	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 15:28	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 15:28	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 15:16	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 15:16	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 14:51	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-16 14:51	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 14:42	eval_success	PSQ evaluated: g-PSQ=0.006 (3 dims)	- -
2026-03-16 14:42	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 14:17	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 14:05	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 13:40	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 13:27	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 13:04	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 12:52	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 12:29	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 12:15	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 11:54	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 11:40	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 11:19	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 11:04	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 10:41	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 10:27	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 10:02	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 09:49	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 09:23	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 09:10	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 08:44	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 08:30	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 08:08	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 07:53	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 07:32	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 07:17	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 06:57	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 06:42	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 06:19	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 06:07	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 05:45	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 05:32	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 05:10	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 04:52	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 04:03	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 02:01	eval	Evaluated by claude-haiku-4-5-20251001: +0.25 (Mild positive) 12,411 tokens -0.02
2026-03-16 01:51	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-16 01:27	eval	Evaluated by claude-haiku-4-5-20251001: +0.27 (Mild positive) 11,720 tokens -0.11
2026-03-16 00:56	eval	Evaluated by claude-haiku-4-5-20251001: +0.38 (Moderate positive) 11,832 tokens +0.16
2026-03-16 00:54	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-16 00:27	eval	Evaluated by claude-haiku-4-5-20251001: +0.21 (Mild positive) 11,828 tokens -0.08
2026-03-15 23:50	eval	Evaluated by claude-haiku-4-5-20251001: +0.30 (Moderate positive) 11,657 tokens -0.06
2026-03-15 23:13	eval	Evaluated by claude-haiku-4-5-20251001: +0.36 (Moderate positive) 12,390 tokens
2026-03-15 22:55	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-15 22:08	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-15 17:58	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) 0.00
2026-03-15 17:43	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-15 16:47	eval	Evaluated by llama-4-scout-wai-psq: +0.01 (Neutral) +0.24
2026-03-15 16:29	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-15 00:35	eval	Evaluated by llama-3.3-70b-wai-psq: -0.17 (Mild negative)
2026-03-15 00:33	eval	Evaluated by llama-3.3-70b-wai: +0.16 (Mild positive)
	reasoning Critique of silent A/B testing on professional tool
2026-03-14 22:43	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 22:17	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 21:32	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 21:06	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 20:07	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 19:53	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 19:24	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 19:13	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 18:21	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 18:10	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 16:44	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 16:35	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 15:35	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 15:25	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 14:52	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 14:47	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 14:16	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 14:12	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 13:40	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 13:35	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 13:01	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 13:00	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Editorial stance on AI transparency, A/B testing, and user rights
2026-03-14 12:25	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative)
2026-03-14 12:24	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive)
	reasoning Editorial stance on AI transparency, A/B testing, and user rights

build ee2b489+gzrb · deployed 2026-03-10 22:52 UTC · evaluated 2026-03-16 02:03:38 UTC