Do Not A/B Test My Workflow

Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

Model:
@cf/meta/llama-3.3-70b-instruct-fp8-fast lite ND
@cf/meta/llama-3.3-70b-instruct-fp8-fast lite +0.60
@cf/meta/llama-4-scout-17b-16e-instruct lite +0.40
@cf/meta/llama-4-scout-17b-16e-instruct lite ND

ND	Do Not A/B Test My Workflow (backnotprop.com)
	19 points by ramoz 6 days ago | 2 comments on HN ~lite vlite-2.0

Summary ~lite

Critique of silent A/B testing

Lite evaluation by llama-3.3-70b-wai-psq · editorial channel only · no per-section breakdown available

Longitudinal 4 HN snapshots · 35 evals

Audit Trail 55 entries

2026-03-15 00:35	eval_success	PSQ evaluated: g-PSQ=-0.170 (3 dims)	- -
2026-03-15 00:35	eval	Evaluated by llama-3.3-70b-wai-psq: -0.17 (Mild negative)
2026-03-15 00:33	eval_success	Lite evaluated: Mild positive (0.28)	- -
2026-03-15 00:33	eval	Evaluated by llama-3.3-70b-wai: +0.28 (Mild positive)
	reasoning Critique of A/B testing in professional tools
2026-03-14 22:35	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 22:35	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 21:36	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 21:36	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 21:21	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 21:21	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 20:18	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 20:18	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 20:10	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 20:10	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 18:40	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 18:40	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 18:21	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 18:21	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 17:04	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 17:04	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 16:43	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 16:43	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 15:53	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 15:53	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 15:34	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 15:34	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 15:08	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 15:08	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 14:50	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 14:50	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 14:30	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 14:30	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 14:14	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 14:14	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 13:52	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 13:52	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 13:36	eval_success	Lite evaluated: Mild positive (0.16)	- -
2026-03-14 13:36	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 13:14	eval_success	PSQ evaluated: g-PSQ=-0.234 (3 dims)	- -
2026-03-14 13:14	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 13:00	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 12:37	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 12:24	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 11:59	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 11:50	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 11:21	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 11:15	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 10:45	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 10:39	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 10:06	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 10:02	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 09:25	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative) 0.00
2026-03-14 09:19	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive) 0.00
	reasoning Technical blog post criticizing AI tool's A/B testing practices
2026-03-14 08:42	eval	Evaluated by llama-4-scout-wai-psq: -0.23 (Mild negative)
2026-03-14 08:39	eval	Evaluated by llama-4-scout-wai: +0.16 (Mild positive)
	reasoning Technical blog post criticizing AI tool's A/B testing practices

build ee2b489+gzrb · deployed 2026-03-10 22:52 UTC · evaluated 2026-03-16 02:03:38 UTC