ClawSandbox – 7 of 9 attacks succeeded against an AI agent with shell access — Pending

Beta: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

ND	ClawSandbox – 7 of 9 attacks succeeded against an AI agent with shell access (github.com)
	2 points by ariansyah 1 day ago | 3 comments on HN ~lite vlite-2.0

Summary ~lite

The linked GitHub repository describes a security experiment in which 7 of 9 attacks succeeded against an AI agent with shell access.

Lite evaluation by llama-4-scout-wai-psq · editorial channel only · no per-section breakdown available

Longitudinal 96 HN snapshots · 4 evals

Audit Trail 10 entries

2026-03-05 05:22	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)
2026-03-05 05:22	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive)
2026-03-05 05:13	eval_success	PSQ evaluated: g-PSQ=0.000 (3 dims)
2026-03-05 05:13	eval	Evaluated by llama-3.3-70b-wai-psq: 0.00 (Neutral)
2026-03-04 13:59	eval_success	Lite evaluated: Moderate negative (-0.42)
2026-03-04 13:59	eval	Evaluated by llama-4-scout-wai: -0.42 (Moderate negative)
	reasoning: Technical GitHub page, no explicit human rights discussion
2026-03-04 13:59	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R
2026-03-04 13:59	eval_success	Lite evaluated: Moderate negative (-0.42)
2026-03-04 13:59	eval	Evaluated by llama-3.3-70b-wai: -0.42 (Moderate negative)
	reasoning: GitHub repository with no explicit rights discussion
2026-03-04 13:59	rater_validation_warn	Lite validation warnings for model llama-3.3-70b-wai: 1W 0R

build 78aa6a0+evs7 · deployed 2026-03-05 18:44 UTC · evaluated 2026-03-03 07:16:53 UTC