0.00 Comparing AI agents to cybersecurity professionals in real-world pen testing (arxiv.org)
125 points by littlexsparkee 52 days ago | 92 comments on HN | Neutral ~lite vlite-1.4
Summary ~lite AI and cybersecurity Neutral
AI vs human pen testing
EQ 0.50
SO 0.50
TD 0.50
Lite evaluation by llama-3.3-70b-wai · editorial channel only · no per-section breakdown available
Audit Trail 9 entries
2026-02-28 08:00 eval_success Light evaluated: Neutral (0.00)
2026-02-28 08:00 rater_validation_warn Light validation warnings for model llama-4-scout-wai: 0W 1R
2026-02-28 08:00 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
2026-02-28 07:54 eval_success Light evaluated: Neutral (0.00)
2026-02-28 07:54 eval Evaluated by llama-4-scout-wai: 0.00 (Neutral)
2026-02-28 07:54 rater_validation_warn Light validation warnings for model llama-4-scout-wai: 0W 1R
2026-02-28 07:42 eval_success Light evaluated: Neutral (0.00)
2026-02-28 07:41 rater_validation_warn Light validation warnings for model llama-3.3-70b-wai: 0W 1R
2026-02-28 07:41 eval Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)