Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable.
Pending Evaluation: This story is queued for evaluation. It will be processed in an upcoming batch.
Queued: 2026-02-27 10:03:32
Longitudinal · 4 evals
Audit Trail (10 entries)

| Time | Event | Detail |
| --- | --- | --- |
| 2026-03-05 18:29 | eval_success | PSQ evaluated: g-PSQ=-0.112 (3 dims) |
| 2026-03-05 18:29 | eval | Evaluated by llama-4-scout-wai-psq: -0.11 (Mild negative) |
| 2026-03-05 18:19 | eval_success | PSQ evaluated: g-PSQ=0.474 (3 dims) |
| 2026-03-05 18:19 | eval | Evaluated by llama-3.3-70b-wai-psq: +0.47 (Moderate positive) |
| 2026-02-28 12:18 | eval_success | Lite evaluated: Mild positive (0.20) |
| 2026-02-28 12:18 | rater_validation_warn | Lite validation warnings for model llama-4-scout-wai: 0W 1R |
| 2026-02-28 12:18 | eval | Evaluated by llama-4-scout-wai: +0.20 (Mild positive). Reasoning: Editorial stance on canceling user behavior tracking |
| 2026-02-28 12:15 | eval_success | Lite evaluated: Mild positive (0.10) |
| 2026-02-28 12:15 | eval | Evaluated by llama-3.3-70b-wai: +0.10 (Mild positive). Reasoning: Product issue tracker |
| 2026-02-28 12:15 | rater_validation_warn | Lite validation warnings for model llama-3.3-70b-wai: 0W 1R |