+0.09 Show HN: Axe A 12MB binary that replaces your AI framework

Name: HRCB Evaluation: Show HN: Axe A 12MB binary that replaces your AI framework
Item: Show HN: Axe A 12MB binary that replaces your AI framework
Rating: 0.081
Author: Human Rights Observatory

Alpha: This system is experimental. Scores and classifications are early-stage research and may be unreliable.

Model:
@cf/meta/llama-4-scout-17b-16e-instruct lite	ND
@cf/meta/llama-4-scout-17b-16e-instruct lite	0.00
claude-haiku-4-5-20251001	+0.09
@cf/meta/llama-3.3-70b-instruct-fp8-fast lite	ND
@cf/meta/llama-3.3-70b-instruct-fp8-fast lite	0.00

+0.09	Show HN: Axe A 12MB binary that replaces your AI framework (github.com S:+0.05 )
	227 points by jrswab 5 days ago | 128 comments on HN | Neutral · High agreement (3 models) · Product · v3.7 · 2026-03-15 22:57:47

Summary Open Source & Digital Access Advocates

This GitHub repository for the 'axe' CLI tool demonstrates structural support for digital rights through open-source distribution, public accessibility, and technical privacy protections. The project advocates for freedom of expression and information access through its open-source licensing model and enables educational participation in software development. GitHub's underlying infrastructure provides modest but consistent privacy and security protections (HTTPS, no third-party tracking), supporting foundational human dignity principles.

Article Heatmap (figure not shown; legend: Negative, Neutral, Positive, No Data)

Aggregates

+0.09

+0.05

Weighted Mean	+0.08	Unweighted Mean	+0.08
Max	+0.15 Article 19	Min	+0.05 Article 20
Signal	4	No Data	27
Volatility	0.04 (Low)
Negative	0	Channels	E: 0.6 S: 0.4
SETL	+0.04	Editorial-dominant
FW Ratio	58%	31 facts · 22 inferences
Agreement	High	3 models · spread ±0.041
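
As a rough consistency check, the +0.09 and +0.05 figures shown above match simple means of the per-article scores that the Editorial Channel and Structural Channel sections of this page list for the four scored articles (19, 20, 26, 27). The sketch below only reproduces that arithmetic. It assumes plain averaging, which is an observation about these numbers and not the system's documented methodology, and the helper name `mean` is illustrative.

```shell
# mean: average of its numeric arguments, printed to 4 decimal places.
# Hypothetical helper for illustration; awk does the floating-point math.
mean() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.4f\n", sum / NR }'
}

# Editorial scores as listed on this page, in order: Articles 19, 26, 27, 20.
editorial=$(mean 0.10 0.10 0.10 0.05)

# Structural scores as listed on this page, in order: Articles 19, 20, 27, 26.
structural=$(mean 0.10 0.05 0.05 0.00)

echo "editorial mean:  $editorial"
echo "structural mean: $structural"
```

Rounded to two decimals, the two means agree with the aggregate figures; whether the system actually computes them this way is not stated on this page.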

Evidence 13% coverage

  7M  4L  27 ND 


HN Discussion 20 top-level · 17 replies

armcat 2026-03-12 14:29 UTC link

Great work! Kind of reminds me of ell (https://github.com/MadcowD/ell), which had this concept of treating prompts as small individual programs and you can pipe them together. Not sure if that particular tool is being maintained anymore, but your Axe tool caters to that audience of small short-lived composable AI agents.

punkpeye 2026-03-12 14:43 UTC link

What are some things you've automated using Axe?

bensyverson 2026-03-12 14:44 UTC link

It's exciting to see so much experimentation when it comes to form factors for agent orchestration!

The first question that comes to mind is: how do you think about cost control? Putting a ton in a giant context window is expensive, but unintentionally fanning out 10 agents with a slightly smaller context window is even more expensive. The answer might be "well, don't do that," and that certainly maps to the UNIX analogy, where you're given powerful and possibly destructive tools, and it's up to you to construct the workflow carefully. But I'm curious how you would approach budget when using Axe.
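
To make the fan-out concern concrete, here is a small back-of-the-envelope sketch. Every number in it (price, context sizes, agent count, budget) is a made-up assumption for illustration, and the variable and function names are hypothetical. Nothing here reflects Axe's actual behavior or any provider's real pricing.

```shell
# All figures below are illustrative assumptions, not real pricing.
price_cents_per_mtok=300   # assumed input price: 300 cents per million tokens
single_ctx=200000          # one agent with a large context window
fanout_agents=10           # ten agents...
fanout_ctx=150000          # ...each with a somewhat smaller context
budget_cents=100           # hypothetical per-run spending cap

# cost_cents TOKENS: input cost in whole cents (integer arithmetic).
cost_cents() {
  echo $(( $1 * price_cents_per_mtok / 1000000 ))
}

single_cost=$(cost_cents "$single_ctx")
fanout_cost=$(cost_cents $(( fanout_agents * fanout_ctx )))

echo "single agent: ${single_cost} cents"
echo "fan-out:      ${fanout_cost} cents"

# A budget check that a wrapper script could run before launching the fan-out.
if [ "$fanout_cost" -gt "$budget_cents" ]; then
  echo "fan-out would exceed the ${budget_cents}-cent budget" >&2
fi
```

Under these assumed numbers the fan-out processes 1.5 million input tokens against 200,000 for the single agent, so it costs several times more even though each individual context is smaller.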

Orchestrion 2026-03-12 15:29 UTC link

The Unix-style framing resonates a lot.

One thing I’ve noticed when experimenting with agent pipelines is that the “single-purpose agent” model tends to make both cost control and reasoning easier. Each agent only gets the context it actually needs, which keeps prompts small and behavior easier to predict.

Where it gets interesting is when the pipeline starts producing artifacts instead of just text — reports, logs, generated files, etc. At that point the workflow starts looking less like a chat session and more like a series of composable steps producing intermediate outputs.

That’s where the Unix analogy feels particularly strong: small tools, small contexts, and explicit data flowing between steps.

Curious if you’ve experimented with workflows where agents produce artifacts (files, reports, etc.) rather than just returning text.

btbuildem 2026-03-12 15:52 UTC link

I really like seeing the movement away from MCP across the various projects. Here the composition of the new with the old (the ol' unix composability) seems to hum very nicely.

OP, what have you used this on in practice, with success?

hamandcheese 2026-03-12 15:58 UTC link

> Each agent is a TOML config with a focused job. Such as code reviewer, log analyzer, commit message writer. You can run them from the CLI, pipe data in, get results out.

I'm a bit skeptical of this approach, at least for building general purpose coding agents. If the agents were humans, it would be absolutely insane to assign such fine-grained responsibilities to multiple people and ask them to collaborate.

swaminarayan 2026-03-12 16:02 UTC link

Axe treats LLM agents like Unix programs—small, composable, version-controllable. Are we finally doing AI the Unix way?

reacharavindh 2026-03-12 16:11 UTC link

Reminded me of this from my bookmarks.

https://github.com/chr15m/runprompt

boznz 2026-03-12 19:09 UTC link

I will give it a try, I like the idea of being closer to the metal.

A Proper self-contained, self improving AI@home with the AI as the OS is my end goal, I have a nice high spec but older laptop I am currently using as a sacrificial pawn experimenting with this, but there is a big gap in my knowledge and I'm still working through GPT2 level stuff, also resources are tight when you're retired. I guess someone will get there this year the way things are going, but I'm happy to have fun until then.

mccoyb 2026-03-12 19:41 UTC link

Cool work!

Aside but 12 MB is ... large ... for such a thing. For reference, an entire HTTP (including crypto, TLS) stack with LLM API calls in Zig would net you a binary ~400 KB on ReleaseSmall (statically linked).

You can implement an entire language, compiler, and a VM in another 500 KB (or less!)

I don't think 12 MB is an impressive badge here?

Multicomp 2026-03-12 20:00 UTC link

This is what I've been trying to get nanobot to do, so thanks for sharing this. I plan to use this for workflow definitions like filesystems.

I have a known workflow to create an RPG character with steps, let's automate some of the boilerplate by having a succession of LLMs read my preferences about each step and apply their particular pieces of data to that step of the workflow, outputting their result to successive subdirectories, so I can pub/sub the entire process and make edits to intermediate files to tweak results as I desire.

Now that's cool!

CraigJPerry 2026-03-12 23:05 UTC link

I've had good success with something along these lines but perhaps a bit more raw:

    - claude takes a -p option
    - i have a bunch of tiny scripts, each script is an agent but it only does one tiny task
    - scripts can be composed in a unix pipeline

For example:

    $ git diff --staged | ai-commit-msg | git commit -F -

Where ai-commit-msg is a tiny agent:

    #!/usr/bin/env bash
    # ai-commit-msg: stdin=git diff, stdout=conventional commit message
    # Usage: git diff --staged | ai-commit-msg
    set -euo pipefail
    source "${AGENTS_DIR:-$HOME/.agents}/lib/agent-lib.sh"
    
    SYSTEM=$(load_skills \
        core/unix-output.md \
        core/be-concise.md \
        domain/git.md \
        output/plain-text.md)
    
    SYSTEM+=$'\n\nTask: Given a git diff on stdin, output a single conventional commit message. One line only.'
    
    run_agent "$SYSTEM"

And you can see to keep the agents themselves tiny, they rely on a little lib to load the various skills and optionally apply some guard / post-exec validator. Those validators are usually simple grep or whatever to make sure there were no writes outside a given dir but sometimes they can be to enforce output correctness (always jq in my examples so far...). In theory the guard could be another claude -p call if i needed a semantic instruction.
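
A post-exec validator of the kind described above might look like the following sketch. The function name `validate_commit_subject`, the 72-character limit, and the list of commit types are assumptions chosen for illustration. They are not taken from the commenter's `agent-lib.sh`, which is not shown in this thread.

```shell
# Assumed pattern: a commonly used set of commit types, an optional
# "(scope)", an optional "!" for breaking changes, then ": description".
COMMIT_SUBJECT_RE='^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([^)]+\))?!?: .+'

# validate_commit_subject MESSAGE
# Returns 0 if MESSAGE looks like a one-line conventional commit subject.
# Hypothetical validator; the checks below are illustrative choices.
validate_commit_subject() {
  [ -n "$1" ] || return 1                                    # reject empty output
  [ "$(printf '%s\n' "$1" | wc -l)" -eq 1 ] || return 1      # reject multi-line output
  [ "${#1}" -le 72 ] || return 1                             # keep the subject short
  printf '%s\n' "$1" | grep -Eq "$COMMIT_SUBJECT_RE"         # require "type(scope): text"
}

validate_commit_subject "feat(cli): add budget flag" && echo "ok"
validate_commit_subject "wip" || echo "rejected"
```

A wrapper could run a check like this on the agent's output and exit non-zero on failure instead of printing the message.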

uhx 2026-03-13 02:14 UTC link

> - Path-sandboxed file ops. Keeps agents locked to a working directory

How is it supposed to work, if agent can simply run "cat" command instead of using skill for file read/write/etc?
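
For context, a path check of this general kind usually compares resolved paths, as in the sketch below. This is not Axe's implementation, which is not shown here. The function name `inside_dir` is hypothetical, and the code assumes GNU `realpath` with its `-m` option. As the question implies, a check like this only constrains file operations that go through it; commands run through a shell would need OS-level isolation instead.

```shell
# inside_dir ROOT TARGET
# Returns 0 if TARGET, after resolving ".." and symlinks, lies within ROOT.
# Hypothetical helper; assumes GNU coreutils realpath (-m allows missing paths).
inside_dir() {
  root=$(realpath -m -- "$1") || return 1
  target=$(realpath -m -- "$2") || return 1
  case $target in
    "$root" | "$root"/*) return 0 ;;   # the root itself or anything below it
    *) return 1 ;;                     # everything else is outside
  esac
}

inside_dir /tmp/work /tmp/work/notes/a.txt && echo "allowed"
inside_dir /tmp/work /tmp/work/../secret.txt || echo "blocked"
```

Comparing against `"$root"/*` rather than a bare prefix keeps a sibling directory such as `/tmp/workspace` from being treated as inside `/tmp/work`.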

multidude 2026-03-13 08:04 UTC link

A problem I have is that the agent's mental model of the system I'm building diverges from reality over time. After discussing that many times and asking it to remember, it becomes frustrating. In the README you say the agent's memory persists across runs; would that solve said problem?

Also, I had to do several refactorings of my agent's constructs and found out that one of them was reinventing stuff, producing a plethora of function duplications, e.g. DB connection pools (I had at least four of them simultaneously).

Would AXE require shared state between chained agents? Could it do it if required?

athrowaway3z 2026-03-13 08:15 UTC link

I'm not sure if HN is being flooded with bots or if the majority of people here nowadays lack a sense of simplicity.

Anybody looking to do interesting things should instantly ignore any project that mention "persistent memory". It speaks of scope creep or complexity obfuscation.

If a tool wants to include "persistent memory" it needs to write the 3 sentence explanation of how their scratch/notes files are piped around and what it achieves.

Not just claim "persistent memory".

I might even go so far as to say that any project using the terminology "memory" is itself doomed to spend too much time & tokens building scaffolding for abstractions that don't work.

aa-jv 2026-03-13 08:42 UTC link

This is exactly what I have wanted for a while, so thank you very much!

Disclaimer: I haven't dug into axe enough yet, just going on first impressions.

>No daemon, no GUI.

I love the world we developers live in right now. ;)

>What would you automate first?

In a sense, I have wanted to be able to just add AI to a repo, and treat it like the junior developer it is. It's okay if the junior developer will do literally any stupid thing I tell it to do, because I won't tell it to do stupid things.

So, exactly: refactor this code, implement a shim, produce docs for <blah>, construct a build harness, write unit tests, produce a build, diff these codebases, implement this API, do all this on your own branch, and build and test things so that I can review the PR over coffee.

Essentially, three word commands which will encourage the AI to produce better software. Through my repo, so I can just review through the repo.

Okay, that's how I hope things work, now off to actually dig in to axe and give it a try on a few things, thanks very much again ..

ColonelPhantom 2026-03-13 09:26 UTC link

I like the idea of LLM-calling as an automation-friendly CLI tool! However, putting all my agents in ~/.config feels antithetical to this. My Bash scripts do not live there either, but rather in a separate script collection, or preferably, at their place of use (e.g. in a repo).

For example, let's say I want to add commit message generation (which I don't think is a great use of LLMs, but it is a practical example) to a repo. I would add the appropriate hook to /.git, but I would also want the agent with its instructions to live inside the repo (perhaps in an `axe` or `agents` directory).

Can Axe load agents from the current folder? Or can that be added?

bsoles 2026-03-13 10:51 UTC link

I don't know exactly how these things work, but you may run into copyright/TM issues with Deque's Axe tool: https://www.deque.com/axe/devtools/

kwstx 2026-03-14 14:03 UTC link

Really cool approach, I like the “Unix philosophy” for agents. Curious how you handle state persistence and chaining sub-agents when agents are depth-limited. Also, do you have any strategies for ensuring data consistency across runs, especially when multiple agents interact with the same files?

spranab 2026-03-16 00:27 UTC link

Ran jrswab/axe through IdeaCred (automated repo scorer) — 85/100, strongest in undefined.

Free badge/profile: https://ideacred.com/profile/jrswab

jrswab 2026-03-12 15:03 UTC link

> how you would approach budget when using Axe

Great question, and it's something that I've not dug into yet. But I see no problem adding a way to limit LLMs by tokens or something similar to keep the cost for the user within reason.

jrswab 2026-03-12 15:06 UTC link

Thanks for checking it out! And yes, the tool is indeed catering to that crowd. It's a need I have and thought others could use it as well.

jrswab 2026-03-12 15:26 UTC link

I have a few flows I'm using it for and have a growing list of things I want to automate. Basically, if there is a process that takes a human to do (like creating drafts or running scripts with variable data) I make axe do it.

1. I have a flow where I pass in a youtube video and the first agent calls an api to get the transcript, the second converts that transcript into a blog-like post, and the third uploads that blog-like post to instapaper.

2. Blog post drafting: I talk into my phone's notes app which gets synced via syncthing. The first agent takes that text and looks for notes in my note system for related information, then passes my raw text and notes into the next to draft a blog post, a third agent takes out all the em dashes because I'm tired of taking them out. Once that's all done then I read and edit it to be exactly what I want.

jrswab 2026-03-12 15:48 UTC link

> Curious if you’ve experimented with workflows where agents produce artifacts (files, reports, etc.) rather than just returning text.

Yes! I run a ghost blog (a blog that does not use my name) and have axe produce artifacts. The flow is: I send the first agent a text file of my brain dump (normally spoken); it searches my note system for related notes and saves them to a file, then passes everything to agent 2, which makes that dump a blog draft and saves it to a file; agent 3 then takes that blog draft, cleans it up to how I like it, and saves it. From that point I have to take it to publish after reading and making edits myself.

hiccuphippo 2026-03-12 16:05 UTC link

Clankers are not humans.

jrswab 2026-03-12 16:08 UTC link

That's my dream.

jrswab 2026-03-12 16:09 UTC link

I've shared a few flows I use a lot right now in some other comments.

Zondartul 2026-03-12 18:35 UTC link

It is easier to trust in the correctness and reliability of an LLM when you treat it as a glorified NLP function with a very narrow scope and limited responsibilities. That is to say, LLMs rarely mess up specific low level instructions, compared to open-ended, long-horizon tasks.

ipython 2026-03-12 19:48 UTC link

it's written in golang. 12MB barely gets you "hello world" since everything is statically linked. With that in mind, the size is impressive.

nine_k 2026-03-12 21:13 UTC link

12 MB is not large; it's like 3 minutes of watching YouTube. Actual RAM consumption is only very weakly correlated to the binary size, and that's what matters.

jrswab 2026-03-12 21:32 UTC link

Love to hear it! Thanks for checking it out and feel free to put up an issue on GitHub if you have any ideas for improvements.

jrswab 2026-03-12 21:34 UTC link

I'm excited to see how this plays out. Keep me updated on x(twitter)

avoutic 2026-03-13 02:15 UTC link

I was looking at something similar. What does your agent-lib.sh look like?

avoutic 2026-03-13 02:16 UTC link

Where is the nanobot approach not working for you?

linkregister 2026-03-13 03:55 UTC link

chroot

lionkor 2026-03-13 07:33 UTC link

Do you have examples of these commit messages? I have yet to see an AI write a good commit message. At least when compared to good commit messages -- if it just does better than "wip" or "fix stuff" that's not a high bar.

aa-jv 2026-03-13 08:44 UTC link

>scaffolding

The purpose of scaffolding is to create persistent memories.

>claim "persistent memory"

Just look at it as a build product.

>abstractions that don't work

Look at this as a testing problem.

Editorial Channel

What the content says

+0.10

Article 19 Freedom of Expression

Medium Advocacy Practice

Editorial

+0.10

SETL

0.00

Repository project itself (axe CLI tool) enables command-line tool usage supporting freedom of information access and technical expression. Project description available in plain text.

+0.10

Article 26 Education

Medium Advocacy

Editorial

+0.10

SETL

+0.10

Project itself serves educational purpose enabling developers to learn AI agent CLI development. README and code comments provide learning resources.

+0.10

Article 27 Cultural Participation

Medium Advocacy Practice

Editorial

+0.10

SETL

+0.07

Project enables technical culture participation and software development expression supporting arts and science engagement.

+0.05

Article 20 Assembly & Association

Medium Advocacy

Editorial

+0.05

SETL

0.00

Repository supports freedom of peaceful assembly through open collaboration model.

Preamble Preamble

Medium

No editorial content addressing human dignity or rights principles present on repository landing page.

Article 1 Freedom, Equality, Brotherhood

Low

No editorial commentary on equality or dignity present.

Article 2 Non-Discrimination

Low

No content addressing discrimination present.

Article 3 Life, Liberty, Security

Medium Practice

No editorial engagement with right to life and security.

Article 4 No Slavery

No content on slavery or servitude.

Article 5 No Torture

No content on torture or degradation.

Article 6 Legal Personhood

No content on legal personhood.

Article 7 Equality Before Law

No content on legal equality or discrimination before law.

Article 8 Right to Remedy

No content on legal remedies.

Article 9 No Arbitrary Detention

No content on arbitrary detention.

Article 10 Fair Hearing

No content on fair trial.

Article 11 Presumption of Innocence

No content on criminal prosecution or retroactive laws.

Article 12 Privacy

Medium Practice

No editorial content on privacy protection.

Article 13 Freedom of Movement

Low

No content on freedom of movement.

Article 14 Asylum

No content on asylum or refuge.

Article 15 Nationality

No content on nationality.

Article 16 Marriage & Family

No content on marriage or family.

Article 17 Property

No content on property rights.

Article 18 Freedom of Thought

No content on freedom of thought, conscience, or religion.

Article 21 Political Participation

No content on political participation.

Article 22 Social Security

No content on social security or welfare.

Article 23 Work & Equal Pay

Low

No content on labor rights or employment.

Article 24 Rest & Leisure

No content on rest and leisure.

Article 25 Standard of Living

No content on health, food, or housing.

Article 28 Social & International Order

No content on social and international order.

Article 29 Duties to Community

No content on duties or limitations.

Article 30 No Destruction of Rights

No content on prevention of rights destruction.

Structural Channel

What the site does

Domain Context Profile

Element	Modifier	Affects	Note
br_tracking	+0.05	Preamble ¶5 Article 12 Article 19	No third-party trackers detected
br_security	+0.05	Article 3 Article 12	Security headers: HTTPS, HSTS, CSP
br_accessibility	0.00	Article 26 Article 27 ¶1	Accessibility: lang attr, 100% alt text
br_consent	0.00	Article 12 Article 19 Article 20 ¶2	No cookie consent banner detected

+0.10

Article 19 Freedom of Expression

Medium Advocacy Practice

Structural

+0.10

Context Modifier

+0.05

SETL

0.00

GitHub platform provides public repository enabling free expression of code and ideas. No censorship mechanisms observed. No tracking of user views or restrictions on information access.

+0.05

Article 20 Assembly & Association

Medium Advocacy

Structural

+0.05

Context Modifier

0.00

SETL

0.00

GitHub infrastructure enables collaborative community participation through issues, pull requests, and discussions features supporting peaceful assembly of developers.

+0.05

Article 27 Cultural Participation

Medium Advocacy Practice

Structural

+0.05

Context Modifier

0.00

SETL

+0.07

GitHub repository provides platform for scientific and technical community participation. Open-source licensing enables cultural and scientific contribution without barriers.

0.00

Article 26 Education

Medium Advocacy

Structural

0.00

Context Modifier

0.00

SETL

+0.10

GitHub's accessibility features (language attribute, alt text) support educational access for diverse learners. Public repository enables universal educational access.

Preamble Preamble

Medium

GitHub infrastructure implements HTTPS, HSTS, and CSP security headers protecting user data integrity and privacy. No third-party tracking detected.

Article 1 Freedom, Equality, Brotherhood

Low

Repository uses open-source model and GitHub's platform structure with no observed discriminatory access controls on the public repository page.

Article 2 Non-Discrimination

Low

Repository structure does not exhibit observable discriminatory practices in access or presentation.

Article 3 Life, Liberty, Security

Medium Practice

GitHub's security infrastructure (HTTPS, HSTS, CSP) provides technical protections against data breach and system compromise that could threaten user security.

Article 4 No Slavery

Repository metadata and structure contain no observable signals regarding labor servitude practices.

Article 5 No Torture

No observable practices or structural elements engaging with torture or degrading treatment.

Article 6 Legal Personhood

Repository structure does not engage with legal personhood recognition.

Article 7 Equality Before Law

Repository does not implement discriminatory enforcement mechanisms observable on public page.

Article 8 Right to Remedy

Repository structure does not engage with dispute resolution or remedies mechanisms.

Article 9 No Arbitrary Detention

Repository structure does not engage with detention mechanisms.

Article 10 Fair Hearing

Repository structure does not implement judicial or trial mechanisms.

Article 11 Presumption of Innocence

Repository structure does not engage with criminal law application.

Article 12 Privacy

Medium Practice

GitHub infrastructure protects user privacy through HTTPS encryption, absence of third-party tracking, and no cookie consent banner requirement (indicating minimal invasive data collection).

Article 13 Freedom of Movement

Low

Repository accessible globally without geolocation restrictions observed.

Article 14 Asylum

Repository structure does not engage with asylum provisions.

Article 15 Nationality

Repository does not implement nationality-based restrictions in observable access structure.

Article 16 Marriage & Family

Repository structure does not engage with family or marriage matters.

Article 17 Property

Repository structure does not directly address property protections, though GitHub terms govern intellectual property licensing.

Article 18 Freedom of Thought

Repository structure does not restrict ideological or religious expression in project content.

Article 21 Political Participation

Repository structure does not implement democratic voting or political decision-making mechanisms.

Article 22 Social Security

Repository structure does not address social security provisions.

Article 23 Work & Equal Pay

Low

Repository does not specify labor practices or employment terms. GitHub platform neutral on labor rights for contributors.

Article 24 Rest & Leisure

Repository structure does not address rest or leisure provisions.

Article 25 Standard of Living

Repository structure does not address health or welfare provisions.

Article 28 Social & International Order

Repository structure does not explicitly address social order mechanisms.

Article 29 Duties to Community

Repository structure subject to GitHub terms of service which specify usage limitations.

Article 30 No Destruction of Rights

Repository structure does not engage with prevention mechanisms.

Supplementary Signals

How this content communicates, beyond directional lean.

Epistemic Quality

How well-sourced and evidence-based is this content?

0.64 low claims

Sources		0.6
Evidence		0.5
Uncertainty		0.7
Purpose		0.8

Propaganda Flags

No manipulative rhetoric detected

0 techniques detected

Emotional Tone

Emotional character: positive/negative, intensity, authority

measured

Valence		+0.3
Arousal		0.2
Dominance		0.4

Transparency

Does the content identify its author and disclose interests?

0.67

✓ Author


Solution Orientation

Does this content offer solutions or only describe problems?

0.62 solution oriented

Reader Agency

0.7

Stakeholder Voice

Whose perspectives are represented in this content?

0.45 2 perspectives

Speaks: institution, individuals

Temporal Framing

Is this content looking backward, at the present, or forward?

present unspecified

Geographic Scope

What geographic area does this content cover?

global

Complexity

How accessible is this content to a general audience?

technical high jargon domain specific

Longitudinal 1216 HN snapshots · 161 evals

Audit Trail 181 entries

2026-03-16 01:02	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-16 01:02	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-16 00:47	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-16 00:47	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-16 00:47	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-15 22:57	eval_success	Evaluated: Neutral (0.08)	- -
2026-03-15 22:57	eval	Evaluated by claude-haiku-4-5-20251001: +0.08 (Neutral) 12,070 tokens
2026-03-15 22:57	rater_validation_warn	Validation warnings for model claude-haiku-4-5-20251001: 0W 7R	- -
2026-03-15 22:37	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-15 22:37	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-15 22:01	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-15 22:01	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-15 22:01	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-15 18:46	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-15 18:46	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-15 18:46	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-15 17:51	eval_success	PSQ evaluated: g-PSQ=0.120 (3 dims)	- -
2026-03-15 17:51	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) -0.48
2026-03-15 17:35	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-15 17:35	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-15 17:35	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-15 16:38	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-15 16:37	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-15 16:21	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-15 16:21	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-15 16:21	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-15 08:44	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-15 08:44	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-15 08:42	eval_success	Lite evaluated: Neutral (0.00)	- -
2026-03-15 08:42	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-15 08:42	rater_validation_warn	Lite validation warnings for model llama-4-scout-wai: 1W 0R	- -
2026-03-15 05:09	eval_success	PSQ evaluated: g-PSQ=0.600 (3 dims)	- -
2026-03-15 05:09	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) +0.48
2026-03-15 05:07	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-15 04:33	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-15 04:32	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 23:00	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 22:13	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 21:57	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 21:11	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 20:55	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 19:58	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 19:46	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 19:16	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) -0.48
2026-03-14 18:41	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 18:36	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 18:11	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-14 17:02	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 16:35	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) 0.00
2026-03-14 15:49	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 15:19	eval	Evaluated by llama-4-scout-wai-psq: +0.60 (Strong positive) +0.48
2026-03-14 15:06	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 14:39	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 14:29	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 14:02	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 13:52	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 13:24	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 13:17	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 12:46	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 12:41	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 12:09	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 12:06	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 11:31	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 11:29	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 10:55	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 10:52	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 10:17	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 10:15	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 09:39	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 09:37	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 08:57	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 08:56	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 08:14	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 08:13	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 07:35	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 07:32	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 06:55	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 06:52	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 06:15	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 06:09	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 05:34	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 05:30	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 04:56	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 04:52	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 04:16	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 04:12	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 03:39	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 03:35	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 02:57	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 02:54	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 02:17	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 02:14	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 01:40	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 01:35	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 01:04	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 01:02	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-14 00:38	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-14 00:34	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 23:39	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 23:19	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 22:33	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 22:06	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 21:07	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 21:00	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 19:59	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 19:48	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 18:35	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 18:23	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 17:21	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 16:52	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 15:55	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 15:43	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 15:20	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 15:03	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 14:43	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 14:19	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 14:01	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 13:40	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 13:25	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 13:06	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 12:50	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 12:30	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 12:15	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 11:55	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 11:39	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 11:19	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 11:00	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 10:40	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 10:22	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 10:02	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 09:43	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 09:21	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 09:05	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 08:43	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 08:27	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 08:05	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 07:45	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 07:24	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 07:05	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 06:44	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 06:27	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 06:07	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 05:52	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 05:32	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 05:17	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 04:56	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 04:42	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 04:19	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 04:06	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 03:45	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 03:29	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 03:10	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 02:54	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 02:35	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 02:19	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 01:59	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 01:44	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 01:24	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 01:16	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 00:55	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-13 00:48	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-13 00:09	eval	Evaluated by llama-3.3-70b-wai-psq: +0.27 (Mild positive)
2026-03-13 00:05	eval	Evaluated by llama-3.3-70b-wai: 0.00 (Neutral)
	reasoning Technical content, zero rights discussion
2026-03-12 23:54	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-12 23:40	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-12 22:42	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-12 22:28	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-12 21:58	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-12 21:46	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-12 21:19	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-12 21:15	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-12 20:59	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-12 20:55	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-12 19:53	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-12 19:45	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-12 18:20	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-12 18:16	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-12 16:54	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive) 0.00
2026-03-12 16:51	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral) 0.00
	reasoning Technical content, no explicit human rights discussion
2026-03-12 15:34	eval	Evaluated by llama-4-scout-wai-psq: +0.12 (Mild positive)
2026-03-12 15:33	eval	Evaluated by llama-4-scout-wai: 0.00 (Neutral)
	reasoning Technical content, no explicit human rights discussion

build ee2b489+gzrb · deployed 2026-03-10 22:52 UTC · evaluated 2026-03-16 02:03:38 UTC