No observable privacy policy or data handling disclosures on domain.
Terms of Service
—
No observable terms of service on domain.
Accessibility
+0.10
Article 26
Astro-based site with semantic HTML structure and screen-reader annotations (.astro-route-announcer), suggesting accessibility intent; dark mode support present.
Mission
—
Author bio indicates focus on observability and distributed systems engineering; no explicit mission statement on domain.
Editorial Code
—
No observable editorial standards or ethics policy.
Ownership
—
Personal blog; author identified as Boris Tane, Engineering Lead at Cloudflare.
Access Model
+0.15
Article 19
Blog content is publicly accessible with no paywall or registration barrier; supports free expression and information access.
Ad/Tracking
—
No observable advertising or tracking mechanisms on domain.
I do something very similar, also with Claude and Codex, because the workflow is controlled by me, not by the tool. But instead of plan.md I use a simple ticket system, with files named like ticket_<number>_<slug>.md: I let the agent create the ticket from a chat, correct and annotate it afterwards, and send it back, sometimes to a new agent instance. This workflow helps me keep track of what has been done over time in the projects I work on. This approach also does not need any "real" ticket system tooling/mcp/skill/whatever, since it works purely on text files.
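A minimal sketch of how such ticket files could be numbered, assuming a flat directory of tickets. The helper names (`slugify`, `next_ticket_path`) and the underscore-separated slug are assumptions for illustration, not something the comment specifies.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with underscores."""
    return "_".join(re.findall(r"[a-z0-9]+", title.lower()))


def next_ticket_path(directory: Path, title: str) -> Path:
    """Return the path for a new ticket, numbered after the highest existing one."""
    numbers = [
        int(match.group(1))
        for path in directory.glob("ticket_*_*.md")
        if (match := re.match(r"ticket_(\d+)_", path.name))
    ]
    number = max(numbers, default=0) + 1
    # Hypothetical naming scheme: ticket_<number>_<slug>.md
    return directory / f"ticket_{number}_{slugify(title)}.md"
```

The agent (or a human) would then write the ticket body to the returned path. Everything stays in plain text files, so no tracker integration is needed.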
> Notice the language: “deeply”, “in great details”, “intricacies”, “go through everything”. This isn’t fluff. Without these words, Claude will skim. It’ll read a file, see what a function does at the signature level, and move on. You need to signal that surface-level reading is not acceptable.
This makes no sense to my intuition of how an LLM works. It's not that I don't believe this works, but my mental model doesn't capture why asking the model to read the content "more deeply" will have any impact on whatever output the LLM generates.
I go a bit further than this and have had great success with 3 doc types and 2 skills:
- Specs: these are generally static, but updatable as the project evolves. And they're broken out to an index file that gives a project overview, a high-level arch file, and files for all the main modules. Roughly ~1k lines of spec for 10k lines of code, and try to limit any particular spec file to 300 lines. I'm intimately familiar with every single line in these.
- Plans: these are the output of a planning session with an LLM. They point to the associated specs. These tend to be 100-300 lines and 3 to 5 phases.
- Working memory files: I use both a status.md (3-5 items per phase, roughly 30 lines overall), which points to the latest plan, and a project_status (100-200 lines), which tracks the current state of the project and is instructed to compact past efforts to keep it lean.
- A planner skill I use w/ Gemini Pro to generate new plans. It essentially explains the specs/plans dichotomy and the role of the status files, then tells the model to review everything in the pertinent areas of code and give me a handful of high-level candidate features to address next, based on shortfalls in the specs or things noted in the project_status file. Based on what it presents, I select a feature or improvement to generate. Then it proceeds to generate a plan, updates a clean status.md that points to the plan, and adjusts project_status based on the state of the prior completed plan.
- An implementer skill in Codex that goes to town on a plan file. It's fairly simple, it just looks at status.md, which points to the plan, and of course the plan points to the relevant specs so it loads up context pretty efficiently.
I've tried the two main spec generation libraries, which were way overblown, and then I gave superpowers a shot... which was fine, but still too much. The above is all homegrown, and I've had much better success because it keeps the context lean and focused.
And I'm only on the $20 plans for Codex/Gemini, vs. spending $100/month on CC for the half year prior, and I move quicker w/ no stall-outs due to token consumption, which was regularly happening w/ CC by the 5th day. Codex rarely dips below 70% available context when it puts up a PR after an execution run. Roughly 4/5 PRs are without issue, which is the reverse of what I experienced with CC using only planning mode.
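The pointer chain described above (status.md points to the plan, the plan points to the specs) can be sketched roughly as follows. The pointer convention is an assumption: here a pointer is any relative `.md` path written in backticks, and `load_context` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumed convention: a pointer is a relative .md path inside backticks,
# for example `plans/auth.md`. The original comment does not specify one.
POINTER = re.compile(r"`([^`\s]+\.md)`")


def load_context(root: Path, entry: str = "status.md", max_depth: int = 2) -> list[Path]:
    """Follow pointers from the entry file, level by level, up to max_depth hops."""
    ordered: list[Path] = []
    frontier = [root / entry]
    for _ in range(max_depth + 1):
        next_frontier: list[Path] = []
        for path in frontier:
            # Skip files already collected and pointers to missing files.
            if path in ordered or not path.is_file():
                continue
            ordered.append(path)
            for ref in POINTER.findall(path.read_text(encoding="utf-8")):
                next_frontier.append(root / ref)
        frontier = next_frontier
    return ordered
```

With the default depth of two hops, only the status file, the plan it names, and the specs that plan names are loaded, which is one way to keep the context lean.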
I craft a detailed and ordered set of lecture notes in a Quarto file and then have a dedicated claude code skill for translating those notes into Slidev slides, in the style that I like.
Once that's done, much like the author, I go through the slides and make commented annotations like "this should be broken into two slides" or "this should be a side-by-side" or "use your generate clipart skill to throw an image here alongside these bullets" and "pull in the code example from ../examples/foo." It works brilliantly.
And then I do one final pass of tweaking after that's done.
But yeah, annotations are super powerful. Token distance in-context and all that jazz.
The author is quite far on their journey but would benefit from writing simple scripts to enforce invariants in their codebase. Invariant broken? Script exits with a non-zero exit code and some output that tells the agent how to address the problem. Scripts are deterministic, run in milliseconds, and use zero tokens. Put them in husky or pre-commit, install the git hooks, and your agent won’t be able to commit without all your scripts succeeding.
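A minimal sketch of such an invariant script. The invariant itself (no direct `print()` calls in application code), the default `src` directory, and the hint text are all hypothetical; the point is only the shape: deterministic check, actionable output, non-zero exit code.

```python
import re
import sys
from pathlib import Path

# Hypothetical invariant: application code must not call print() directly.
FORBIDDEN = re.compile(r"^\s*print\(")
HINT = "Use the project's logger instead of print()."


def find_violations(root: Path) -> list[str]:
    """Return one 'file:line: hint' message per offending line under root."""
    messages = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if FORBIDDEN.search(line):
                messages.append(f"{path}:{number}: {HINT}")
    return messages


def main(argv: list[str]) -> int:
    """Print violations to stderr and return the process exit code."""
    root = Path(argv[0]) if argv else Path("src")
    violations = find_violations(root)
    for message in violations:
        print(message, file=sys.stderr)
    return 1 if violations else 0


# In a real hook script, finish with:
#     raise SystemExit(main(sys.argv[1:]))
```

Each message tells the agent where the problem is and what to do about it, so a failed commit hook doubles as a correction prompt.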
And “Don’t change this function signature” should be enforced not by anticipating that your coding agent “might change this function signature so we better warn it not to” but rather via an end-to-end test that fails if the function signature is changed (because the other code that needs it not to change now has an error). That takes the author out of the loop: they no longer have to watch for the change in order to issue the correction, and can instead sip coffee while the agent observes that it caused a test failure and then corrects it without intervention, probably by rolling back the function signature change and changing something else.
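As a lighter-weight stand-in for a full end-to-end test, a unit test can pin the signature directly and also call the function the way its callers do. This is a sketch under assumptions: `create_invoice` and its parameters are hypothetical, and pinning the signature with `inspect.signature` is a narrower technique than the end-to-end test the comment describes.

```python
import inspect


# Hypothetical function whose signature other code depends on.
def create_invoice(customer_id: str, amount_cents: int, *, currency: str = "USD") -> dict:
    return {"customer": customer_id, "amount": amount_cents, "currency": currency}


EXPECTED_SIGNATURE = "(customer_id: str, amount_cents: int, *, currency: str = 'USD') -> dict"


def test_create_invoice_signature_is_stable() -> None:
    """Fail with an actionable message if the signature drifts."""
    actual = str(inspect.signature(create_invoice))
    assert actual == EXPECTED_SIGNATURE, (
        f"create_invoice signature changed to {actual}; "
        "callers depend on the original, so revert it and change something else."
    )


def test_callers_can_still_use_keyword_currency() -> None:
    """Exercise the function the way existing callers do."""
    invoice = create_invoice("cust_1", 1250, currency="EUR")
    assert invoice["currency"] == "EUR"
```

The failure message is written for the agent: it says what broke and which direction the fix should take.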
I actually don't really like a few things about this approach.
First, the "big bang" write it all at once. You are going to end up with thousands of lines of code that were monolithically produced. I think it is much better to have it write the plan and formulate it as sensible technical steps that can be completed one at a time. Then you can work through them. I get that this is not very "vibe"ish but that is kind of the point. I want the AI to help me get to the same point I would be at with produced code AND understanding of it, just accelerate that process. I'm not really interested in just generating thousands of lines of code that nobody understands.
Second, the author keeps referring to adjusting the behaviour, but never incorporating that into long-lived guidance. To me, integral with the planning
process is building an overarching knowledge base. Every time you're telling it
there's something wrong, you need to tell it to update the knowledge base about
why so it doesn't do it again.
Finally, no mention of tests? Just quick checks? To me, you have to end up with
comprehensive tests. Maybe to the author it goes without saying, but I find it is
integral to build this into the planning. Certain stages you will want certain
types of tests. Sometimes in advance of the code (so TDD style), other times
built alongside it or after.
It's definitely going to be interesting to see how software methodology evolves
to incorporate AI support and where it ultimately lands.
* I ask the LLM for its understanding of a topic or an existing feature in code. It's not really planning, it's more like understanding the model first
* Then based on its understanding, I can decide how great or small to scope something for the LLM
* An LLM showing good understanding can deal with a big task fairly well.
* An LLM showing bad understanding still needs to be prompted to get it right
* What helps a lot is reference implementations. Either I have existing code that serves as the reference or I ask for a reference and I review.
A few folks at my work do it OP's way, but here are my arguments for not doing it this way:
* Nobody is measuring the amount of slop within the plan. We only judge the implementation at the end
* it's still non deterministic - folks will have different experiences using OP's methods. If Claude updates its model, it outdates OP's suggestions by either making it better or worse. We don't evaluate when things get better, we only focus on things that have not gone well.
* it's very token heavy - LLM providers insist that you use many tokens to get the task done. It's in their best interest to get you to do this. For me, LLMs should be powerful enough to understand context with minimal tokens because of the investment into model training.
Both ways get the task done and it just comes down to my preference for now.
For me, I treat the LLM as model training + post processing + input tokens = output tokens. I don't think this is the best way to do non deterministic based software development. For me, we're still trying to shoehorn "old" deterministic programming into a non deterministic LLM.
Certainly the “unsupervised agent” workflows are getting a lot of attention right now, but they require a specific set of circumstances to be effective:
- clear validation loop (eg. Compile the kernel, here is gcc that does so correctly)
- ai enabled tooling (mcp / cli tool that will lint, test and provide feedback immediately)
- oversight to prevent agents going off the rails (open area of research)
- an unlimited token budget
That means that most people can't use unsupervised agents.
Not that they don't work; most people simply don't have an environment and task that are appropriate.
By comparison, anyone with cursor or claude can immediately start using this approach, or their own variant on it.
It does not require fancy tooling.
It does not require an arcane agent framework.
It works generally well across models.
This is one of those few genuine pieces of good practical advice for people getting into AI coding.
Simple. Obviously works once you start using it. No external dependencies. BYO tools to help with it, no “buy my AI startup xxx to help”. No “star my github so I can get a job at $AI corp too”.
I've been teaching AI coding tool workshops for the past year and this planning-first approach is by far the most reliable pattern I've seen across skill levels.
The key insight that most people miss: this isn't a new workflow invented for AI - it's how good senior engineers already work. You read the code deeply, write a design doc, get buy-in, then implement. The AI just makes the implementation phase dramatically faster.
What I've found interesting is that the people who struggle most with AI coding tools are often junior devs who never developed the habit of planning before coding. They jump straight to "build me X" and get frustrated when the output is a mess. Meanwhile, engineers with 10+ years of experience who are used to writing design docs and reviewing code pick it up almost instantly - because the hard part was always the planning, not the typing.
One addition I'd make to this workflow: version your research.md and plan.md files in git alongside your code. They become incredibly valuable documentation for future maintainers (including future-you) trying to understand why certain architectural decisions were made.
> After Claude writes the plan, I open it in my editor and add inline notes directly into the document. These notes correct assumptions, reject approaches, add constraints, or provide domain knowledge that Claude doesn’t have.
This is the part that seems most novel compared to what I've heard suggested before. And I have to admit I'm a bit skeptical. Would it not be better to modify what Claude has written directly, to make it correct, rather than adding the corrections as separate notes (and expecting future Claude to parse out which parts were past Claude and which parts were the operator, and handle the feedback graciously)?
At least, it seems like the intent is to do all of this in the same session, such that Claude has the context of the entire back-and-forth updating the plan. But that seems a bit unpleasant; I would think the file is there specifically to preserve context between sessions.
This is quite close to what I've arrived at, but with two modifications
1) anything larger I work on in layers of docs. Architecture and requirements -> design -> implementation plan -> code. Partly it helps me think and nail the larger things first, and partly helps claude. Iterate on each level until I'm satisfied.
2) when doing reviews of each doc I sometimes restart the session and clear context, it often finds new issues and things to clear up before starting the next phase.
> the workflow I’ve settled into is radically different from what most people do with AI coding tools
This looks exactly like what Anthropic recommends as the best practice for using Claude Code. Textbook.
It also exposes a major downside of this approach: if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong.
I've found a much better approach in doing a design -> plan -> execute in batches, where the plan is no more than 1,500 lines, used as a proxy for complexity.
My 30,000 LOC app has about 100,000 lines of plan behind it. Can't build something that big as a one-shot.
> Read deeply, write a plan, annotate the plan until it’s right, then let Claude execute the whole thing without stopping, checking types along the way.
As others have already noted, this workflow is exactly what the Google Antigravity agent (based off Visual Studio Code) has been created for. Antigravity even includes specialized UI for a user to annotate selected portions of an LLM-generated plan before iterating it.
One significant downside to Antigravity I have found so far is the fact that even though it will properly infer a certain technical requirement and clearly note it in the plan it generates (for example, "this business reporting column needs to use a weighted average"), it will sometimes quietly downgrade such a specialized requirement (for example, to a non-weighted average), without even creating an appropriate "WARNING:" comment in the generated code. Especially so when the relevant codebase already includes a similar, but not exactly appropriate API. My repetitive prompts to ALWAYS ask about ANY implementation ambiguities WHATSOEVER go unanswered.
From what I gather Claude Code seems to be better than other agents at always remembering to query the user about implementation ambiguities, so maybe I will give Claude Code a shot over Antigravity.
I don’t use plan.md docs either, but I recognise the underlying idea: you need a way to keep agent output constrained by reality.
My workflow is more like scaffold -> thin vertical slices -> machine-checkable semantics -> repeat.
Concrete example: I built and shipped a live ticketing system for my club (Kolibri Tickets). It’s not a toy: real payments (Stripe), email delivery, ticket verification at the door, frontend + backend, migrations, idempotency edges, etc. It’s running and taking money.
The reason this works with AI isn’t that the model “codes fast”. It’s that the workflow moves the bottleneck from “typing” to “verification”, and then engineers the verification loop:
- keep the spine runnable early (end-to-end scaffold)
- add one thin slice at a time (don’t let it touch 15 files speculatively)
- force checkable artifacts (tests/fixtures/types/state-machine semantics where it matters)
- treat refactors as normal, because the harness makes them safe
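To illustrate what "state-machine semantics" as a checkable artifact might look like, here is a minimal sketch. The states and allowed transitions are invented for illustration and are not taken from the ticketing system described in this comment.

```python
from enum import Enum


class TicketState(Enum):
    # Hypothetical lifecycle for a ticket; not the real system's states.
    RESERVED = "reserved"
    PAID = "paid"
    CHECKED_IN = "checked_in"
    REFUNDED = "refunded"


# Every legal move is listed explicitly, so a test can enumerate and check them.
ALLOWED = {
    TicketState.RESERVED: {TicketState.PAID},
    TicketState.PAID: {TicketState.CHECKED_IN, TicketState.REFUNDED},
    TicketState.CHECKED_IN: set(),
    TicketState.REFUNDED: set(),
}


def transition(current: TicketState, target: TicketState) -> TicketState:
    """Return target if the move is allowed; raise ValueError otherwise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Because the transition table is data, an agent that tries to add a shortcut (say, checking in an unpaid ticket) trips a test instead of relying on a reviewer to notice.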
If you run it open-loop (prompt -> giant diff -> read/debug), you get the “illusion of velocity” people complain about. If you run it closed-loop (scaffold + constraints + verifiers), you can actually ship faster because you’re not paying the integration cost repeatedly.
Plan docs are one way to create shared state and prevent drift. A runnable scaffold + verification harness is another.
> One trick I use constantly: for well-contained features where I’ve seen a good implementation in an open source repo, I’ll share that code as a reference alongside the plan request. If I want to add sortable IDs, I paste the ID generation code from a project that does it well and say “this is how they do sortable IDs, write a plan.md explaining how we can adopt a similar approach.” Claude works dramatically better when it has a concrete reference implementation to work from rather than designing from scratch.
Licensing apparently means nothing.
Ripped off in the training data, ripped off in the prompt.
I think the real value here isn’t “planning vs not planning,” it’s forcing the model to surface its assumptions before they harden into code.
LLMs don’t usually fail at syntax. They fail at invisible assumptions about architecture, constraints, invariants, etc. A written plan becomes a debugging surface for those assumptions.
Score Breakdown
+0.26
Preamble
Medium Framing
Editorial
+0.20
Structural
+0.10
SETL
+0.14
Combined
ND
Context Modifier
ND
Content indirectly affirms dignity through emphasis on human judgment in AI collaboration. Framing presents human autonomy and decision-making authority as central to effective development, aligning with human dignity principles. However, content is narrowly focused on technical workflow rather than universal human rights.
+0.23
Article 1: Freedom, Equality, Brotherhood
Medium Framing
Editorial
+0.15
Structural
+0.10
SETL
+0.09
Combined
ND
Context Modifier
ND
Implicitly affirms equality by positioning developer and AI tool as collaborative equals requiring mutual understanding. No explicit statement of human equality, but workflow design respects human judgment over machine output.
ND
Article 2: Non-Discrimination
No observable content addressing non-discrimination or protection from discrimination in employment or service.
ND
Article 3: Life, Liberty, Security
No observable content addressing security of person or freedom from arbitrary detention.
ND
Article 4: No Slavery
No observable content addressing slavery or servitude.
ND
Article 5: No Torture
No observable content addressing torture or cruel treatment.
ND
Article 6: Legal Personhood
No observable content addressing right to legal personhood.
ND
Article 7: Equality Before Law
No observable content addressing equal protection before the law.
ND
Article 8: Right to Remedy
No observable content addressing remedy for violation of rights.
ND
Article 9: No Arbitrary Detention
No observable content addressing arbitrary arrest or detention.
ND
Article 10: Fair Hearing
No observable content addressing fair trial or hearing.
ND
Article 11: Presumption of Innocence
No observable content addressing presumption of innocence or retrospective law.
ND
Article 12: Privacy
No observable content addressing privacy in communications or home.
ND
Article 13: Freedom of Movement
No observable content addressing freedom of movement.
ND
Article 14: Asylum
No observable content addressing asylum or refuge.
ND
Article 15: Nationality
No observable content addressing nationality.
ND
Article 16: Marriage & Family
No observable content addressing marriage or family.
ND
Article 17: Property
No observable content addressing property rights.
ND
Article 18: Freedom of Thought
No observable content addressing freedom of conscience or religion.
+0.46
Article 19: Freedom of Expression
High Practice Advocacy
Editorial
+0.35
Structural
+0.25
SETL
+0.19
Combined
ND
Context Modifier
ND
Content strongly affirms freedom of expression and opinion through demonstrating how structured dialogue enables clear communication and information exchange. Workflow emphasizes transparent written documentation and iterative feedback, enabling articulate expression of technical judgment. Public blog itself exemplifies freedom to share ideas. Domain policy (free access) supports information distribution.
ND
Article 20: Assembly & Association
No observable content addressing freedom of assembly or association.
ND
Article 21: Political Participation
No observable content addressing participation in government.
ND
Article 22: Social Security
No observable content addressing social security or welfare rights.
+0.23
Article 23: Work & Equal Pay
Medium Framing
Editorial
+0.20
Structural
+0.15
SETL
+0.10
Combined
ND
Context Modifier
ND
Content implicitly addresses right to work and choice of employment through advocating for developer agency and control over work methodology. Workflow design emphasizes human judgment in labor process. Author's role as Engineering Lead at Cloudflare reflects employment context. No explicit treatment of fair wages, equal pay, or worker protections.
ND
Article 24: Rest & Leisure
No observable content addressing rest or leisure.
ND
Article 25: Standard of Living
No observable content addressing health, food, housing, or medical care.
+0.31
Article 26: Education
Medium Framing
Editorial
+0.25
Structural
+0.15
SETL
+0.16
Combined
ND
Context Modifier
ND
Content addresses education and skill development indirectly through teaching methodology (research, planning, iterative feedback). Workflow promotes development of technical judgment and software engineering competency. Public sharing of knowledge supports right to education. No explicit commitment to universal education access.
+0.23
Article 27: Cultural Participation
Medium Framing
Editorial
+0.20
Structural
+0.15
SETL
+0.10
Combined
ND
Context Modifier
ND
Content indirectly addresses cultural and scientific participation through sharing technical methodology that enables creative problem-solving. Author contributes to engineering culture. Limited scope — does not address protection of intellectual property, authorship rights, or broader cultural participation.
ND
Article 28: Social & International Order
No observable content addressing social and international order necessary for rights realization.
+0.17
Article 29: Duties to Community
Medium Framing
Editorial
+0.15
Structural
+0.10
SETL
+0.09
Combined
ND
Context Modifier
ND
Workflow implicitly affirms duties and responsibilities by emphasizing careful review, domain knowledge integration, and prevention of harm (broken implementations). Annotation cycle reflects sense of responsibility toward code quality and system integrity. Limited scope — does not address explicit duties to community or society.
ND
Article 30: No Destruction of Rights
No observable content addressing prevention of destruction of rights or freedoms.