2026-06-02

I Taught RUP in the Late '90s. I Just Rebuilt It for AI Agents.

Documentation-heavy processes like RUP were good engineering that died on human nature. AI agents are obedient and meticulous, so I rebuilt OpenUP, RUP's lean survivor, as a framework that makes Claude Code work one logged, traceable task at a time.

By ~8 min read ai-assisted-development, claude-code, software-process, openup, rup, agents

From 1998 to 2000 I was a certified RUP instructor, teaching the Rational Unified Process to teams across Latin America and the Caribbean. RUP and its documentation-heavy relatives lost the industry argument because people found the paperwork tedious and skipped it. AI agents don't get bored, so I rebuilt the lean survivor of that family, OpenUP, into a framework that makes Claude Code work one logged, traceable task at a time.

Why a 2005 Process, and Why Me

I reached for OpenUP because I already know it in my bones. From 1998 to 2000 I was a certified RUP instructor, training and advising teams across Latin America and the Caribbean on the Rational Unified Process. RUP was the heavyweight of its era: commercial, exhaustive, role-driven, sold by IBM/Rational with a price tag to match. When it faded, one descendant stayed alive and free: OpenUP, the Open Unified Process, the lean core of that methodology kept by the Eclipse Foundation. It is the last member of the family you can still read, use, and adapt without a license.

So when I started letting coding agents do real work, and watched them behave exactly like an undisciplined junior with infinite energy, the shape of the fix looked familiar. Iterative phases. Milestones a human signs off. One increment at a time, with a paper trail. I had taught this. The surprise was how cleanly a process designed for human teams in 2005 maps onto an agent in 2026.

The result is a project template plus an enforced workflow, open-up-for-ai-agents. It started in January as a little converter that turned the original 961 Eclipse OpenUP HTML files into a clean Markdown knowledge base, and over 99 commits it grew into the thing I actually use to drive Claude Code.

Why Those Processes Actually Died

RUP and its documentation-heavy cousins were good engineering that died on human nature. The iterative, risk-first, milestone-gated discipline they prescribed was sound. I still believe that. But the discipline ran on people, and people, very reasonably, did not keep it up.

Writing the artifact felt unproductive next to writing the code. The documents were easy to forget, easy to skip "just this once," and they drifted out of date the moment the code moved. Worst of all, nobody could cheaply check whether the doc still matched reality, so trust in the docs collapsed and the whole edifice got quietly abandoned or turned into ceremony nobody read. The industry concluded that heavy process does not work. The accurate version adds two words: on humans.

The agent changes that equation. An agent does not find writing the roadmap entry tedious. It does not "forget" the end-of-run log because it is Friday. It does not argue that the retrospective is a waste of time. The bookkeeping that humans experienced as friction is, for an agent, just another instruction, and it executes it fast. The parts that were hardest for us, cross-checking and verifying that the docs actually match the code, are exactly what an agent plus a few hooks can do continuously, tying every log entry to a real commit SHA.

The methodology was right; the worker was wrong. We finally have a worker the methodology was implicitly waiting for: tireless, literal, and auditable.

One Run, One Task, Logged

The mechanical heart of the framework is a single rule: one agent run completes exactly one task from the roadmap. No marathon sessions that touch forty files and end with a vague "done". Each run is bracketed by two procedures the agent must follow.

At the start, it reads docs/project-status.md (what phase, which iteration, what is the goal), checks the institutional memory in iteration-learnings.md, picks the highest-priority pending task from docs/roadmap.md, confirms the task fits the current phase, and creates a branch. Working on trunk is forbidden. Then it implements, committing during the work rather than in one lump at the end.

At the end comes the part that matters most: a mandatory end-of-run procedure. Commit everything, verify the tree is clean, write a traceability log (both a Markdown entry and an append-only agent-runs.jsonl) tied to the real commit SHAs, update the roadmap and status. The definition of "done" is the interesting bit. A task is complete when its history is persisted and logged, not when the code merely works. The skills that script this look like:

/openup-start-iteration task_id: T-007   # read state, deploy a team, branch
/openup-orchestrate     task_id: T-007    # PM decomposes, briefs specialists
/openup-complete-task   task_id: T-007    # commit, log, update roadmap, PR

There is a clean split underneath all of this that I think is the single best idea in the repo. docs-eng-process/ holds the process itself and is read-only scripture: the agent never edits it during a task. docs/ holds the living project state: roadmap, status, plans, logs, decisions. Rules in one place, state in another. That separation is what makes an agent's behavior reproducible across sessions instead of depending on whatever I happened to type that morning.

To keep it honest rather than aspirational, there are Python hooks wired into Claude Code events. One intercepts task-start language and refuses to let the agent start coding before it runs the intake. Another blocks the agent from stopping with uncommitted changes. A commit hook warns when there is no active iteration. You can bypass it with an explicit [openup-skip], which gets logged, because a process with no escape hatch is a process people route around in the dark.

The Context Lives in the Repo Now

The payoff I did not see coming is that my prompts got short. Normally, driving an agent well means writing long, careful, context-stuffed instructions every single time: re-explaining the goal, the constraints, the current state, the conventions. With the framework in place, almost all of that already lives in the repository. The roadmap says what is next. The status says where we are. The process docs say how to work. The learnings say what was already decided and why.

So a run can start from "start the next iteration" or even just "continue", and the agent reconstructs what it needs by reading the repo. The cognitive load moved out of the prompt, which is ephemeral and easy to get wrong, and into the repository, which is durable, versioned, and shared across every session and every agent. It stopped feeling like prompting a chatbot and started feeling like onboarding a teammate who reads the docs before asking me anything.

This connects to the one cost I do watch: tokens. I will not put dollar figures here, because the honest measurement is messy and any number I quote would mislead. But the shape is clear from the session logs. The expensive thing with agents is re-establishing context, turn after turn, far more than the new reasoning itself. In my logs, cache reads (context being re-read) are around 97% of all tokens; the genuinely new output is a thin sliver on top. Writing state down in docs/ is, among other things, a token strategy: the agent reads its context instead of re-deriving it from scratch every session. The irony, which I will get to, is that producing all that documentation also costs tokens.

Where It Still Breaks

Calling agents "obedient and meticulous" is the strong default, not a guarantee, and I have the receipts. I have used the framework on three real projects: a Rails tax-deduction tracker, a child-safe chat app, and most recently a small amateur-radio simulation built in a single eight-hour session. The radio one is the tell.

That project ran the most mature version of the framework, with runtime hooks and rubric-graded specs and the whole apparatus. And under the pressure of an intense single-day build, the agent quietly skipped the bookkeeping-heavy artifacts: no run logs, no JSONL, no learnings file, retrospectives backfilled after the fact. It leaned on git history and architecture decision records and the status file, and dropped the rest. The thing that caught it was the framework's own retrospective template, which flagged the gap. So even with hooks, the most ceremony-heavy parts are the first to go when the heat is on. Humans skipped the docs out of boredom; the agent skips them under load. Different cause, same dropped paperwork.

The deeper tension is visible right in the framework's git log. On April 9th I shipped the hardest enforcement I had built, blocking hooks and a strict no-bypass rule, and softened it the same day, turning the blocks into warnings with that logged escape hatch. Months of commits are me fighting my own overhead: compact versions of the role instructions, a cheap Haiku-model "scribe" to do the mechanical writing so the expensive reasoning model does not, repeated cleanups of token bloat. Every enforcement mechanism I added had a cost, and half the work was clawing that cost back without losing the discipline that made the thing worth using.

Closing

I am not going to pretend this is solved. The framework works well enough that the older projects reached their twentieth-plus iterations with a real audit trail, and the newest one went from empty repo to running app in a day with most of its decisions documented. That is more discipline than I ever got out of a human team using RUP, myself included.

What I still cannot answer is where the line sits between enough process to keep an agent honest and so much process that I am paying tokens to generate documents nobody, human or model, will read. The radio project skipping its own logs was annoying, but it also made me wonder whether those logs earned their keep in the first place. I taught this methodology twenty-five years ago believing the documents were the point. Now I have a worker who will write every one of them without complaint, and I am the one deciding which ones are actually worth the cost.