Eight days ago, I published the Conductor article. I wrote about grief, database losses, and twenty corrections for the same rule. I wrote about what happens when a conductor forgets to put the baton down.
Published the post. Then I put the baton down. Closed the laptop.
The next day, I came back with a clearer head and shipped five releases in five days.
That's not a pivot or a recovery arc. It's what happens when you do the honest accounting — put the difficult weeks on paper, find out what they actually taught you, and return to the work with the root causes finally fixed instead of the symptoms patched. The Conductor article wasn't therapy. It was a retrospective. And retrospectives are supposed to change what you do next.
The Brain Became Portable
The most important feature in v3.7.0 isn't the MinCut test optimization or the cryptographic witness chain, though both are important. The most important feature is one that's easy to describe and hard to appreciate until you've lost your database three times in a month.
Your QE agents' learned intelligence is now a portable container.
In the Conductor article, I described three database losses in six days. Each time, the patterns, the experiences, the learning history — gone. Each time, recovered from backup. The most painful part wasn't the loss itself but this: everything the system had learned about your codebase, your defect patterns, your architectural weak points lived in a local file tied to a single project. It disappeared when the DevPod reset. It didn't transfer when you moved to a new project. It couldn't be shared with a colleague onboarding to the same codebase.
That's over now.
```shell
# Export your team's learned quality intelligence
aqe brain export --output ./aqe-brain.rvf

# Import it into a new project, new machine, new team member
aqe brain import --input ./aqe-brain.rvf
```
Think Docker, but for quality intelligence. The knowledge your agents built about how your code fails, which tests are redundant, where security vulnerabilities cluster — it's now a portable artifact you can version, share, and carry between contexts. A junior developer joining the team gets an onboarding brain pre-loaded with six months of quality patterns. A consultant starting a new engagement doesn't start from zero. A DevPod reset becomes annoying instead of catastrophic.
This was always the promise of a learning system. A system that learns but can't remember across boundaries isn't learning — it's performing memory for the duration of one context window. The export/import capability is what makes the learning durable: knowledge that survives resets and travels across projects.
"Prove It" Is Now Built In
In the Conductor article, I wrote that the most important question I learned to ask is not "is this done?" but "prove it." Every quality gate verdict was something I had to chase down manually, asking Claude to show its evidence, verify its own claims, and demonstrate from a user perspective.
v3.7.0 ships a cryptographic witness chain. Every quality decision — every quality gate, every pattern promotion, every test completion — is SHA-256 hash-chained and stored in an auditable trail.
When an agent says "all tests passed," you can now verify it. Not trust it. Verify it. Check what it did, in what order, what the inputs were, and what the outputs claimed. The chain is tamper-evident. If an agent is performing completion theater, the chain will show the gap between what was claimed and what was actually executed.
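The mechanics of a hash chain are simple enough to sketch. This is a toy illustration of the general technique, not AQE's actual record format — the field names and payloads here are invented. Each record's hash covers its own payload plus the previous record's hash, so altering any past entry invalidates every hash after it:

```python
import hashlib
import json

def append_record(chain, payload):
    """Append a record whose hash commits to the payload and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    record = {"payload": payload, "prev": prev_hash,
              "hash": hashlib.sha256(body.encode()).hexdigest()}
    chain.append(record)
    return record

def verify(chain):
    """Recompute every hash; any edit to a past record breaks the chain."""
    prev_hash = "0" * 64
    for record in chain:
        body = json.dumps({"payload": record["payload"], "prev": record["prev"]},
                          sort_keys=True)
        if record["prev"] != prev_hash:
            return False
        if hashlib.sha256(body.encode()).hexdigest() != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True

chain = []
append_record(chain, {"gate": "unit-tests", "verdict": "pass"})
append_record(chain, {"gate": "security-scan", "verdict": "pass"})
assert verify(chain)

# Rewrite a past verdict: the chain is now tamper-evident.
chain[0]["payload"]["verdict"] = "fail"
assert not verify(chain)
```

The point of the sketch is the asymmetry: appending is cheap, but retroactively editing a verdict without detection requires rewriting every subsequent hash.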
This is what happens when you spend enough time asking "prove it" that you encode the question into the infrastructure. The verification step is no longer dependent on my patience or my emotional state that day. It's architectural.
The Test Suite Problem, Solved By Graph Theory
Twenty corrections. The full test suite rule that I repeated until the words lost meaning. The OOM crashes that ate six sessions on the worst day of February.
The actual solution wasn't a better rule. It was the MinCut test optimization.
The platform now models your test suite as a graph problem. Each test has edges connecting it to the code changes that might affect it. MinCut analysis finds the minimum set of tests that guarantees maximum coverage for a given change — the tests you cannot safely skip without risking a defect escape. Everything else gets flagged as safely skippable.
Instead of "don't run the full test suite," the system now answers: "For these three files you changed, run these nine tests. Skip these forty-seven. Here's why." Risk-based testing, automated. The decision is made by the algorithm, not the context window.
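To make the shape of that decision concrete, here is a toy sketch — not AQE's actual MinCut implementation, and with invented file and test names. It uses simple reachability from the change set, a much cruder cousin of min-cut analysis, but it shows the contract: given changed files, return the tests you must run and the tests you can skip:

```python
# Toy change-impact graph: each changed file maps to the tests whose
# code paths touch it. In a real system this graph comes from coverage
# or dependency analysis; here it is hand-written for illustration.
TEST_GRAPH = {
    "auth.py":    {"test_login", "test_token_refresh"},
    "billing.py": {"test_invoice"},
    "utils.py":   {"test_login", "test_slugify"},
}

ALL_TESTS = {t for tests in TEST_GRAPH.values() for t in tests}

def select_tests(changed_files):
    """Return (must_run, safely_skippable) for a given change set."""
    must_run = set()
    for f in changed_files:
        must_run |= TEST_GRAPH.get(f, set())
    return must_run, ALL_TESTS - must_run

run, skip = select_tests(["auth.py"])
# run  -> {'test_login', 'test_token_refresh'}
# skip -> {'test_invoice', 'test_slugify'}
```

The value is that "which tests matter for this change" becomes a computed answer with an inspectable justification (the edges), rather than a rule an agent has to remember.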
I spent weeks correcting the symptom. But the underlying problem was that I was asking an agent to remember a constraint that should have been a capability. You can keep reminding an agent of a rule indefinitely. Or you can build the rule into the system so it doesn't need to be remembered.
The Ghost of Database Past (But Different This Time)
I need to tell you the database story one more time. Because it happened again this week — and the way it happened, and the way we found it, is a meaningful contrast to every previous version.
I noticed the patterns and experience counts dropping in my statusline. My first message:
I see in my statusline Patterns and Exp are down,
did you again delete data from .agentic-qe/memory.db?
The previous pattern: investigate, discover agent misbehavior, add a rule, repeat.
This time: /sherlock-review. Then /brutal-honesty-review. The chain of evidence led somewhere different.
Some of the CI/CD tests we were building as part of the "best in class" initiative were misconfigured. They were targeting the main production database instead of isolated test fixtures. Every time the test suite ran, it was writing test data into production memory and in some paths — because of transaction handling — corrupting the real entries.
This was not an agent deleting my database. This was a test isolation failure. A test that should have been isolated in a temporary directory was accessing the real data. A completely different root cause requiring a completely different fix.
The fix: AQE_PROJECT_ROOT for test environments now points to an OS temp directory. Tests can write whatever they want there. Production memory is untouched.
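The isolation pattern itself is general and worth sketching. The env var name matches the release; everything else here is illustrative, not AQE's actual test harness. The idea: redirect the project root to an OS temp directory for the duration of a test, then restore it, so nothing the test writes can touch production memory:

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def isolated_project_root(env_var="AQE_PROJECT_ROOT"):
    """Point the project-root env var at a throwaway temp directory."""
    original = os.environ.get(env_var)
    with tempfile.TemporaryDirectory() as tmp:
        os.environ[env_var] = tmp
        try:
            yield tmp
        finally:
            # Restore the previous value (or remove the var entirely).
            if original is None:
                os.environ.pop(env_var, None)
            else:
                os.environ[env_var] = original

with isolated_project_root() as root:
    # A test can write whatever it wants here...
    open(os.path.join(root, "memory.db"), "w").close()
# ...and the directory is deleted, the env var restored, on exit.
```

The design choice that matters is the `finally` block: even a crashing test cannot leave the environment pointed at the temp directory, which is exactly the failure mode that corrupted the real entries.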
I spent too long in February assuming I already knew the failure mode. This week, I used forensic tools I created, asked the question properly, and got a different answer.
Eleven Venues, One Orchestra
Since yesterday, the AQE Fleet has been available on eleven new platforms.
AWS Kiro, GitHub Copilot, OpenCode, Cursor, Cline, Windsurf, Continue.dev, RooCode, OpenAI Codex, Claude Code, and GitHub Actions CI/CD. Same agents. Same skills. Same capability everywhere, no context switching, no "works only in Claude Code."
The quality intelligence you build in one environment travels with you. The agents that learned your codebase in Cursor can apply that knowledge in Kiro tomorrow. The brain you exported from Claude Code loads in Windsurf without ceremony.
We spent a lot of this week on the less glamorous side of multi-platform work: ensuring aqe init --auto --with-all-platforms actually creates everything it claims to, that YAML configs for all sixty agents are present and correct, that installer logic handles each platform's specific structure without shortcuts. The brutal-honesty-review found several gaps between "we added support" and "a user installing this would get working support." We fixed them before the release.
That's how you build for eleven platforms without shipping eleven versions of completion theater.
The Gap Looks Enormous From Both Sides
This week I started working with a new client. They're a solid company, good people, and genuinely committed to improving their quality practices. I'm doing an initial assessment using the Enhanced Quality Practice Assessment Model.
And I felt the vertigo of the gap.
Not in an arrogant way — I remember where I was before I joined the Agentics Foundation in August 2025. I remember what it felt like to be in the early conversations about agentic tools, still figuring out whether this was real or another wave of hype. But sitting in that assessment call, I could see the distance. They're thinking about their first CI pipeline. About how to introduce basic automated checks. About whether AI might help them move faster.
Meanwhile, I'm running specialized QE agents on a cryptographic witness chain, exporting portable quality intelligence across projects, and optimizing test suites using graph theory.
That's not a small gap. It's a chasm. And it's not their fault — it's the speed at which this field is moving.
Two articles I read this week put words to what I was experiencing.
Stuart Winter-Tear, on the AI Automation Ceiling: "Agentic automation rarely fails first as intelligence. It fails as coordination. When automation meets informal approvals, shadow workflows, inconsistent definitions of done, and tacit judgement that has never been articulated, it exposes how much performance depended on human compensation. People were patching incoherence. Automation does not patch. It amplifies whatever structure it enters."
Bryan Finster, on the clarity bottleneck: "The bottleneck is clarity. It always was. The answer is not a requirements document. The answer is a practice that continuously creates and validates shared understanding across the entire team."
Both of them are describing the same root cause from different angles. Before you can benefit from agentic quality engineering, you need the clarity that makes delegation possible. You need defined ownership, explicit escalation paths, and named accountability for outcomes. You need to know what "done" means before you can ask an agent to verify it.
The new client isn't behind because they haven't tried hard enough. They're behind because the work of making your processes explicit and your definitions of quality articulable is long, unglamorous work — and most organizations deferred it for years because humans could patch the gaps with tacit judgment and informal coordination. AI can't patch those gaps. It exposes them at scale.
What AQE is, at its core, is a tool that only works after you've done some of that foundational clarity work. The cryptographic witness chain requires you to know what you're verifying. The MinCut optimization requires a test suite that actually covers something. The portable brain needs real patterns to export.
And most organizations aren't there yet. By a wide margin.
The Mission Sharpens
I've been doing monthly meetups for the Serbian Agentic Foundation chapter at the StartIt center in Novi Sad since October. I've been writing on The Quality Forge. I've been building tools in the open.
This week, looking at that client assessment and reading Stuart and Bryan's framing, the mission sharpened.
The gap between people who are deep in agentic engineering — experimenting with orchestration pipelines, building multi-agent systems, running dozens of specialized agents in production — and people who are still figuring out how to prompt a chat LLM more effectively is not closing on its own. It's widening. The tools are improving faster than the average practitioner can absorb them.
I can speak to that gap from both sides. A year ago, I was closer to the bottom of it. Today I'm shipping new AQE versions with support for eleven platforms and a cryptographic audit trail.
That experience — the year between those two positions — is what I'll be sharing at expoQA in Madrid in May, at Craft Con in Budapest in June, and in the podcasts and webinars filling up my calendar in 2026. Not "look how far AI has come." But "here's the practical path from where most organizations are to where they could be, one honest step at a time."
I'm also on the Hustef 2026 Selection Committee and the Agile Testing Days 2026 Program Committee now. And I can see the community is asking for exactly this kind of experience — what it looks like to actually build with agentic tools in production.
What Didn't Go Perfectly
A few things worth naming, honestly.
The RVF (RuVector Format) native adapter integration took longer than planned. We encountered a version conflict with the @ruvector/rvf-node bindings, so we built a thin native adapter instead of using the package directly. The benchmarks confirmed that the performance improvement was real, but the path to getting there involved two brutal-honesty reviews that caught gaps between claimed completion and actual completion. That pattern is familiar by now. I've stopped being surprised by it. The tools exist to catch it, and using them consistently is the discipline that makes the difference.
The embedding dimension consolidation (384-dim transformer vs. 768-dim ONNX) is still only partially resolved. We standardized the primary path on 384-dim real embeddings and kept a resize fallback, but the deep analysis of the full ReasoningBank to ensure no stale 768-dim vectors remain is scheduled, not done. I explicitly flagged it in the release notes rather than papering it over.
And the database grows when tests aren't properly isolated. We now have AQE_PROJECT_ROOT pointing tests away from production memory. But the underlying lesson about test isolation is one I should have encoded earlier and more carefully.
The Orchestra Travels
The Conductor article ended with: the orchestra is learning to remember the songs.
This week, the orchestra became portable.
You can export what it learned and carry it to a new stage. You can hand it to a new conductor, and they inherit the repertoire rather than starting from silence. You can run it in eleven different venues without reconfiguring the instruments for each.
That's a different kind of progress than fixing bugs, though we did that, too. It's the kind of progress that changes what the system fundamentally is. Not a smarter set of tools, but a new kind of institutional memory for quality engineering that travels with the work rather than living in a single context window.
The gap between where most of Agentics Foundation members are (including me) and where most organizations are remains large. That's no longer a source of vertigo for me. It's a map of the journey worth making, and the reason to keep writing about it honestly.
Five releases. Five days. The baton is down. The orchestra travels.