
The Question That Followed Me Home

One week, six releases, a guest from London, two meetups, three developer conversations that changed how I think about what this work is becoming, and multiple preparations for conference talks.

Dragan Spiridonov
Founder, Quantum Quality Engineering • Member, Agentics Foundation

The previous article ended with the loop closing and the forest helping. The loop did close. And then Jordi filed another issue, and the loop turned out to have four more bugs stacked on top of each other. The forest did help. And then Monday came, and the calendar reminded me that I had two meetups, a conference to prepare for, a platform to maintain, and a growing number of people who want to understand how all of this actually works.

This is the article about the week when the interest started arriving faster than I could schedule it.


When the Stacked Bugs Stacked Again

The last article celebrated v3.9.31 — the release where the self-learning loop finally closed end-to-end. Jordi’s issue #491, filed the day after publication, proved that celebration was premature. On a clean install, the learning consolidation worker ran every cycle and consolidated nothing. The loop-health dashboard permanently showed “never-ran.” Four independent bugs, each invisible on its own, stacked into a pipeline that was alive on paper and dead in practice.

v3.9.32 fixed all four and added what should have existed before: a daemon-runtime seam test suite that runs before every publish. If the next regression in this class ships, it will fail at the release gate, not at a user’s install.

Then v3.9.34 stopped the append-only vector files from growing without bound. One field report showed patterns.rvf reaching 59 gigabytes on a fresh clone, breaking git, Vite, and the npm cache. The fix added three compaction mechanisms: a post-dream compaction, a boot-time size guard, and a bounded backfill cap that keeps a truncated file from replaying every historical row on reopen.
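To make two of those mechanisms concrete, here is a minimal sketch of a boot-time size guard and a bounded backfill cap. It is an illustration only, not the project's code: the function names, the 64 MB threshold, and the cap of three rows are all assumptions chosen for the demo.

```python
from collections import deque
from typing import Iterable, List

# Hypothetical limits, chosen only for illustration.
MAX_FILE_BYTES = 64 * 1024 * 1024
MAX_BACKFILL_ROWS = 3


def needs_compaction(size_bytes: int, limit: int = MAX_FILE_BYTES) -> bool:
    """Boot-time size guard: flag the file once it exceeds the limit."""
    return size_bytes > limit


def bounded_backfill(rows: Iterable[str], cap: int = MAX_BACKFILL_ROWS) -> List[str]:
    """Replay at most `cap` of the most recent rows, not the full history."""
    # A deque with maxlen silently drops the oldest items as new ones arrive.
    return list(deque(rows, maxlen=cap))


history = [f"row-{i}" for i in range(10)]
replayed = bounded_backfill(history)
print(replayed)
```

With ten historical rows and a cap of three, only the three newest rows are replayed, which is the property that keeps reopen cost bounded no matter how long the append-only file has been growing.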

v3.10.0 earned the minor version bump. An external embedder endpoint, OpenAI-compatible, so the fleet can route vector computation through a local llama-server instead of loading the model in-process. Fifteen milliseconds cold against localhost, 1.6 milliseconds warm. For Ruflo and Ruvector co-deployments, this eliminates duplicate model loads entirely.
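For readers who have not used an OpenAI-compatible embeddings endpoint, the sketch below shows the general request and response shape. The base URL, port, and model name are placeholders, and the helper names are hypothetical; this is not the fleet's client code. The request is only built here, not sent, so the example runs without a server.

```python
import json
import urllib.request
from typing import List

# Placeholders: point these at whatever local server you actually run.
BASE_URL = "http://localhost:8080/v1"
MODEL = "local-embedder"


def build_embedding_request(texts: List[str]) -> urllib.request.Request:
    """Build a POST to the OpenAI-style /embeddings route."""
    body = json.dumps({"model": MODEL, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding_response(payload: dict) -> List[List[float]]:
    """Return vectors in input order, using each item's `index` field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embedding_request(["login fails on empty password"])

# A hand-written sample response in the OpenAI embeddings shape.
sample = {"data": [{"index": 1, "embedding": [0.3]},
                   {"index": 0, "embedding": [0.1, 0.2]}]}
vectors = parse_embedding_response(sample)
```

To send the request for real, pass it to `urllib.request.urlopen(request)` and feed the decoded JSON body to `parse_embedding_response`. Because the model lives behind the HTTP endpoint, several processes can share one loaded copy instead of each loading its own.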

v3.10.1 was the one that made me pause. The LLM router — the vendor-independent layer documented in ADR-043 and ADR-051 — had been implemented for months. Every domain service accepted it as an optional dependency. Zero callers passed one. The LLM analysis branches in fifteen service paths across eleven domains were unreachable dead code. A friend sent me a detailed email pointing out that what the documentation promised and what the code delivered were not the same thing. A devil’s-advocate audit surfaced ten findings before merge. The router is now wired. It took an external contributor to notice that the feature did not work.
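The failure mode is easy to reproduce in miniature. The sketch below is a hypothetical reduction, not the project's code: `AnalysisService`, `Router`, and `unwired` are invented names. It shows how an optional dependency that defaults to `None` leaves a whole branch unreachable, and how a small wiring check can surface it.

```python
from typing import Callable, List, Optional

# Hypothetical router type: takes a prompt, returns an analysis string.
Router = Callable[[str], str]


class AnalysisService:
    def __init__(self, name: str, router: Optional[Router] = None) -> None:
        self.name = name
        self.router = router  # optional, so forgetting it raises no error

    def analyze(self, code: str) -> str:
        if self.router is None:
            # The only branch that runs when no caller passes a router.
            return f"{self.name}: rule-based only"
        return f"{self.name}: {self.router(code)}"


def unwired(services: List[AnalysisService]) -> List[str]:
    """Wiring check: name every service constructed without a router."""
    return [s.name for s in services if s.router is None]


def fake_router(prompt: str) -> str:
    """Stand-in for a real LLM call, so the example runs offline."""
    return f"llm-reviewed {len(prompt)} chars"


before = [AnalysisService("security"), AnalysisService("coverage")]
after = [AnalysisService("security", fake_router),
         AnalysisService("coverage", fake_router)]

print(unwired(before))
print(unwired(after))
```

Nothing fails in the `before` case, which is why this kind of gap can sit unnoticed: every service still returns a result, only never the one the documentation describes. A check like `unwired` at composition time turns the silent default into a visible finding.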

Six releases in eight days. Each one driven by someone outside the project — Jordi’s forensic issue reports, a feature request for external embeddings, a friend who read the docs carefully. The pattern from the previous article holds: the dominant driver is no longer my roadmap. It is the community using the tool and telling me what does not work.


Belgrade, Novi Sad, and the Guest from London

On Wednesday, I was in Belgrade for a hands-on testing session organized by Context Community. Twenty to thirty people in the room. The format is the one I keep coming back to: take an open-source project, initialize the fleet, and let the agents work while the audience watches. Six or seven agents produce a detailed quality analysis — security, performance, architecture, code smells — in under twenty minutes. Every time I do this, the same thing happens. People need time for it to sink in. The questions start slowly and then do not stop.

A couple of familiar faces were there, regulars from previous sessions. And then someone I did not expect — a practitioner from London. He had watched my Ministry of Testing masterclass on using an agentic approach in quality engineering and decided to fly to Serbia to attend these meetups in person. I invited him to join me in Novi Sad the next day.

On Thursday, before the meetup, I showed him around the Petrovaradin Fortress, and we walked through the city center. The conversation was not about tools. It was about the question I keep hearing: how do you transfer this to a team? People are learning to work with these systems individually, but the gap between a practitioner’s skill and an organization’s capability is wider than most teams realize. I do not have a complete answer yet. I have fragments, and I know I need to write more about them.

The Agentics Foundation meetup that evening — our twelfth — was the most interactive session we have had. A dozen people in person, a few more on Zoom. I needed to rebuild my entire DevPod environment live because the Docker incident from the previous article had wiped my containers. So the audience watched the real thing — agents setting up Docker, building the Community Social Network application from scratch, fixing bugs as they found them. The QE agents surfaced security issues, performance bottlenecks, and test coverage gaps. Someone asked whether the security analysis was based on scanning code or running endpoints. Code only, I said — but the agents follow code paths across multiple files, tracing smells and potential bottlenecks based on patterns from training data. That is where the value is for experienced developers: agents can hold more context simultaneously than a human reviewer.

The demonstration showed what I keep seeing across every project: agents finding connections and relationships that developers with ten or twenty years of experience would not have expected to find. Not because the developers were careless. Because the agents can cross-reference patterns across the entire codebase simultaneously, which no human can do at the same speed.


The Industry Is Moving

On Friday and Saturday, I had calls with engineering practitioners who wanted to go deeper. One was interested in the architecture of the fleet — how MinCut clustering works, how the self-learning loop routes prompts, how the whole system holds together. That conversation made me realize there should be a dedicated article on the architecture, because the interest is no longer casual. People want to understand how to build something like this, not just use it.

The other call was a live demo on one of the developer’s own projects. Initialize the fleet, prompt the agents for quality information. The reaction was the same one I keep getting: developers who have been in the industry for decades seeing issues surfaced that were beyond their practical reach — not because they lacked the skill, but because no human can hold that many cross-references in working memory at once.

The shift is happening. Paul Gerrard, on his Testing Anything YouTube channel, recently shared that he will never write code again. That statement, from someone with his standing, is a marker. I have heard versions of it from other names in the industry. Coding agents and LLMs can now produce code that matches or exceeds what a human with twenty years of experience writes. That is no longer a controversial claim; it is an observable fact in a growing number of contexts.

What changes is not whether we write code. What changes is where the value lives. The value is moving from writing code to planning, designing, and architecting the solution — and then verifying and validating that what was produced actually solves the problem we set out to solve. That verification step is where quality people have always lived. The tools changed. The discipline did not.

This is the thread I will pull on at ExpoQA next week. Classical quality engineering principles — context-driven testing, risk-based thinking, evidence over assumption — do not become obsolete when agents write the code. They become more important. Because the volume of code that needs verification just multiplied, and the cost of trusting unverified output is the same as it ever was.


What Cognitum Is Teaching Me About Pace

The work with Reuven on Cognitum continued through the week. The honest version: I am struggling to keep up with his development tempo. Ruv ships fast, and the system has almost twenty sub-modules. Setting up quality processes for something that moves at this speed, where new features land before the previous ones are fully tested, is a different kind of challenge than I have faced before. Not harder — different. The constraint is not complexity. It is pace.

The quality system is being set up. Devices are out with users. Issues surface, and we fix them and deploy. But the gap between “quality processes are being set up” and “quality processes are running smoothly” is real, and I am in the middle of that gap right now.


What Is Ahead

After I finish this article, I need to practice my presentation for ExpoQA next week in Madrid. The week after, the first week of June, I am in Budapest, presenting at two events. The first is an event organized by the Hungarian chapter of the Agentics Foundation, which will gather members from around the world. The second is a ninety-minute hands-on session at Craft Conference on testing with specialized QE agents. The Agentics Foundation core crew will be there, and I am looking forward to seeing them in person.

Two things are crystallizing for June, when I return from Budapest. The Agentic Engineering Training Committee, which I now chair, needs a direction. I have ideas that are practical, hands-on, and grounded in real experience, but the first step is listening to the community about what they actually need. The second is the same question through a different lens: as one of the AI chapter leaders for the Ministry of Testing, what kinds of events and learning would best serve not just the quality community but the broader engineering community as it learns to work with these new tools?

These are not separate questions. They are the same question asked from two positions. And the answer has to come from other practitioners in the field.


What I Would Tell the Guest from London

The question that stayed with me this week — the one the London guest asked during our walk through Petrovaradin — was simple: how do I bring this to my team?

I do not have a final answer.

What I have is still a work in progress, and I managed to show him parts of it: a fleet that learns from its own outcomes. A meetup where the audience watches agents rebuild an application and find bugs in real time. A pattern where the best bug reports come from one contributor who treats the project like it matters. A shift in the industry, with value moving from coding to verifying.

None of that transfers through a slide deck. It transfers through practice, through doing the work in front of people who can see the rough edges and ask questions. That is why I keep doing meetups, keep accepting invitations, keep showing the real thing instead of the polished version. The polished version would hide the Docker rebuild or Claude skipping an instruction. The real version lets the audience see the tools break, watch you fix them live, and learn from it.

Keep learning. Keep sharing. Knowledge is power.

The next article will follow Madrid and Budapest, once I have heard what others in the field are seeing. For now, the talk needs to be practiced. The fleet needs another release. And the question about bringing this to teams needs more thinking before it gets a real answer.


This is the thirtieth article in The Quality Forge series. Previous: “The Forest and the Feedback Loop” described the two weeks when the learning loop closed end-to-end and the contributor whose forensic bug reports drove thirteen releases. This one describes the week after — six more releases, two meetups, the London practitioner who flew in to see the work in person, and the question that followed me home. The releases are public on github.com/proffesor-for-testing/agentic-qe. Nagual-QE is open-source at github.com/proffesor-for-testing. Cognitum is at cognitum.one.

Dragan Spiridonov is the Founder of Quantum Quality Engineering, an Agentic Quality Engineer, Secretary of the Agentics Foundation Board, chair of the Agentic Engineering Training Committee, and one of the AI Chapter leads for the Ministry of Testing. He is currently building the Serbian Agentics Foundation Chapter in partnership with StartIt centers across Serbia.

