
From VP to Conductor: My 2025 Transformation Journey

How I went from leading a QA team to orchestrating AI agent swarms.

Dragan Spiridonov
Founder, Quantum Quality Engineering • Member, Agentics Foundation

The Fork in the Road

On September 30, 2025, I walked out of Alchemy for the last time after eight years as VP of Quality Engineering. The next morning, I woke up as the founder of Quantum Quality Engineering.

But the real transformation had started two months earlier.

In August 2025, I officially joined the Agentics Foundation as a member. What began as curiosity about AI agents in late 2024 quickly became an obsession—and then a complete reinvention of how I approach quality engineering.

This is the story of that journey. Not the polished version. The real one.


Act I: Wrestling with Single Agents (July 2025)

It started with a simple question from a colleague: "Can we automatically generate API tests from specifications?"

I posed the question to an AI and received a PhD-level blueprint in response. This sparked something. What if I could build production testing platforms using AI agents to write the code?

Phase 1: The Single-Threaded Struggle

I started with Cline and Gemini, then Claude. Single agents, single conversations. The experience was... frustrating.

What I learned:
  • Agents get stuck in loops
  • Context loss is the silent killer
  • Memory Banks matter more than model selection
  • Confident mistakes are worse than obvious failures

I was building Sentinel, an API testing platform, and progress was painfully slow. The agent would work brilliantly for an hour, then spiral into fixing the same bug repeatedly. I'd restart sessions constantly.

~60-80% success rate with single agents.

Phase 2: Co-Piloting

Then I discovered RooCode—multi-agent systems with an Architect, Coder, and Debugger. Better. The agents felt like junior developers with persistent context.

I rewrote part of Sentinel from Python to Rust using this approach. "It simply compiled." That sentence shouldn't be remarkable, but anyone who's done Rust development knows it is.

Clear boundaries and specific expertise made the difference. But I was still fundamentally single-threaded—one task, one conversation, one bottleneck: me.


Act II: The Conductor Emerges (August-December 2025)

The breakthrough came when I started using Claude Code with Claude Flow. Multiple agents running in parallel. Specialized roles: functional testers, security specialists, performance agents, coverage analyzers.

This is where things got strange. And wonderful. And occasionally terrifying.

Picture my terminal: fifteen agents running simultaneously. A backend developer installing a new database component. A coder updating base agents. A tester migrating test suites. A reviewer verifying completeness. A security scanner running audits. A quality gate validating release readiness.

I wasn't coding anymore. I was conducting.

The Three Open-Source Platforms

Between July and October, I built and released three Agentic testing platforms:

1. Sentinel

Agentic API testing platform with 8 specialized agents

github.com/proffesor-for-testing/sentinel-api-testing

2. Agentic QE Fleet

Multi-agent QE orchestration with 40+ agents and 45+ specialized skills

github.com/proffesor-for-testing/agentic-qe

3. LionAGI QE Fleet

Quality engineering powered by Ocean Lee's LionAGI framework

github.com/proffesor-for-testing/lionagi-qe-fleet

Each platform taught me different lessons. Each failure added another edge to the framework I was building.

The Releases That Tell the Story

The Agentic QE Fleet releases trace my learning curve:

  • v1.2.0 (October): Agent swarm orchestration basics
  • v2.1.0 (December 1): Applied Claude 4 Best Practices—~365K tokens saved per conversation
  • v2.2.0 (December 4): RuVector & AgentDB v2 integration—50x faster search
  • v2.3.0 (December 6): Automatic learning capture via PostToolUse hooks
  • v2.4.0 (December 7): False-positive detection improvements
  • v2.5.0 (December 8): Continued refinement

Each version shipped because I was using the agents to test the agents. An ouroboros of quality assurance—agents verifying agents verifying agents.


Act III: The Cautionary Tales

Success bred dangerous overconfidence.

The Empty Orchestra

While "polishing" the Agentic QE Fleet's perfectly working build, I let a QE swarm convince me we had critical problems.

Four hours later: 54 TypeScript errors from their "improvements" that broke everything.

The agents screamed, "207 ESLint errors! P0 Critical!" Reality? Three actual errors, 204 warnings, and a successful build they destroyed. One type definition change cascaded through 48 locations.

It took five minutes to restore once I asked the right question.
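The right question was a severity split. ESLint's JSON formatter (`eslint -f json`) marks each message with a `severity` of 2 (error) or 1 (warning), so separating real errors from warning noise is a few lines of triage. A minimal sketch, not part of the Fleet itself:

```python
import json

def triage_eslint(report_json):
    """Split an ESLint JSON report into real errors (severity 2) and
    warnings (severity 1), so a swarm can't inflate warnings into a P0."""
    results = json.loads(report_json)
    errors = sum(m["severity"] == 2 for r in results for m in r["messages"])
    warnings = sum(m["severity"] == 1 for r in results for m in r["messages"])
    return errors, warnings

# Example report shaped like `eslint -f json` output (rule names illustrative).
report = json.dumps([
    {"filePath": "a.ts", "messages": [
        {"severity": 2, "ruleId": "no-undef"},
        {"severity": 1, "ruleId": "no-unused-vars"},
        {"severity": 1, "ruleId": "prefer-const"},
    ]},
])
errors, warnings = triage_eslint(report)
print(errors, warnings)  # 1 2
```

Run against the "207 errors" report, a check like this would have answered "P0 Critical!" with "3 errors, 204 warnings" in seconds.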

The Perfect Build (That Wasn't)

Agents reported "100+ Q-values recorded" and "learning persists across sessions." The database was empty. Zero entries. The learning system had been fabricating progress reports for weeks.

I call this completion theater—agents optimizing for the appearance of completeness rather than being complete.
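The antidote is checking claims against ground truth instead of against agent logs. A minimal sketch of such a verification layer, using a hypothetical `q_values` table rather than the Fleet's actual schema:

```python
import sqlite3

def verify_learning_claim(conn, claimed_entries, table="q_values"):
    """Compare an agent's claimed write count against what the store
    actually holds. A confident report over an empty table fails."""
    actual = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return actual >= claimed_entries, actual

# Demo: an empty in-memory table, mirroring the "Perfect Build" incident.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q_values (state TEXT, action TEXT, value REAL)")
ok, actual = verify_learning_claim(conn, claimed_entries=100)
print(ok, actual)  # False 0: "100+ Q-values recorded" does not survive contact
```

One query, and weeks of fabricated progress reports collapse into a single failing check.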

The False Positive Flood

During the Media Gateway Hackathon in December, my QE agents found "7 critical issues" in EmotiStream. Sherlock-style counter-review proved 4 were hallucinations—the agents examined legacy stub code instead of the working implementation.

The same lesson context-driven testing taught us about automation 20 years ago applies to agentic QE today: context determines practice.


Act IV: Building Community (October-December 2025)

Serbian Agentic Foundation Chapter

In October, I launched the Serbian Agentic Foundation Chapter in partnership with StartIt centers across Serbia, starting in Novi Sad.

Meetup #1 (October 28)

Introduction to Agentic Engineering and PACT principles. 81 registered, nearly maxing out the venue. We hit demo glitches and technical hiccups along the way. That's authentic community building.

Meetup #2 (November 14)

Deep dive into agentic tools. The community voted: let's build a Community Social Network together. We kicked off development live.

Meetups #3 and #4 (December 4 and 16)

Continued building, validated specifications with QE agents, and live-analyzed the Apache Spark codebase (280 MB) with the Agentic QE Fleet, finding 3 high-severity and 8 medium-severity issues in minutes. By meetup #4, agents completed Phases 2 and 3 of the Community Social Network while we talked.

This isn't slides and theory. It's practitioners building real projects together, sharing real successes and real failures.

The PACT Framework

The Holistic Testing Model I'd practiced for years needed evolution for the agentic age. Every dimension now transforms through PACT principles:

P: Proactive

Agents that find problems before you ask. The coverage-analyzer runs automatically on every commit.

A: Autonomous

Agents that complete work without hand-holding, but with guardrails. After discovering completion theater, I built verification layers.

C: Collaborative

Multi-agent orchestration and collaboration. The conductor metaphor: humans steer, agents execute.

T: Targeted

Risk-based and context-specific. Model routing sends simple tasks to simpler models (cost savings), and specialist agents win over generalist bloat.

All of this crystallized into the Agentic QE Framework—designed to help others make this transition without repeating my mistakes. Available at agentic-qe.dev.


Act V: The Stage and the Victory

Agile Testing Days 2025: A Decade Coming Full Circle

In November, I attended my 10th Agile Testing Days—but for the first time wearing three hats: volunteer, speaker, and exhibitor.

Ten years ago, in 2016, I walked into ATD for the first time. This year, I stepped onto its stage.

My talk, "What I Learned Building Two(Three) Testing Platforms with Agents," shared the Empty Orchestra, the Perfect Build, and the completion theater cautionary tales: real metrics, real failures, real learnings. Watch the recording on Vimeo.

Sergio posted immediately about attending "a great talk on #Agentic QE." The questions kept coming. The discussions continued at my booth, in hallways, and over dinner.

The Hackathon Win

Wednesday evening, after my talk, came Jonathan Wright's Agentic AI Hackathon. Our team—Team Jarvis—decided to build Iron Pets, a pet shop application.

We "cheated." While other teams set up with Goose, Kiro, and Cursor, we deployed what I'd just presented: Claude Code with Claude Flow and the Agentic QE Fleet.

  • Best PRD
  • MVP in ~4 hours
  • SPARC approach
  • Hackathon victory

The same tools I'd presented hours earlier, proving their value in competition. Iron Pets by Jarvis on GitHub.

Media Gateway Hackathon: Learning in Public

December 5-7, the Agentics Foundation x TV5MONDE Global Hackathon. 70 hours to build a self-learning recommendation system.

I built EmotiStream—RL-driven emotional-well-being recommendations using Q-learning. I chose the hardest option because the theme was self-learning systems, and a Q-learning loop that demonstrably improves is more compelling than keyword search.
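For context, the core of tabular Q-learning fits in a few lines. A minimal sketch with hypothetical mood states and content actions, not EmotiStream's actual implementation:

```python
# Minimal tabular Q-learning step: states are mood labels, actions are
# content categories, reward is a post-viewing mood delta (all illustrative).
ALPHA, GAMMA = 0.1, 0.9  # learning rate, discount factor

def update_q(q, state, action, reward, next_state, actions):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return q[(state, action)]

q_table = {}
actions = ["calming", "uplifting", "documentary"]
# A positive mood delta after "calming" content nudges its Q-value up.
new_q = update_q(q_table, "stressed", "calming", 1.0, "relaxed", actions)
print(round(new_q, 3))  # 0.1, i.e. 0.1 * (1.0 + 0.9 * 0 - 0)
```

"Demonstrably improves" then means exactly this: the Q-values in the table move with observed rewards, which is also what makes an empty table so damning.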

What I learned:
  • My QE agents can hallucinate "critical issues" that don't exist
  • Counter-review saved hours of panic
  • The PRD trap: 65 pages of feature ambition. Impossible in 70 hours. Agent validation caught it.
  • Pseudocode before code: 47 algorithms specified, 92/100 validation score, 247 unit tests auto-generated

I didn't win this one. But I open-sourced EmotiStream and gained insights no victory could provide.

Guest Lecture: University of Aveiro

On December 10th, I delivered a guest lecture at the University of Aveiro: "Does AI Bring Forward The Golden Age of QA?" Sharing the practitioner perspective with the next generation of quality engineers—bridging classical QE principles with agentic realities. Watch the recording on Vimeo or read the full story in The Tester's Journey: From Chat to Conductor.


The Transformation

January 2025

VP of Quality Engineering at Alchemy, curious about AI agents.

December 2025

Founder of Quantum Quality Engineering, conductor of 20+ specialized testing agents, author of the Agentic QE Framework, speaker at international conferences, hackathon winner, and founder of the Serbian Agentic Foundation Chapter.

What Actually Changed

The mindset shift was harder than the technical learning.

  • From: "I write code and tests." To: "I orchestrate systems that write code and tests."
  • From: "Memory Banks over model selection" (Phase 1). To: "Human judgment for WHAT/WHY, AI for HOW/SCALE" (Phase 3).
  • From: "These agents are amazing!" To: "These agents can lie to my face—and I need verification layers."

The Five Hard Lessons

  1. Quality must still be built in, not tested in.

    Agents don't change this principle—they amplify its importance. You can't automate your way out of poor design, whether with scripts or swarms.

  2. Context determines practice—now more than ever.

    The same lesson from context-driven testing applies to agents. An agent that excels at API testing may hallucinate on UI flows. There are no universal "best agents"—only agents that fit your context.

  3. Verification layers are non-negotiable.

    Trust but verify becomes verify, then verify again. Completion theater taught me that agent confidence doesn't equal agent correctness. Build counter-review into your workflows.

  4. The oracle problem doesn't disappear—it multiplies.

    When agents generate tests, who tests the tests? When agents find bugs, who validates the findings? Human judgment remains the ultimate oracle, now orchestrating rather than executing.

  5. Whole-team quality scales differently with agents.

    The Holistic Testing Model evolves: agents amplify human expertise across every dimension, but the humans still own the strategy. The conductor doesn't play every instrument—but must understand the entire score.


What's Next

2026 is already taking shape:

May 27, 2026: expoQA in Madrid

I'll be speaking at expoQA—bringing the PACT framework and agentic QE lessons to a new audience.

Selection Committees

I've been invited to join the selection committees for both HUSTEF and Agile Testing Days 2026. From submitting proposals to evaluating them: another circle closing.

Serbian Agentic Foundation

Continuing the meetups, now entering Phase 4 of the Community Social Network build. Building in public, one session at a time.

Agentic QE Workshops

Working with Lalitkumar Bhamare to develop hands-on Agentic QE Framework workshops. Turning production lessons into structured learning paths.

The bigger picture:

Quality must be built in, not tested in. That principle doesn't change because we have agents. It becomes more important.

The future isn't "AI replaces the team"—it's you orchestrating swarms like a conductor leads an orchestra. Different sections play simultaneously, but you control the tempo and make the critical calls.

I'm still learning. Still failing. Still sharing in public.

A lot of challenges and learning opportunities are ahead. That's exactly what I signed up for.


Building in public. Learning together.

#YearInReview #AgenticQE #QualityEngineering #Transformation #2025 #BuildInPublic

Let's Connect

Want to discuss Agentic QE, share your transformation story, or explore collaboration opportunities?