The Tester's Journey: From Chat to Conductor

There's a question I've been wrestling with for the past year, one that kept me up late into the night as I watched my agents execute tasks, occasionally lie about their progress, and sometimes surprise me with insights I hadn't considered: Does AI bring forward "The Golden Age of QA"?

After 29 years in IT, 12 of them dedicated to quality engineering, I thought I understood what made a good tester. Critical thinking. Skepticism. Systems thinking. The ability to ask the question nobody else thought to ask. The courage to say "not yet" when everyone else wanted to ship.

Then AI arrived, and everything changed. But not in the way most people expected.

The Paradox of Infinite Velocity

Let me start with a confession: I was initially skeptical. After building QA/QE functions from scratch at Alchemy, after years of advocating for context-driven testing and the Holistic Testing Model, I wondered if these AI tools were just another silver bullet—vendor hype dressed up in new clothes.

I was wrong. But not in the way you might think.

Here's what I discovered: for years, the industry obsessed over developer velocity. We measured story points, deployment frequency, and lines of code. We optimized pipelines and automated as much as we could. And now AI has made velocity nearly infinite.

But here's the paradox: speed without trust is just a faster way to break things.

The code creation bottleneck isn't the problem anymore. The new bottleneck is believing—believing that the software works, is safe, and delivers value. And that, my friends, is what we've always done. We are the stewards of trust.

The core shift: The challenge has moved from "can we build it?" to "should we ship it?" That's a classical software testing question. Our skills haven't become obsolete—they've become the most valuable thing in the room.

Three Steps on the Ladder

My journey through this transformation occurred in three distinct phases, each building on the last and teaching me something new about the relationship between human expertise and artificial intelligence. Think of it as a ladder, where each rung requires mastery of the one below.

Step One: The Art of the Ask

I started where everyone starts—with chat. ChatGPT, Claude, Gemini. Typing questions into a box and hoping for helpful answers.

My first attempts were embarrassingly naive. "Generate UI tests for this scenario." The results? It could be 20-30% useful. Generic. Missing context. The kind of output that screams, "I don't know your project."

Then I learned the first lesson: give the AI a role.

"Act as a senior software developer in test with 10 years of experience in context-driven testing. Generate UI tests..." The results improved dramatically. Not perfect, but substantially better.

But the real breakthrough came when I stopped thinking of the AI as a single entity and started treating it as a team.

"Act as a team of QA experts, including a performance tester, a security analyst, and a usability specialist. Debate the risks in this feature and propose a test strategy that addresses each perspective."

When you run a single prompt, the transformer creates a single vector to find the nearest neighbors in its knowledge space. When you say "act as a team of experts," it establishes multiple vectors, observing the same problem from different angles. The dialogue emerges from within the model itself, yielding far better results.

I spent months refining what I call a "warm-up prompt"—a complex instruction set that tells the AI exactly who I am, what I value, and how I want it to think. It references Cem Kaner, James Bach, and Michael Bolton. It mentions ISO 25010 quality characteristics and heuristics such as SFDIPOT and FEW HICCUPPS. It asks for context-aware, risk-focused analysis.

Lesson from Step One: Prompt engineering isn't about tricking the AI. It's about clearly communicating who you are and what you need. The same skills that make you a good collaborator with humans make you effective with AI.

Step Two: Grounding the AI in Your Reality

Chat-based LLMs have a fundamental limitation: they lack project context. They don't know your codebase, your architecture, your team's conventions, or your technical debt. This leads to hallucinations—confident-sounding nonsense that doesn't match your reality.

The next level is to use coding assistants within your IDE, such as VS Code, IntelliJ, or standalone tools. I started with Cline, switched to RooCode, and eventually settled on Claude Code. The tool matters less than the concept.

The breakthrough is project context grounding. These tools can read your entire codebase. They understand your architecture. They can see your existing tests and match your patterns.

I learned to create "golden documentation"—exemplary tests, clear architectural decisions, explicit quality expectations. The AI follows examples better than instructions. Show it one well-written test, and it generates dozens that match your style.

But here's the catch: even with perfect grounding, you're still working in a single-threaded manner: one agent, one task, one conversation. The agent might be brilliant, but it's still limited by serial execution.

Lesson from Step Two: Context is everything. The same AI that produces mediocre results in chat produces excellent results when grounded in your specific codebase. This mirrors what we know about testing—context determines practice.

Step Three: The Orchestra

Working with a single agent is like playing a solo instrument. Powerful, but limited. The final level is conducting an orchestra.

This is where things get strange. And wonderful. And occasionally terrifying.

I built my own Agentic QE Fleet—an extension of the Claude Flow orchestration system with agents specialized for different quality engineering tasks. Performance testing. Security analysis. Accessibility auditing. Requirements validation. Test generation.

Picture my terminal: fifteen agents running in parallel. A backend developer installing a new database component. A coder updating base agents. A tester migrating test suites. A reviewer verifying completeness. A security scanner running final audits. A quality gate validating release readiness.

This is where it gets meta: I now use my agents to test my other agents.

The Agentic QE Fleet tests the Sentinel project. It tests the LionAGI-QE-Fleet. It even tests itself—an ouroboros of quality assurance, agents verifying agents verifying agents.

Lesson from Step Three: Orchestration multiplies your capacity, but it demands a new skill—the ability to coordinate, verify, and govern autonomous agents. You become less of a doer and more of a conductor.

The Numbers Don't Lie (But They Do Surprise)

Let me share some real-world results, because theory without practice is just philosophy:

Test strategy and plan writing:
From 10 hours to ~45 minutes. I dump my brain into the AI, dictate my thoughts, and it structures them into a coherent document.

Researching new tools:
From 20 hours to ~60 minutes. Deep research modes can traverse the internet, synthesize findings, and produce comprehensive reports.

Generating a full API test suite:
From weeks to ~3 days. Export test scenarios, provide one manually-written test as example, let agents generate the rest.

The Sentinel project:
60 hours achieved 60% completion. Switched to agentic workflow: less than 15 hours to reach 100%.

Agentic QE Fleet:
Built version 1 from scratch in under 100 hours. What would have taken a team of 6-8 people roughly 6 months.

These aren't theoretical projections. These are real projects, now running in production or available as open source.

The 4Cs: Superpowers for the Agentic Age

So we return to the question: Is this the Golden Age of QA?

Yes. But not because AI will do our jobs for us. Not because we can sit back and let the agents handle everything.

It's the Golden Age because the skills of a QA/QE mindset have never been more crucial:

Critical Thinking

Agents lie. Frequently. They'll claim a task is 100% complete when the code is full of stubs and TODOs. Your ability to question, verify, and validate is essential. Don't trust—verify.

Creativity

Coming up with the test that nobody expected. Imagining failure modes that specifications never mentioned. Designing experiments that reveal hidden assumptions. Agents execute; humans imagine.

Communication

Explaining what you need to an AI requires clarity. Explaining AI's findings to stakeholders requires translation. Documenting decisions for future engineers requires precision.

Collaboration

Working with agents is collaboration. They have strengths and weaknesses, just like human teammates. Understanding when to trust them, when to override them—that's teamwork.

The Role Shift

Our role is shifting from Creator/Executor to Verifier/Orchestrator.

We used to spend most of our time creating test cases, executing test scripts, and filing bug reports. The tedious work. The necessary work. The work scaled linearly with project size.

Now we spend more time verifying AI-generated output, orchestrating fleets of specialized agents, and making judgment calls that require human context and values.

Verification and serious testing are more important than creation now. Anyone can generate code. Anyone can spawn a hundred agents. But ensuring that the code works, that it's safe, that it delivers value—that requires the mindset we've spent our careers developing.

A Story of Student and Teacher

During my guest lecture at the University of Aveiro, something incredible happened. I was showing benchmark results from my Agentic QE Fleet—a table comparing different storage backends. A student raised his hand.

"The throughput for root vector shows more tasks per second, but the runtime is longer. What happened?"

I paused. Looked at the data. He was right. I had missed an inconsistency in my own presentation.

"Good observation! Nice catch. This is the critical eye we need."

That moment captured everything I believe about the future. The human noticed what my agents had not. The human asked the question that made me reconsider. The ensemble of eyes—multiple perspectives examining the same problem—catches what individual biases miss.

That's what we bring. That's what we've always brought. And that's what AI cannot replace.

Begin Your Journey

If you're reading this and wondering where to start, here's my advice:

Start with personalization. Before anything else, set up your LLM with who you are, what you care about, and how you want it to respond. This single step transforms generic tools into personalized assistants.
Master the warm-up prompt. Develop a complex instruction set that establishes context before each session. Reference the methodologies you practice, the heuristics you value, the quality attributes you prioritize.
Ground your agents in your codebase. Create golden documentation. Show examples, not just instructions. Let the AI learn your patterns by seeing them in action.
Embrace orchestration gradually. Start with a single specialized agent. Add a second for verification. Build your fleet over time, learning what works in your context.
Never stop questioning. The agents will claim to be finished. The agents will assert confidence. Your job is to ask: "Is this actually true? How do we know? What are we missing?"

The Forge Continues

I started my career teaching computer science and found myself back in the classroom, guest lecturing about concepts that didn't exist two years ago. The loop closes and opens again.

The forge where quality is hammered out now includes new tools—AI-powered flames that burn hotter and faster than before. But the smith's hand still guides the work. The smith's eye still judges the result. The smith's judgment still determines what's ready and what needs more time at the anvil.

This is the Golden Age of QA. Not because our work is easier, but because it matters more than ever. Not because AI replaces us, but because AI demands the best of what we bring.

The question was never "Will AI replace testers?" The question is: "Will you become the kind of tester who knows how to wield these new tools?"

The forge is hot. The hammer is ready. What will you create?

The journey from prompt engineering to context engineering to agentic engineering is not about learning new tools—it's about discovering that the skills you've always had are exactly what this new world needs. Start where you are. Use what you have. Do what you can. The agents are waiting for your guidance.

About the Author: Dragan Spiridonov (Profa) is the Founder of Quantum Quality Engineering and an Agentic Quality Engineer with 29+ years in IT and 12+ years specializing in quality engineering. He previously served as VP of Quality Engineering at Alchemy for 8 years before starting his consultancy. He's establishing the Serbian Agentic Foundation Chapter and is a member of the Global Agentics Foundation.

All AI artwork in the original presentation was generated using Nano Banana.

Related from The Quality Forge:

Why the Agentic QE Framework Might Transform Your Quality Engineering

A pragmatic guide to understanding if autonomous quality engineering fits your context.

The Five-Release Journey Where I Forgot to Be a Tester

Eight brutal lessons learned from forgetting to verify what I already knew how to test.

Join the Discussion

Ready to start your own journey from chat to conductor? Let's talk about building your first agent fleet.

Join a Meetup Email Me