
What is Agentic QE? (And Why PACT Matters)

Moving from testing-as-activity to agents-as-orchestrators. How PACT principles bridge classical QE with autonomous testing systems, without the vendor hype.

Dragan Spiridonov
Founder, Quantum Quality Engineering | Agentic Quality Engineer

The Orchestra Nobody Wanted to Conduct

Picture this: You're VP of Quality Engineering. It's 3 AM. Production is down. Again. The monitoring system caught it—eventually. The automated tests passed—technically. But your customers? They'd been seeing errors for 47 minutes before anyone on your team knew.

You have dozens of tools, hundreds of tests, thousands of metrics. But somehow, the coordination between them is still happening in Slack messages and morning stand-ups. Your testing is reactive. Your quality is checked, not built. Your team is drowning in test maintenance while real bugs slip through.

Sound familiar? This was me at Alchemy, eight years into building quality systems, and I realized: we weren't missing better tools—we were missing better orchestration.

"Without orchestration, testing is theater. We perform quality, but we don't deliver it."

What Agentic QE Actually Means

Let me be direct: Agentic QE is not about replacing testers with AI. That's vendor hype, and it misses the point entirely.

Agentic Quality Engineering is the evolution from testing-as-activity to agents-as-orchestrators. It's about moving from:

  • Manual test execution to autonomous test flows
  • Reactive bug finding to proactive risk anticipation
  • Siloed testing phases to collaborative quality ecosystems
  • Coverage metrics to risk-targeted intelligence

But here's the critical part: agents augment expertise, they don't replace it. Every autonomous decision needs a reason. Every test flow needs explainability. Every agent action requires human checkpoints at critical junctures.

That's where PACT comes in.

The PACT Framework: Making Sense of Agentic Systems

PACT stands for Proactive, Autonomous, Collaborative, Targeted. It's not just an acronym—it's a classification system for understanding and building agentic quality systems, similar to how SAE levels classify autonomous vehicles.
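To make the classification idea concrete, here's a minimal sketch of how a PACT profile could be modeled in code. The type names and the 0–3 level scale are my illustrative assumptions for this article, loosely analogous to SAE levels—they're not part of the agentic-qe API.

```typescript
// Hypothetical PACT profile for classifying an agentic quality system.
// The 0-3 scale is an illustrative assumption, not part of agentic-qe.
type PactLevel = 0 | 1 | 2 | 3;

interface PactProfile {
  proactive: PactLevel;     // 0 = purely reactive, 3 = anticipates risk pre-deployment
  autonomous: PactLevel;    // 0 = human executes everything, 3 = self-executing with explainability
  collaborative: PactLevel; // 0 = siloed tool, 3 = agents + humans + systems coordinating
  targeted: PactLevel;      // 0 = blanket coverage, 3 = risk-driven prioritization
}

// Example: a conventional CI pipeline versus an agent fleet.
const classicCi: PactProfile = { proactive: 0, autonomous: 1, collaborative: 1, targeted: 0 };
const agentFleet: PactProfile = { proactive: 2, autonomous: 2, collaborative: 3, targeted: 2 };
```

Classifying a system this way forces the useful question: which dimension are you actually weakest on?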

Let me break down each principle with real examples from building agentic-qe—a distributed fleet of 16 specialized AI agents for comprehensive software testing. This is production code, not theory.

Proactive: Anticipate, Don't Just React

Traditional testing is reactive. You write code, then test it. You deploy, then monitor. You find bugs, then fix them.

Proactive agents anticipate problems before they become problems.

Real Example: Testing the Testing System

When I first implemented QE agents in the agentic-qe project using Claude Flow, I had a meta moment: why not have the agents test themselves?

So I initialized the QE agent fleet within the agentic-qe project itself and asked them to analyze the codebase and find problems. Here's what happened:

  • Security issues: Found hardcoded credentials and insecure API endpoints
  • Memory leaks: Detected improper cleanup in agent lifecycle management
  • Production anti-patterns: Caught mocks being used in production code

The agents found these issues before I even ran the code. That's proactive quality—catching problems in the analysis phase, not the deployment phase.
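For reference, here's roughly the shape those findings took. The Finding fields below are my reconstruction for this article, not the exact agentic-qe report schema.

```typescript
// Illustrative shape of a self-analysis finding. Field names are
// assumptions for this article, not the exact agentic-qe schema.
interface Finding {
  category: 'security' | 'memory-leak' | 'anti-pattern';
  file: string;          // where the agent found the issue
  description: string;   // e.g. "hardcoded credential in config loader"
  severity: 'low' | 'medium' | 'high' | 'critical';
  rationale: string;     // the agent's explanation for flagging it
  confidence: number;    // 0..1, how sure the agent is
}
```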

But here's the honest part: the first time the agents ran this self-analysis, they overestimated severity levels, treating minor style problems as critical bugs and creating a false sense of urgency.

This led to an important lesson: agents need calibration. I went back and improved how they do risk and severity assessments. Proactive doesn't mean perfect—it means learning and improving the anticipation mechanisms.
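Calibration can start small: record the agent's severity next to the human-corrected one and measure the bias. A minimal, self-contained sketch—the Correction type and overRatingBias helper are hypothetical names I'm using here:

```typescript
// Minimal severity-calibration sketch. Records human corrections and
// measures how strongly the agent tends to over-rate severity.
const rank = { low: 0, medium: 1, high: 2, critical: 3 } as const;
type Severity = keyof typeof rank;

interface Correction { assessed: Severity; corrected: Severity; }

function overRatingBias(corrections: Correction[]): number {
  // Positive = agent tends to over-estimate; negative = under-estimate.
  const deltas = corrections.map(c => rank[c.assessed] - rank[c.corrected]);
  return deltas.reduce((sum, d) => sum + d, 0) / Math.max(deltas.length, 1);
}

// If the bias stays high, that signal goes back into the agent's
// risk-assessment instructions as a nudge toward conservative ratings.
```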

Autonomous: Self-Executing with Explainability

Autonomous doesn't mean "set it and forget it." It means agents can make decisions and take actions without constant human intervention—but they must explain their reasoning.

The Reality of Autonomous Agents:

Working solo on agentic-qe for months, I used Claude Code in my VS Code terminal and Claude Flow for orchestration. The agents would work in swarms—4 to 8 agents running in parallel with shared memory.

When guided properly, they provided code that needed less rework. We could move much faster because multiple agents tackled different aspects simultaneously. For complex tasks, I'd prompt them to use "hive-mind" coordination for a better overall view.
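To give a feel for how that parallelism works, here's a simplified sketch of a swarm fan-out with shared memory. The Agent interface and runSwarm helper are illustrative assumptions, not the actual Claude Flow orchestration API.

```typescript
// Illustrative swarm fan-out: several agents work on a task in parallel
// and share intermediate findings through a common memory.
type SharedMemory = Map<string, unknown>;

interface Agent {
  name: string;
  run(task: string, memory: SharedMemory): Promise<string>;
}

async function runSwarm(agents: Agent[], task: string): Promise<string[]> {
  const memory: SharedMemory = new Map();
  // All agents start at once; each can read what the others have stored.
  return Promise.all(agents.map(a => a.run(task, memory)));
}
```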

But autonomous doesn't mean infallible:

  • Agents would lose track, especially during Anthropic model overloads or API errors
  • They'd over-complicate and over-engineer solutions when a straightforward approach would work

My solution? Regular commits before starting new batches of changes. This let me revert fast when agents went down the wrong path. Autonomy requires safety nets.
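In practice that safety net can be as simple as an automatic checkpoint commit before each batch. This sketch uses plain git commands; the checkpoint and runAgentBatch helpers are hypothetical names for this article.

```typescript
// Safety-net sketch: snapshot the working tree before an agent batch
// runs, so a bad run is one `git reset --hard` away from undone.
import { execSync } from 'node:child_process';

function checkpoint(label: string): void {
  execSync('git add -A');
  // --allow-empty so the checkpoint exists even with no pending changes.
  execSync(`git commit --allow-empty -m "checkpoint: before ${label}"`);
}

async function runAgentBatch(label: string, batch: () => Promise<void>): Promise<void> {
  checkpoint(label);
  try {
    await batch();
  } catch (err) {
    console.error(`Batch "${label}" failed; revert with: git reset --hard HEAD`);
    throw err;
  }
}
```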

The Explainability Requirement:

Every agent action must come with a reasoning trace. If an agent decides to skip a test, expand coverage, or flag a risk, it must explain what it decided, why, what data informed the decision, and how confident it is. Without this, you have a black box making quality decisions about your production systems. That's not engineering—that's hope disguised as automation.
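Here's a minimal sketch of what such a trace can look like. The ReasoningTrace fields are illustrative, not a fixed agentic-qe schema.

```typescript
// An explainable agent decision: every action carries a reasoning trace.
interface ReasoningTrace {
  decision: string;      // what the agent decided
  reasoning: string;     // why it decided that
  evidence: string[];    // what data informed the decision
  confidence: number;    // 0..1
}

const example: ReasoningTrace = {
  decision: 'skip test suite checkout.e2e',
  reasoning: 'no files under src/checkout changed in this diff',
  evidence: ['git diff --name-only HEAD~1', 'coverage map for checkout.e2e'],
  confidence: 0.92,
};
```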

Collaborative: Agents + Humans + Systems

Quality has never been a solo activity. Agentic QE takes this to the next level: agents collaborate with other agents, with human experts, and with existing systems.

In the agentic-qe project, I built a fleet of 16 specialized agents. Think of it like a testing orchestra where each agent has a specific role:

The Real Agent Fleet:

Core Testing Agents:
  • test-generator: AI-powered test creation with property-based testing
  • test-executor: Multi-framework execution with parallel processing
  • coverage-analyzer: Real-time gap analysis with O(log n) algorithms
  • quality-gate: ML-driven validation and risk assessment
  • quality-analyzer: ESLint, SonarQube, Lighthouse integration
Performance & Security:
  • performance-tester: Load testing with k6, JMeter, Gatling
  • security-scanner: SAST, DAST, dependency scanning
Strategy & Intelligence:
  • requirements-validator: Testability analysis, BDD generation
  • production-intelligence: Incident replay, anomaly detection
  • fleet-commander: Hierarchical coordination of 50+ agents
Advanced Testing:
  • regression-risk-analyzer: Smart test selection using ML patterns
  • test-data-architect: Generates 10k+ realistic records/second
  • api-contract-validator: Breaking change detection (OpenAPI, GraphQL)
  • flaky-test-hunter: Statistical detection with auto-fix
Deployment & Resilience:
  • deployment-readiness: Multi-factor release validation
  • visual-tester: AI-powered UI regression testing
  • chaos-engineer: Fault injection for resilience testing

Each agent has specialized capabilities. Together, they cover the full quality landscape. The human's role shifts from executing every test to conducting the orchestra—setting direction, handling exceptions, making ethical calls.
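As a sketch of that conductor role, here's how a capability-based fleet registry might look. The agent names mirror the fleet above, but the Fleet class and its dispatch method are illustrative assumptions, not the actual fleet-commander API.

```typescript
// Minimal registry that routes work to specialized agents by capability.
interface FleetAgent {
  name: string;
  capabilities: string[];
  handle(task: string): Promise<void>;
}

class Fleet {
  private agents: FleetAgent[] = [];

  register(agent: FleetAgent): void {
    this.agents.push(agent);
  }

  // Route a task to every agent that claims the needed capability.
  async dispatch(capability: string, task: string): Promise<void> {
    const matches = this.agents.filter(a => a.capabilities.includes(capability));
    await Promise.all(matches.map(a => a.handle(task)));
  }
}
```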

Here's the reality: as both codebase and test code grew rapidly with agent help, manual review became nearly impossible. I needed the agents to provide fast feedback from multiple perspectives. They still needed guidance—they're amplifying the one using them, not replacing judgment.

Targeted: Focus Where It Matters Most

Not all code is equally important. Not all bugs are equally costly. Agentic QE uses risk-based intelligence to target effort where it matters most.
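One way to operationalize that targeting: score each module on risk signals and sort testing effort by the result. The signals and weights below are illustrative assumptions, not the regression-risk-analyzer's actual model.

```typescript
// Risk-targeted prioritization sketch: spend testing effort where
// failure is most likely and most costly.
interface ModuleRisk {
  path: string;
  changeFrequency: number;     // commits touching it recently
  defectHistory: number;       // bugs traced back to it
  businessCriticality: number; // 1 (internal tool) .. 5 (checkout, billing)
}

function riskScore(m: ModuleRisk): number {
  // Weights are illustrative; in practice they'd be calibrated.
  return m.changeFrequency * 0.3 + m.defectHistory * 0.3 + m.businessCriticality * 0.4;
}

function prioritize(modules: ModuleRisk[]): ModuleRisk[] {
  return [...modules].sort((a, b) => riskScore(b) - riskScore(a));
}
```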

The Iteration Story:

Building agentic-qe wasn't a straight path. I created five versions of the project. I ditched the first four.

Why? Because the agents helped me iterate fast enough to realize when an approach wasn't working. In traditional development, the sunk cost fallacy might have kept me stuck on version 2 or 3 for months.

With agents, I could:

  • Prototype new architectures in days instead of weeks
  • Test ideas with real code, not just diagrams
  • Pivot quickly when design flaws became apparent
  • Learn from each iteration and apply it to the next

This is targeted development: focusing engineering effort on finding the right solution, not defending the first solution. It took months, but it was just me and the agents—no team overhead, no coordination delay, just rapid iteration cycles.

The fifth version? That's the one that's live on GitHub today. It's not perfect—version 6 will improve it. But it's working, and that's because I had the freedom to fail fast and learn faster.

Bridging Classical QE to Agentic QE

Here's what I want to be crystal clear about: classical QE practices don't disappear in Agentic QE—they evolve.

| Classical Practice | Agentic Evolution |
| --- | --- |
| Manual exploratory testing | flaky-test-hunter + chaos-engineer exploring autonomously |
| TDD (Test-Driven Development) | test-generator with property-based suggestions |
| CI/CD pipelines | test-executor with parallel orchestration |
| Production monitoring | production-intelligence with anomaly detection |
| Risk-based testing | regression-risk-analyzer with ML-driven prioritization |

The foundation remains. TDD is still about fast feedback and design. Exploratory testing is still about discovering unknowns. But now, agents amplify these practices, making them faster, more comprehensive, and more intelligent.
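To make the property-based row concrete, here's the kind of test a test-generator agent might propose, written with the fast-check library. sortAscending is a stand-in for real application code.

```typescript
// Instead of one hand-picked case, assert a property over many
// generated inputs. Uses the fast-check property-testing library.
import fc from 'fast-check';

function sortAscending(xs: number[]): number[] {
  return [...xs].sort((a, b) => a - b);
}

// Property: sorting never changes length, and the output is ordered.
fc.assert(
  fc.property(fc.array(fc.integer()), (xs) => {
    const sorted = sortAscending(xs);
    const ordered = sorted.every((v, i) => i === 0 || sorted[i - 1] <= v);
    return sorted.length === xs.length && ordered;
  })
);
```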

The Reality Check: What Doesn't Work Yet

Let me be honest about the limitations, because you won't hear this from vendors:

Current Limitations of Agentic QE (From Production Experience):

  • Context understanding is still weak: agents struggle with business rules that require deep domain knowledge
  • Model reliability varies: Anthropic API overloads and errors cause agents to lose track
  • Over-complication is common: agents tend to over-engineer when simpler solutions exist
  • Severity calibration is hard: initial risk assessments can be wildly inaccurate
  • Maintenance overhead is real: agents need training, monitoring, and frequent correction
  • Speed ≠ quality without guidance: fast iteration requires human strategic direction

We're in the early days. This isn't magic. It's engineering—which means trade-offs, iterations, and honest assessment of what works versus what's still aspirational.

Starting Your Agentic QE Journey

If you're thinking "okay, this makes sense, but where do I start?"—here's what I recommend based on building agentic-qe from scratch:

1. Start with Self-Analysis

Have agents analyze your existing codebase first. Let them find low-hanging fruit: security issues, memory leaks, anti-patterns. You'll learn how they think and where they struggle.

2. Work in Small Batches with Frequent Commits

Don't let agents run wild for hours. Make commits before each major change. When they go off track, you can revert fast. Autonomy requires safety nets.

3. Calibrate Severity Assessment Early

Expect agents to over-estimate urgency initially. Build feedback loops to teach them the difference between critical bugs and minor style issues. This takes time but pays off.

4. Embrace the Iteration Mindset

Be willing to throw away versions. I ditched four complete versions of agentic-qe before finding the right architecture. Agents make iteration cheap—take advantage of it.

The Path Forward

Agentic QE isn't a destination—it's an evolution. You don't flip a switch and suddenly have autonomous testing agents handling everything. You start small, measure carefully, learn continuously.

PACT gives you a framework to think about this evolution systematically. It helps you classify what kind of autonomy you're building, what collaboration looks like, how proactive versus reactive your systems should be, and where to target effort.

The agentic-qe project is open source on GitHub. It's a real implementation with real limitations. Use it, break it, improve it. That's how we learn together.

Remember This:

Quality has never been about tools. It's never been about automation for automation's sake. It's always been about delivering value to customers while managing risk intelligently.

Agentic QE just gives us better orchestration to do that. The agents are instruments. You're still the conductor. And sometimes, the conductor needs to tell the instruments to play simpler.


Join the Conversation

This is article one in the launch series for The Quality Forge. I'm building the Serbian Agentic Foundation—the first Agentic QE community in the Balkans—and sharing everything I learn along the way.

What's your take on Agentic QE? Where do you see PACT principles fitting (or not fitting) in your testing workflows? Have you tried working with AI agents? What surprised you?

Connect with me on LinkedIn or join our Serbian Agentic Foundation meetups. First one is October 28, 2025 at StartIt Center Novi Sad.

Get weekly insights on Agentic QE straight to your inbox:

Join The Forge Newsletter


Coming Next in the Launch Series:

The Holistic Testing Model in the Agentic Age

How classical quality practices evolve with agent orchestration

Coming in October

PACT Principles Deep Dive

Each principle explained with production examples

Coming in October