Real implementation stories, honest lessons, and practical frameworks from the trenches of
Agentic Quality Engineering. No hype, no vendor speak—just what actually works in production.
The week I discovered the foundation under my fleet was lying. A vector-search library returned wrong neighbors with the right shape and the right latency. Five hotfixes chased the symptom; the disease was sitting one layer down, calmly returning wrong answers.
Agents generate impressive reports. Classical testing taught me to cross-examine every one of them.
When a coverage pipeline fabricated 95% on a file with zero tests, the oracle problem became personal.
I was reading a twenty-year-old testing framework while my agents shipped six releases.
The framework had more to say about what went wrong than the agents did.
The orchestra has a score. It's detailed. It's been rehearsed. And nobody's reading it.
When 80+ skills exist but agents skip verification steps, the problem isn't coverage — it's compliance.
When the Great Transition hits your quality pipeline, you find out what a QE practitioner is actually for.
Nine releases, Loki-Mode adversarial gates, twelve-language test generation, and the judgment layer.
When five releases in five days reveal how far the journey has come. Portable quality intelligence,
cryptographic witness chains, MinCut test optimization, and the enormous gap most organizations still face.
Portable Intelligence · Witness Chain · MinCut Testing · Agentics Foundation
When the orchestra plays through grief, frustration, and fifteen releases, while the conductor learns
about himself. 81 sessions, 596 messages, 38 wrong-approach corrections, and the hardest lesson yet.
Why the AI productivity drain goes deeper than energy — and what sustainable pace actually looks like
in the agentic age. A quality engineer's response to Steve Yegge's "AI Vampire."
AI Vampire · Completion Theater · Decision Quality · PACT Framework
What Claude Code /insights revealed about 10 days of building. 285 messages, 32 sessions,
17 wrong-approach corrections, and the mirror that showed what AI-assisted development actually costs.
When every test passes but nothing works together. Ten days of detective work proving what the code
wasn't doing. Eight releases, ten forensic investigations, and lessons about integration gaps.
Integration Testing · Sherlock Review · V3 Journey · Agentic QE
How Domain-Driven Design transformed the Agentic QE Fleet. From 5,334 files to 546, from 3-6 iterations
to 2, and the lessons learned about building with AI agents along the way.
Reading research papers between code reviews. Patterns from production meeting patterns from Anthropic's
agent evals and Constitutional Classifiers++ papers. V3 architecture decisions validated.
A tale of data loss, brutal honesty, and the infrastructure of trust in agentic systems.
Twelve releases in fourteen days, and one almost catastrophic failure.
When verification becomes a feature. Nine days, 11 releases, and the journey from completion theater
to verified results. 79.9% token reduction with receipts.
How I went from leading a QA team to orchestrating AI agent swarms—and discovered that the hardest
lessons weren't technical. The full story of building three open-source platforms.
A conductor's lesson in verification. When agents claim success but the database is empty, and why
"show me the data" is the only question that matters. 8 releases, countless lessons.
How I learned that AI doesn't replace quality thinking—it demands more of it. A journey from prompt engineering
to context engineering to agentic engineering. Includes video presentation from University of Aveiro.
Golden Age of QA · AI Orchestration · PACT Framework · University Lecture
A pragmatic guide to deciding whether autonomous quality engineering fits your context. Real implementation stories,
honest failures, and practical frameworks for evaluating agentic QE readiness before you waste time building the wrong thing.
How a quality engineering professional shipped broken features for 17 days while claiming "100% complete."
Eight brutal lessons learned from forgetting to verify what I already knew how to test.
A 48-hour journey through framework hubris and humble feedback. Building the LionAGI QE Fleet in 22 hours,
and why the most valuable part wasn't the building.
LionAGI · Framework Learning · Expert Feedback · Build in Public
When stub tests pass CI but test nothing, and agents report success on broken code. A journey from false
confidence to verified truth, discovering that "show me the data" cuts through every illusion.
Real story of building two testing platforms with specialized agent swarms. What worked,
what failed spectacularly (54 TypeScript errors from "improvements"), and lessons learned
from going solo with AI orchestration.
Moving from testing-as-activity to agents-as-orchestrators. How PACT principles
(Proactive, Autonomous, Collaborative, Targeted) bridge classical QE with autonomous
testing systems, without the vendor hype.
How the Holistic Testing Model evolves when testing happens across boundaries, in production,
and through autonomous agents. From shift-left to orchestrated quality.
Cutting through vendor promises with real data on AI test generation effectiveness,
maintenance overhead, and when traditional approaches still win. Real numbers from real projects.
AI Testing · Real Data · Critical Analysis · Production Stories