Real implementation stories, honest lessons, and practical frameworks from the trenches of
Agentic Quality Engineering. No hype, no vendor speak—just what actually works in production.
Framework Validation · V3 Architecture · Anthropic Research
Reading research papers between code reviews. Patterns from production meet patterns from Anthropic's
agent evals and Constitutional Classifiers++ papers, validating V3 architecture decisions.
A tale of data loss, brutal honesty, and the infrastructure of trust in agentic systems.
Twelve releases in fourteen days, and one near-catastrophic failure.
When verification becomes a feature. Nine days, eleven releases, and the journey from completion theater
to verified results: a 79.9% token reduction, with receipts.
How I went from leading a QA team to orchestrating AI agent swarms—and discovered that the hardest
lessons weren't technical. The full story of building three open-source platforms.
A conductor's lesson in verification. When agents claim success but the database is empty, and why
"show me the data" is the only question that matters. Eight releases, countless lessons.
How I learned that AI doesn't replace quality thinking—it demands more of it. A journey from prompt engineering
to context engineering to agentic engineering. Includes a video presentation from the University of Aveiro.
Golden Age of QA · AI Orchestration · PACT Framework · University Lecture
A pragmatic guide to understanding if autonomous quality engineering fits your context. Real implementation stories,
honest failures, and practical frameworks for evaluating agentic QE readiness before you waste time building the wrong thing.
How a quality engineering professional shipped broken features for 17 days while claiming "100% complete."
Eight brutal lessons learned from forgetting to verify what I already knew how to test.
A 48-hour journey through framework hubris and humble feedback. Building the LionAGI QE Fleet in 22 hours,
and why the most valuable part wasn't the building.
LionAGI · Framework Learning · Expert Feedback · Build in Public
When stub tests pass CI but test nothing, and agents report success on broken code. A journey from false
confidence to verified truth, discovering that "show me the data" cuts through every illusion.
The real story of building two testing platforms with specialized agent swarms. What worked,
what failed spectacularly (54 TypeScript errors from "improvements"), and lessons learned
from going solo with AI orchestration.
Moving from testing-as-activity to agents-as-orchestrators. How PACT principles
(Proactive, Autonomous, Collaborative, Targeted) bridge classical QE with autonomous
testing systems, without the vendor hype.
How the Holistic Testing Model evolves when testing happens across boundaries, in production,
and through autonomous agents. From shift-left to orchestrated quality.
Cutting through vendor promises with hard data on AI test generation effectiveness,
maintenance overhead, and when traditional approaches still win. Real numbers from real projects.
AI Testing · Real Data · Critical Analysis · Production Stories