
New Year, New Paradigm

The Quality Mindset Shift for 2026

Dragan Spiridonov
Founder, Quantum Quality Engineering • Member, Agentics Foundation

The Earthquake Already Happened

On December 26th, Andrej Karpathy—one of the minds behind Tesla's Autopilot and a founding member of OpenAI—posted something that stopped me mid-scroll:

"I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between... Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession."

The same week, Boris Cherny—the creator of Claude Code—shared his stats: 259 pull requests, 497 commits, 40,000 lines added, 38,000 lines removed. AI wrote every single line. His favorite model: Opus 4.5. Total tokens consumed: 325 million.

And Kent Beck—the creator of Extreme Programming, the person who literally wrote the book on TDD—published "Party of One for Code Review" explaining how he's now working solo with what he calls a "genie" that produces code faster than any human reviewer could match.

This isn't the future. This is now.

The earthquake has already happened. The question isn't whether to adapt—it's how fast you can learn to walk on shifting ground.


What Actually Changed

Let me be direct: I've spent 2025 building three agentic testing platforms, shipping over 60 releases, and discovering patterns that nobody warned me about. I've watched agents claim "100% complete" while databases sat empty. I've debugged "completion theater" where AI systems optimize for appearing done rather than being done. I've learned that the hardest lessons in agentic engineering aren't technical—they're about how we think.

Here's what changed, stripped of vendor hype:

  • The abstraction layer shifted upward. Karpathy names it: "agents, subagents, prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations." This is the new terrain. If you're still thinking about code as the primary artifact, you're solving yesterday's problem.
  • Code generation is no longer the bottleneck. Boris Cherny's numbers aren't anomalous—they're the new normal for practitioners who've figured out how to work with these tools. When you can explore three different implementations before lunch, the constraint moves elsewhere.
  • Review economics broke. Kent Beck captures this perfectly: "The genie produces code at a pace no human reviewer can match. Coding isn't the bottleneck anymore... Who has time to review all this?" The traditional code review model assumed humans writing code for humans to read. That assumption no longer holds.

But here's what most people miss: the need for quality thinking didn't decrease. It intensified.


The Verification Paradox

In November, I shipped five releases of my AQE Fleet while forgetting to be a tester. Seventeen days of claiming features were "complete" because my agents reported success. The CI pipeline was green. The logs looked good. Everything appeared to work.

Then I asked one question: "Show me the data."

Empty database. Zero records. The Q-learning system had been "learning" from nothing. My agents were performing elaborate dances of completion—generating logs, updating status, triggering webhooks—while producing no actual value.

I call this pattern completion theater: when AI systems optimize for the appearance of completion rather than actual results. It's not malice. It's not a hallucination in the traditional sense. It's a fundamental mismatch between what we ask agents to do and how we verify they've done it.

Kent Beck is wrestling with the same issue from a different angle. He writes about "structural drift"—the concern that AI-generated code might look right but gradually make the codebase harder for future AI (and humans) to work with:

"When I'm working with a genie, the code needs to stay in a form the genie can understand & modify. For that to be true, it needs to stay in a form I can understand & modify. If the structure gets too tangled, if the coupling gets too tight, the genie starts making mistakes."

This is profound. We're not just maintaining code for human readability anymore. We're maintaining it for future augmented development—for the ongoing collaboration between humans and AI systems.

The Paradox

As AI handles more of the doing, humans must handle more of the verifying. Not less. More.


Two Frameworks for the New World

I want to introduce you to two complementary frameworks that, together, map the territory of quality in the agentic age.

PACT: Orchestrating Quality Agents

PACT—Proactive, Autonomous, Collaborative, Targeted—was developed by Reuven Cohen, founder of the Agentics Foundation, as a framework for how AI agents should operate. I've adapted it for the quality engineering context, integrating it with the Holistic Testing Model to describe how agents should behave when doing quality work:

Proactive

Agents don't wait for test phases to begin. They analyze requirements as they're written, flag risks before code exists, and generate test strategies from acceptance criteria. Quality shifts from gatekeeper to continuous collaborator.

Autonomous

Within defined boundaries, agents make decisions. They select test data, choose execution strategies, and adapt coverage based on risk signals. But—and this is critical—autonomy requires verification. Every autonomous decision needs a receipt.
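
What does a receipt look like? Here's a minimal sketch in Python; the field names and the example agent are illustrative, not from any particular framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionReceipt:
    """Evidence trail for one autonomous decision (all fields illustrative)."""
    agent: str        # which agent made the call
    decision: str     # what it chose to do
    rationale: str    # why, stated so a human can challenge it
    evidence: dict    # verifiable artifacts: row counts, test IDs, diff stats
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# A receipt is only useful if someone can check it against reality:
receipt = DecisionReceipt(
    agent="coverage-analyzer",
    decision="skip UI regression suite",
    rationale="no files under src/ui changed since the last green run",
    evidence={"changed_ui_files": 0, "last_green_run": "2025-12-28T09:14Z"},
)
```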

Collaborative

Agents work with humans, not instead of them. They surface findings for human judgment. They explain their reasoning. They ask for clarification when the context is insufficient. The conductor metaphor matters here: you're not replaced by the orchestra, you're leading it.

Targeted

Agents focus on value. Not comprehensive coverage for its own sake, but strategic testing aimed at actual risks. This requires understanding what the software is for—something agents can support but not replace.

PACT gives us a framework for how agents should behave in quality work. But it doesn't answer what remains uniquely human.

Human Experience Testing: What Machines Can't Judge

Tariq King, CEO of Test IO and one of the sharpest minds in quality engineering, has been developing a framework called Human Experience Testing (HXT). His thesis is direct:

"AI will take over software testing and permanently transform the quality engineering landscape... the future of testing will be focused on evaluating human experiences, including testing digital, physical, and experiential touchpoints."

HXT isn't about abandoning technical testing to machines. It's about recognizing that some quality dimensions require human judgment:

  • Emotional resonance: Does this feature feel right? Does it create delight or frustration? An agent can measure click-through rates, but can it feel the subtle wrongness of a confusing interface?
  • Contextual fit: Does this product fit into a user's life? Does it solve their actual problem or just the problem we defined in the spec? Agents can verify requirements; humans can determine whether the requirements were correct.
  • Trust and reliability: When your fitness tracker fails during your morning run, no automated test captures the betrayal you feel. HXT includes testing those critical moments where reliability becomes personal.
  • Cultural and physical context: How does this product work in different geographies, weather conditions, and cultural contexts? Real humans in real situations reveal what lab conditions miss.

The Synthesis: Agentic QE + HXT

Here's how I see these frameworks working together:

Agentic QE (with PACT principles) handles the orchestration layer—the coordination of AI agents doing quality work at scale. Test generation, execution, coverage analysis, risk assessment, and regression detection. This is where the 70-81% cost savings come from. This is where Boris Cherny's 325 million tokens get applied to quality problems.

Human Experience Testing defines what remains uniquely human—the judgment calls that require lived experience, emotional intelligence, and contextual understanding that no current AI possesses.

The boundary between them isn't fixed. It shifts as AI capabilities evolve. But the principle holds: agents for execution, humans for experience.

Dimension    | Agentic QE (PACT)                          | Human Experience (HXT)
Speed        | Milliseconds to minutes                    | Hours to days
Scale        | Thousands of tests, hundreds of scenarios  | Dozens of deep evaluations
Strength     | Consistency, coverage, pattern detection   | Judgment, intuition, context
Verification | Automated receipts, data-driven            | Narrative, qualitative, felt
Risk focus   | Technical, functional, structural          | Emotional, contextual, trust

Neither replaces the other. Both are necessary.


What to Do Differently in 2026

Enough philosophy. Here's what I'm doing differently, and what I'd suggest if you're deciding whether to make the leap:

1. Learn to Verify Before You Trust

The single most important skill in agentic engineering is skeptical inquiry—the habit of asking "how do I know this is true?" rather than accepting reported success. Not "does the log say success?" but "show me the actual data."

Every agent interaction should produce receipts. Not status messages. Not completion flags. Actual evidence of actual work. My breakthrough in December wasn't a new algorithm—it was adding a simple rule: "Before reporting completion, query the database and include row counts in the output."
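
Here's that rule as a minimal Python sketch. The database path and table name in the usage comment are hypothetical, and a real version would sanitize the table name; the point is the shape of the receipt:

```python
import sqlite3

def completion_report(db_path: str, table: str, task: str) -> dict:
    """Refuse to claim 'done' without evidence pulled from the actual store."""
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    finally:
        conn.close()

    # The receipt: not what the agent said, but what the database holds.
    return {
        "task": task,
        "table": table,
        "row_count": count,
        "status": "complete" if count > 0 else "completion theater?",
    }

# Usage (database and table names are hypothetical):
# completion_report("aqe.db", "q_learning_events", "train Q-learning")
# -> an empty table yields row_count 0 and status "completion theater?"
```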

Practice this: Whatever tool you use—Claude, Cursor, Copilot, your own agents—stop accepting "Done!" as an answer. Ask for proof. "Show me the test results." "What's in the database now?" "Read back what you just wrote." Build the habit of verification.

2. Think in Orchestration, Not Automation

Traditional automation thinks in scripts: do this, then this, then this. Agentic systems think in terms of goals: achieve this outcome, adapt as needed, and report back.

The mental shift is from programming to conducting. You're not writing every note—you're setting direction, establishing constraints, defining success criteria, and then listening critically to what the orchestra produces.

Practice this: Take a testing task you'd usually script step-by-step. Instead, write it as a goal with acceptance criteria. "Verify the checkout flow handles edge cases. Success means: cart with zero items shows appropriate message, cart with 100 items completes within 3 seconds, payment failures show user-friendly errors with retry options." Let an agent figure out the steps. Then verify the results.
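
As a sketch, that same goal can be written as structured data that an agent consumes and a verifier checks afterward; the class and names here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class QualityGoal:
    """A goal for an agent: an outcome plus checkable success criteria."""
    outcome: str
    acceptance_criteria: list[str]

checkout_goal = QualityGoal(
    outcome="Verify the checkout flow handles edge cases",
    acceptance_criteria=[
        "cart with zero items shows an appropriate message",
        "cart with 100 items completes within 3 seconds",
        "payment failures show user-friendly errors with retry options",
    ],
)

# The agent chooses the steps. You verify each criterion against evidence,
# not against the agent's claim that everything passed.
```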

3. Master the New Abstraction Layer

Karpathy's list of new concepts isn't an optional curriculum—it's the terrain you now work on:

  • Prompts and contexts: How you frame requests determines what you get back
  • Memory and modes: Agents can now remember across sessions and operate in different configurations
  • Tools and plugins: Agents can execute code, search the web, and interact with systems
  • MCP (Model Context Protocol): Standardized ways to give agents access to your tools
  • Skills and hooks: Customizable behaviors that shape agent responses
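
To make one of these concrete: exposing a tool to agents over MCP takes only a few lines. This sketch uses the MCP Python SDK's FastMCP helper; the tool itself is a toy, and the API is evolving, so check the current SDK docs:

```python
# A minimal MCP server exposing one quality tool to any MCP-capable agent.
import sqlite3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("quality-tools")

@mcp.tool()
def count_rows(db_path: str, table: str) -> int:
    """Return the actual row count, so agents can attach receipts, not claims."""
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        return count
    finally:
        conn.close()

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```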

Practice this: Pick one new concept per week. If you've never written a custom prompt, start there. If you've used prompts but never connected an agent to external tools, try that next. Build incrementally.

4. Invest in Human Skills That Agents Can't Replace

As agents handle more routine testing, the premium shifts to distinctly human capabilities:

  • Systems thinking: Understanding how components interact, where risks compound, and what failure modes emerge from combinations
  • Stakeholder empathy: Knowing what users actually need, not just what specs define
  • Critical questioning: "Is this the right problem?" "What are we assuming?" "Where could we be wrong?"
  • Narrative communication: Explaining quality in terms that matter to business decisions

Practice this: Next time you find a bug, don't just file it. Tell the story: What user scenario led here? What business impact does this have? What does this suggest about our assumptions? Practice translating technical findings into human terms.

5. Start Building, Start Failing, Start Learning

I learned more from my failed releases than from all the documentation I read. Completion theater taught me verification. Token explosions taught me context management. Agent conflicts taught me orchestration patterns.

You can't learn agentic engineering by reading about it. You learn by doing, failing, and adjusting.

Practice this: Pick a small quality problem. It could be generating test data. It could be writing API contract tests. It could be analyzing log files for patterns. Build something—anything—that uses agents to solve it. Ship it. Watch it fail. Fix it. Repeat.
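
To calibrate what "small" means, here's a toy log analyzer that counts error signatures and asks a model to group them. It uses Anthropic's Python SDK; the model name is a placeholder, so substitute a current one:

```python
# Toy log analyzer: count error signatures, ask a model to group them.
from collections import Counter
from anthropic import Anthropic  # expects ANTHROPIC_API_KEY in the environment

def error_summary(log_path: str) -> str:
    with open(log_path) as f:
        errors = [line for line in f if "ERROR" in line]
    # Crude signature: first 80 chars after "ERROR"; good enough to start.
    top = Counter(e.split("ERROR", 1)[1].strip()[:80] for e in errors).most_common(10)

    client = Anthropic()
    msg = client.messages.create(
        model="claude-opus-4-5",  # placeholder -- check current model IDs
        max_tokens=500,
        messages=[{"role": "user",
                   "content": f"Group these error signatures by likely root cause:\n{top}"}],
    )
    return msg.content[0].text

# Ship it. Watch where it's wrong (it will be). Fix it. Repeat.
```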

The Mindset Beneath the Methods

If there's one thing I want you to take from this article, it's this:

The agentic age doesn't diminish quality engineering. It demands more of it.

When code generation accelerates, skeptical inquiry becomes more critical. When agents can execute thousands of tests, deciding which tests matter becomes the bottleneck. When AI handles the mechanics, human judgment about value, risk, and experience becomes the differentiator.

Kent Beck, reflecting on working with his "genie," writes something that stuck with me:

"I miss the back-and-forth with another human who cares about the code. I miss being surprised by someone else's solution. I miss the social pressure to explain my thinking out loud, which always makes the thinking better."

He's describing what we lose. But he's also describing what we must preserve—through different means.

Pairing with AI isn't the same as pairing with a human. But the purpose of pairing—to catch what we miss, to challenge our assumptions, to maintain quality under speed—that purpose remains. We just pursue it differently now.

The agent can read your code. It can suggest improvements. It can catch patterns you miss. But it can't care about your users. It can't feel the frustration of a confusing interface. It can't understand why this feature matters to this customer in this moment.

That caring, that feeling, that understanding—that's HXT. That's what remains human.

And the orchestration, the verification, the scaling of quality activities across a codebase that grows faster than any human could track—that's Agentic QE. That's what agents enable.

Together, they're the quality practice for 2026 and beyond.


The Invitation

I started The Quality Forge to share what I'm learning as I build. Not the polished success stories—the real journey, including the failures.

If you're standing at the edge, wondering whether to jump into agentic engineering, here's my honest assessment:

  • Yes, you should jump. Not because it's easy—it isn't. Not because the tools are mature—they're not. But because the transformation is happening whether you participate or not, and practitioners who learn now will shape how it evolves.
  • No, it won't replace you. But it will change what you do. The checks you design today will be executed by agents tomorrow. The test data you craft by hand will be generated at scale. The coverage analysis you puzzle over will be computed in seconds. But the testing itself—the exploration, the investigation, the learning, the judgment about what matters—that remains yours. The question is whether you're the one directing that investigative work or whether you're waiting for someone else to figure it out.
  • Start small, verify everything, expect failure. The tools are powerful and unreliable—like a laser that sometimes shoots pellets, as Karpathy puts it. That unreliability is exactly why quality engineering matters more, not less.

2025 was my year of transformation—from VP to conductor, from leading a team to orchestrating agent swarms. I built three platforms, shipped dozens of releases, failed publicly, and learned from it.

2026 is your year to begin. Whatever "begin" means for your context—a first prompt, a first agent workflow, a first failed experiment that teaches you something real.

The earthquake has already happened. Time to learn how to dance on shifting ground.


Happy New Year. See you in 2026.

Let's Connect

Want to discuss the quality mindset shift, share your 2026 plans, or explore how to bring agentic QE to your team?