Your AI Agent Needs a Stress Test

Plus: your customer doesn’t care how smart the model is if the workflow still falls apart at the worst possible moment.

Jun 10, 2026

Your daily signal on AI and CX — minus the hype.

DCX Stat of the day: In a Nubank card-delivery support deployment, a new evaluation-driven AI agent produced a 37 percentage-point lift in AI transactional NPS and a 29 percentage-point gain in self-service rate versus prior agent variants. (Source: arXiv)

In this issue:

→ AI support gets a stress test
→ UX metrics move beyond model scores
→ Drive-thru AI gets another shot
→ Autonomous ops needs rollback discipline
→ Customer trust starts before launch

🔎 Deep dive

The support agent is only as good as the mess it survived

Nubank researchers published a useful look at what it takes to run customer-support AI at serious scale. The headline isn’t “big bank uses AI.” Fine. Everyone has that slide now. The useful part is the operating discipline around it: structured context, human review, evaluation checks, and validation before the agent gets near live customers.

That changes the CX conversation. The question isn’t, “Can the agent answer?” The question is, “Can we prove this thing behaves well when the customer is confused, impatient, locked out, late, charged incorrectly, or stuck in a policy corner?” That’s where customer experience gets real. Demos usually skip the ugly corners. Customers live in them.

The paper connects offline simulation quality with online customer outcomes. That is the bit to steal. If your AI support roadmap doesn’t have a stress test tied to live journey metrics, you’re basically asking customers to QA your ambition.

Bottom Line: AI service quality comes from testing the messy customer moments before customers are forced to find them for you.

More from Nubank

📬 Copy-Paste Take

Before we expand any customer-facing AI agent, we should be able to show the test set, failure types, escalation rules, human review process, and live customer metric it’s expected to move. If we can’t show that, we’re scaling hope and calling it service.

OPERATOR PLAYBOOK

Build the stress test before the launch deck

Start with the customer moments where failure gets expensive. The model comes after that.

Audit every AI-assisted support flow for four things:

The exact customer intent, emotion, and policy edge cases tested before launch.
The human review process for prompts, retrieved context, and answer quality.
The online metric that proves the agent helped the customer finish the job.
The escalation rule for confusion, frustration, identity risk, money, eligibility, or repeat contact.
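
One way to make the four-part audit concrete is to treat it as a record that is either filled in or not. The sketch below is a minimal illustration, not anything taken from the Nubank paper: the `EvaluationFile` class, its field names, and the `audit_gaps` helper are all hypothetical, and the sample flow is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationFile:
    """Hypothetical record of what a team can show for one AI-assisted flow."""
    flow_name: str
    # Intent, emotion, and policy edge cases exercised before launch.
    tested_edge_cases: list[str] = field(default_factory=list)
    # How prompts, retrieved context, and answer quality get human review.
    human_review_process: str = ""
    # The live metric that shows the customer finished the job.
    online_metric: str = ""
    # Conditions that hand the conversation to a person.
    escalation_triggers: list[str] = field(default_factory=list)


def audit_gaps(evaluation: EvaluationFile) -> list[str]:
    """Return the audit items still missing for this flow, in checklist order."""
    gaps = []
    if not evaluation.tested_edge_cases:
        gaps.append("edge cases tested before launch")
    if not evaluation.human_review_process.strip():
        gaps.append("human review process")
    if not evaluation.online_metric.strip():
        gaps.append("online metric")
    if not evaluation.escalation_triggers:
        gaps.append("escalation rule")
    return gaps


# Invented example flow: two of the four items are documented.
card_delivery = EvaluationFile(
    flow_name="card delivery status",
    tested_edge_cases=["late delivery, frustrated customer"],
    online_metric="self-service rate",
)
print(audit_gaps(card_delivery))
```

An empty list means the flow has something to show for each of the four items. It says nothing about whether what is shown is any good, which still takes a human read.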

Then test whether rehearsal performance predicts what happens when real customers show up with real pressure.
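
A simple first check on whether rehearsal predicts reality is to line up each agent version's offline score against the live metric it produced and look at the correlation. The sketch below assumes you have one offline pass rate and one live self-service rate per agent version. The function name and every number in it are made-up placeholders for illustration, not figures from the Nubank paper or any other source.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder data: one entry per agent version (invented for this example).
rehearsal_pass_rate = [0.62, 0.71, 0.80, 0.88]
live_self_service_rate = [0.41, 0.44, 0.52, 0.58]

r = pearson(rehearsal_pass_rate, live_self_service_rate)
print(f"rehearsal vs. live correlation: {r:.2f}")
```

A value near 1 suggests the rehearsal is measuring something customers actually feel. A value near 0 suggests the test set is grading the wrong things. With only a handful of versions the number is a conversation starter, not proof.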

Ask your team: Which customer problem are we comfortable letting the AI handle only after it has failed safely in rehearsal?

Signal: Mature AI service teams treat evaluation design as part of CX design. It’s not a technical appendix someone adds after the champagne slide.

📈 Market Reality Check

Customers are not as ready as leaders think

TechRadar Pro’s June 8 CX piece points to the gap every AI-service roadmap has to face: more than 90% of business leaders believe customers are comfortable with AI-powered service. Only 42% of customers actually are. Another 28% are actively uncomfortable.

That doesn’t mean customers hate AI. It means they hate being forced to absorb the risk of a company’s automation bet. If the AI removes effort, routes faster, cuts repetition, and keeps a human path open, customers may welcome it. If it adds one more layer between them and the answer, the business just paid to make the journey feel worse.

Why it matters: CX leaders need to pressure-test AI service against customer comfort, not internal enthusiasm. The experience has to feel easier from the customer’s side of the screen. That’s the scoreboard.

Leader confidence minus customer comfort = adoption risk.
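
Read literally, that line is a subtraction. The sketch below plugs in the figures cited above, treating "more than 90%" as a floor of 90. The function name is hypothetical, and reading the slogan as arithmetic is an interpretation, not something the TechRadar Pro piece states.

```python
def comfort_gap(leader_confidence_pct: float, customer_comfort_pct: float) -> float:
    """Percentage-point gap between what leaders believe and what customers report."""
    return leader_confidence_pct - customer_comfort_pct


# 90 is a floor ("more than 90%"), so the real gap is at least this large.
print(comfort_gap(90, 42))  # 48
```

On those numbers, leaders overestimate customer comfort by at least 48 percentage points.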

🧰 Tool Worth Knowing

ArchIQ

What it does: ArchIQ is McDonald’s renewed AI drive-thru ordering system, reportedly built with Google and being tested at five U.S. locations. Current reporting says it has processed more than one million transactions, with about 90% completed without human escalation.

CX use case: High-volume ordering where speed, language handling, repeat-order recognition, and manager alerts can change both the customer line and the restaurant floor.

Worth watching because: McDonald’s already had one public AI drive-thru stumble. This second attempt is a reminder that customer-facing automation doesn’t get graded on lab accuracy. It gets graded at the speaker box, with noise, pressure, accents, substitutions, coupons, kids yelling in the back seat, and a line of cars behind you.

Bottom line: The tool is worth watching because it puts AI into one of the least forgiving CX environments: fast, physical, operational, and instantly visible when it gets weird.

The DCX AI Today - AI Tool Directory - If you lead a CX team and want a curated shortlist of tools worth evaluating, this is your starting point.

⚡ 90-Second CX Radar

Apple’s Siri AI is both cool and 2 years too late

Apple’s AI assistant update is less aggressive than the agent hype cycle, but the privacy and personal-context angle matters. For customer-facing brands, the lesson is simple: trust becomes a product feature when AI starts reading across personal data.

Why it matters: Customers may accept more AI help when they understand where their data goes, what the assistant can see, and what it’s allowed to do.

Pega Launches Customer Engagement Studio to Transform Marketing Operations with Agentic AI

Pega’s new workspace uses agents to move from campaign brief to personalized customer actions while keeping governance, compliance, and human validation in the flow. That is much closer to the CX fault line: more relevance, faster, without turning the customer journey into a pile of disconnected automated guesses.

Why it matters: Personalization at scale only helps customers when the business can control what gets recommended, when it gets triggered, and who validates it before it reaches the customer.

🧭 Your Move

This issue is about a quieter AI shift: teams are moving from demo confidence to operating proof.

The practical standard is simple: clear stress tests, clean escalation rules, honest customer metrics, and recovery paths that work when the agent gets it wrong.

Pick one AI-assisted customer journey this week and ask for the evaluation file. Skip the launch deck. Skip the vendor promise. Ask for the actual test set, failure taxonomy, escalation standard, and live metric.

If nobody can show it, you have found the work.

Your customer shouldn’t be the first person to discover your AI agent’s edge case.

Until tomorrow,

👥 Share This Issue

Think of one person who’s wrestling with AI in CX right now and forward this to them.

If someone forwarded this to you, they thought you needed to see it before your next AI planning meeting. Get your own copy.
