Monday morning at 8:47am
I received that dreaded message: “Our AI is sending nonsense to our customers.”
My coffee hadn’t even kicked in yet, but there I was, diving into logs, trying to figure out why our carefully crafted email automation had suddenly decided to get philosophical about toast (I wish I was joking). Welcome to an AI agent meltdown!
Let me take you back three weeks to when this all started…
The client’s email nightmare
A small but growing e-commerce business approached me with a problem that’s probably keeping many of you up at night. Their customer service team was drowning in repetitive email queries. Order status updates, return policies, shipping questions – the same inquiries, over and over, eating up over six hours of their day.
“We tried templates,” the founder told me, “but customers can tell. They want personalised responses, and frankly, my team is burnt out copying and pasting all day.”
The impact was real:
- Response times had ballooned to 48 hours
- Customer satisfaction scores were plummeting
- Their best service rep had just handed in her notice, citing “email-induced existential crisis” (her words, not mine)
Building the solution: A marriage of modern tools
Here’s where it gets interesting. Instead of building from scratch, I decided to leverage the current explosion in agentic AI capabilities.
My tech stack looked like this:
- n8n for the workflow orchestration (self-hosted, because I’m a control freak)
- Claude API as our AI brain (chose it over OpenAI for its superior context handling)
- Microsoft Graph API for email integration
- Supabase for conversation history and context storage
- Make.com as a backup automation layer (belt and braces approach)
The plan was elegant: emails arrive, n8n triggers the workflow, extracts the content, checks our knowledge base, generates a contextual response via Claude, and sends it through Microsoft’s infrastructure. What could possibly go wrong?
Oh, where do I begin?
Challenge 1: The authentication tango
Microsoft Graph API authentication is like trying to solve a Rubik’s cube blindfolded. OAuth2 tokens expiring at random times, refresh tokens that wouldn’t refresh, and permission scopes that seemed to change their minds daily. I spent two days in what I now call “Azure Active Directory purgatory,” clicking through endless configuration screens.
My favourite moment? Discovering that, in my setup at least, the “Mail.Send” permission wasn’t enough to get replies out until I also granted “Mail.ReadWrite” – because apparently, Microsoft believes you need to read emails to send them. Logic.
Challenge 2: AI ‘context amnesia’
Here’s something they don’t tell you about AI agents: they’re brilliant until they’re not. Our Claude integration would craft beautiful, contextual responses… until it randomly forgot everything about the conversation. One minute it’s discussing a customer’s order for hiking boots, the next it’s recommending recipes for banana bread.
The issue? Token limits and context window management. Turns out, feeding an AI agent the entire customer history plus your knowledge base plus the current email thread is like trying to stuff a turkey with another, larger turkey.
Challenge 3: The personality crisis
This was the fun one. The client wanted responses that matched their brand voice: “friendly, helpful, with a dash of Northern charm.” What we got initially ranged from Victorian butler (“I do hope this missive finds you in good spirits”) to overenthusiastic American teenager (“OMG! Your order is literally on its way!”).
Getting creative with constraints
Fixing authentication
After much hair-pulling, I discovered the magic combination: using delegated permissions with a service account, implementing a token refresh mechanism in n8n that runs every 45 minutes (not 60, because Microsoft is an arse), and storing credentials in environment variables with proper encryption. I also built a fallback system using Power Automate that kicks in if the Graph API throws a wobbler.
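For anyone who wants to see the 45-minute idea in code, here is a minimal Python sketch. It is not the n8n workflow itself: it swaps the scheduled job for an age check on every request, and `fetch_token` is a hypothetical callable standing in for whatever performs the real OAuth2 refresh against Microsoft.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Refresh at 45 minutes, well before the nominal 60-minute expiry.
REFRESH_INTERVAL_SECONDS = 45 * 60


@dataclass
class Token:
    value: str
    fetched_at: float  # seconds, as reported by the injected clock


class TokenCache:
    """Hands out a cached token and replaces it once it is 45 minutes old."""

    def __init__(
        self,
        fetch_token: Callable[[], str],  # hypothetical: wraps the real refresh call
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._token: Optional[Token] = None

    def get(self) -> str:
        now = self._clock()
        too_old = (
            self._token is None
            or now - self._token.fetched_at >= REFRESH_INTERVAL_SECONDS
        )
        if too_old:
            # Fetch a fresh token and remember when we got it.
            self._token = Token(self._fetch_token(), now)
        return self._token.value
```

Passing the clock in as a parameter is only there so the refresh logic can be tested without waiting 45 minutes.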
Solving that ‘context amnesia’
The breakthrough came when I stopped trying to feed Claude everything at once. Instead, I built a smart context management system:
- Recent conversation history: last 3 exchanges only
- Relevant knowledge base snippets: dynamically selected based on keywords
- Customer profile summary: compressed to 200 words max
- A “memory jogger” prompt that explicitly tells Claude what it discussed previously
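The four ingredients above can be sketched in a few lines of Python. Treat this as an illustration under assumptions rather than the production workflow: the function names, the section labels, the shape of `knowledge_base` (keyword strings mapped to snippets) and the simple word-overlap scoring are all mine.

```python
import re
from typing import Dict, List, Tuple

MAX_EXCHANGES = 3        # recent conversation history: last 3 exchanges only
MAX_PROFILE_WORDS = 200  # customer profile summary: 200 words max


def truncate_words(text: str, limit: int) -> str:
    """Keep at most `limit` whitespace-separated words."""
    return " ".join(text.split()[:limit])


def select_snippets(
    email_body: str, knowledge_base: Dict[str, str], top_n: int = 2
) -> List[str]:
    """Pick the snippets whose keywords overlap most with the email.

    Assumption: knowledge_base maps a space-separated keyword string
    to the snippet text it describes.
    """
    email_words = set(re.findall(r"[a-z]+", email_body.lower()))
    scored = []
    for keywords, snippet in knowledge_base.items():
        overlap = len(email_words & set(keywords.lower().split()))
        if overlap:
            scored.append((overlap, snippet))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored[:top_n]]


def build_context(
    email_body: str,
    history: List[Tuple[str, str]],  # (customer message, agent reply) pairs
    profile: str,
    knowledge_base: Dict[str, str],
) -> str:
    """Assemble a trimmed prompt context instead of sending everything."""
    recent = history[-MAX_EXCHANGES:]
    exchanges = "\n".join(f"Customer: {c}\nAgent: {a}" for c, a in recent)
    # The "memory jogger": spell out what has already been discussed.
    jogger = "So far in this thread you have discussed: " + "; ".join(
        c for c, _ in recent
    )
    sections = [
        "CUSTOMER PROFILE:\n" + truncate_words(profile, MAX_PROFILE_WORDS),
        "KNOWLEDGE BASE:\n" + "\n".join(select_snippets(email_body, knowledge_base)),
        "RECENT EXCHANGES:\n" + exchanges,
        "MEMORY JOGGER:\n" + jogger,
        "CURRENT EMAIL:\n" + email_body,
    ]
    return "\n\n".join(sections)
```

The string returned by `build_context` is what would be sent to the model alongside the system prompt. The hard caps are the point: each section has a fixed budget, so one chatty customer can't crowd out the knowledge base.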
Nailing the brand voice
This required proper prompt engineering. I created a “voice calibration” document with 20 example responses in the client’s preferred style. But the real trick? I added a “voice consistency checker” – a second AI call that reviews the response and adjusts it if it’s gone off-brand. Yes, I’m using AI to police AI. We’re living in the future, folks.
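Here is roughly what the two-call pattern looks like, as a hedged Python sketch. The `llm` parameter is a hypothetical callable (prompt in, text out) standing in for the real Claude API call, and the prompt wording and the `ON_BRAND` sentinel are my own illustrative choices, not the exact ones from the project.

```python
from typing import Callable, List

# Hypothetical stand-in for the real model call: takes a prompt, returns text.
LLMCall = Callable[[str], str]

ON_BRAND = "ON_BRAND"


def draft_reply(llm: LLMCall, context: str, voice_examples: List[str]) -> str:
    """First call: write a reply using the voice calibration examples."""
    examples = "\n---\n".join(voice_examples)
    prompt = (
        "Write a customer service reply in the same voice as these examples.\n"
        f"EXAMPLES:\n{examples}\n\nCONTEXT:\n{context}"
    )
    return llm(prompt)


def check_voice(llm: LLMCall, draft: str, voice_examples: List[str]) -> str:
    """Second call: keep the draft if it is on-brand, otherwise take the rewrite."""
    examples = "\n---\n".join(voice_examples)
    prompt = (
        "Compare the draft with the example replies. If the draft matches "
        f"their voice, answer with exactly {ON_BRAND}. Otherwise rewrite the "
        "draft in that voice, keeping every fact unchanged.\n"
        f"EXAMPLES:\n{examples}\n\nDRAFT:\n{draft}"
    )
    verdict = llm(prompt).strip()
    return draft if verdict == ON_BRAND else verdict


def reply_with_voice_check(
    llm: LLMCall, context: str, voice_examples: List[str]
) -> str:
    """Draft, then police the draft."""
    draft = draft_reply(llm, context, voice_examples)
    return check_voice(llm, draft, voice_examples)
```

Keeping the checker as a separate call means it only has one job, judging tone, and it doubles the number of model calls per email, which is worth weighing against how often the first draft drifts off-brand.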
The results: Worth every headache
Three weeks later, here’s where we landed:
- Average response time: down from 48 hours to 3 minutes
- Customer satisfaction scores: up 34%
- Time saved: 5.5 hours per day
- Accuracy rate: 94% (the other 6% gets flagged for human review)
My personal favourite: The customer service team now actually enjoys their jobs again.
And yes, I fixed this morning’s “quantum toast” incident. Turns out, someone had updated the knowledge base with their philosophy dissertation notes instead of the shipping FAQ. These things happen.
Over to you
I’m genuinely curious: how many of you are wrestling with similar AI automation challenges? The landscape is evolving so rapidly that what worked last month might be obsolete today.
If you’re struggling with email automation or agentic AI behaviour, or just want to offer your commiserations about Microsoft Graph API authentication, drop me a comment. Happy to share code snippets, workflow diagrams, or just swap war stories.
And if your AI agent starts philosophising about breakfast foods, you know who to call.