How We Cut AI Agent Calls by 91% — And Made Our Platform Feel Instant Again
by Shubham Singh
A Quick Backstory
At TribalScale, we spend a lot of time building products on top of cutting-edge AI. It’s fun, it’s challenging, and honestly, it keeps you on your toes. Every week, something new shows up that forces you to stretch as an engineer — and that’s exactly the kind of environment where you grow fast.
This story is about one of those moments.
A few weeks ago, I was working on our AI-powered sales intelligence platform. The vision was simple: a system that helps sales teams instantly find high-intent companies, key decision-makers, and real contact data — automatically.
But somewhere along the way, our “instant” system decided to take… nine minutes to respond. Nine. Whole. Minutes.
That’s when we knew we had to roll up our sleeves.
The Problem We Ran Into
When a sales rep asked something like:
“Show me companies in the automotive industry adopting AI technology.”
Our system did what we asked it to do — just not in a smart way.
Here’s what was happening behind the scenes:
The AI generated buying signals
Then we asked the AI for companies matching each signal
Then for each company, we asked which roles to target
Then for each role, we asked for contact details
It looked something like this:
5 buying signals → 5 calls
15 companies → 15 calls
45 roles → 45 calls
Total: 55+ AI agent calls for a single user query.
We effectively recreated the classic N+1 problem — the same one you see in databases and APIs, now happening inside an AI orchestration layer.
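In rough pseudocode, the orchestration looked something like this. This is a simplified sketch, not our production code: callAgent is a stand-in for our actual Azure OpenAI agent call, and the prompts are illustrative.

```typescript
// Stand-in for our Azure OpenAI agent call (assumed signature, not a real SDK method).
declare function callAgent(prompt: string, opts?: { timeout?: number }): Promise<any>;

// Simplified sketch of the original flow: one agent call per item, nested.
async function findProspects(query: string) {
  const signals: string[] = await callAgent(`List buying signals for: ${query}`);

  const results: Array<{ company: string; role: string; contact: unknown }> = [];
  for (const signal of signals) {                 // 5 signals → 5 calls
    const companies: string[] = await callAgent(`Find companies showing: ${signal}`);
    for (const company of companies) {            // 15 companies → 15 calls
      const roles: string[] = await callAgent(`Which roles should we target at ${company}?`);
      for (const role of roles) {                 // 45 roles → 45 calls
        const contact = await callAgent(`Find contact details for a ${role} at ${company}`);
        results.push({ company, role, contact });
      }
    }
  }
  return results; // dozens of sequential agent calls for one user query
}
```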
And it hurt us in three ways:
1. It was slow
55 calls × ~10 seconds each = ~9 minutes of waiting.
2. It hit rate limits constantly
We were pushing Azure OpenAI well beyond reasonable throughput.
3. It was expensive
Even $0.01 per call becomes noticeable when multiplied at scale.
It was clear we needed a smarter approach.
The Turning Point
As a team, we explored every option: caching, parallelization, cheaper models, trimming data, you name it. Each helped on one axis but forced a compromise somewhere else.
One thing about working at TribalScale: we don’t settle for “good enough.” We look for solutions that improve performance without sacrificing quality or integrity.
And that led us to the strategy that made everything click: batching.
The Solution: Batch Processing (Our Game-Changer)
Instead of asking the AI the same question 55 different times, we reorganized our prompts so we could send groups of items together.
Think of it like this: Instead of calling someone five times with five questions, you hold your questions, call once, and get everything in one go.
Here’s how it transformed each stage:
1. Company Extraction
Old: 5 calls
New: 1 call
80% fewer calls
2. Role Identification
Old: 15 calls
New: 1 call
~93% fewer calls
3. Contact Extraction
Old: 45 calls
New: 1–3 calls (batches of ~20)
93% fewer calls
This change alone brought us from 55 calls → 5 calls.
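Here's a rough sketch of the batched flow, using the same callAgent stand-in as before. The prompts and the chunksOf helper are illustrative; the chunking and the other safety nets are covered in the next section.

```typescript
// Simplified sketch of the batched flow: one agent call per stage, not per item.
async function findProspectsBatched(query: string) {
  const signals: string[] = await callAgent(`List buying signals for: ${query}`);

  // Company extraction: every signal goes into a single prompt
  const companies: string[] = await callAgent(
    `For each buying signal below, list matching companies:\n${signals.join("\n")}`
  );

  // Role identification: every company goes into a single prompt
  const roles: string[] = await callAgent(
    `For each company below, list the roles we should target:\n${companies.join("\n")}`
  );

  // Contact extraction: roles are grouped into batches of ~20
  const contacts: unknown[] = [];
  for (const batch of chunksOf(roles, 20)) {
    contacts.push(
      ...(await callAgent(`For each role below, extract contact details:\n${batch.join("\n")}`))
    );
  }

  return contacts; // the three stages now cost ~5 calls instead of 55+
}
```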
But batching alone wasn’t enough — we needed to make it reliable.

Making It Reliable: Fallbacks, Limits, and Validation
Real-world systems don’t live in ideal conditions. So we added safety nets:
1. Batch Size Limits
We cap batches to prevent overwhelming the model or exceeding context windows.
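The chunksOf helper used in the earlier sketch is all this really amounts to (the name is hypothetical, but the idea is just slicing):

```typescript
// Split a list into batches no larger than `size`, so one prompt never
// carries more items than the model's context window can comfortably hold.
function chunksOf<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```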
2. Automatic Fallbacks
If a batch fails (timeout, content too large, any model hiccup), we fall back to a sequential strategy automatically.
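In sketch form, where processBatch and processOneByOne are stand-ins for our batched and sequential paths rather than real function names:

```typescript
// Try the batched path first; if it fails for any reason, degrade to the
// slower but more forgiving one-item-at-a-time path instead of erroring out.
async function processWithFallback<T, R>(
  items: T[],
  processBatch: (items: T[]) => Promise<R[]>,
  processOneByOne: (items: T[]) => Promise<R[]>
): Promise<R[]> {
  try {
    return await processBatch(items);
  } catch (err) {
    console.warn("Batch failed, falling back to sequential processing:", err);
    return await processOneByOne(items);
  }
}
```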
3. Smart Timeout Scaling
Bigger batches get more time; smaller batches stay snappy.
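Concretely, a simple linear rule is enough; the constants below are illustrative, not our tuned values.

```typescript
// A fixed floor plus a per-item allowance: small batches stay snappy,
// large batches get the extra time they genuinely need.
const BASE_TIMEOUT_MS = 30_000;     // minimum budget for any agent call
const PER_ITEM_TIMEOUT_MS = 2_000;  // extra budget per batched item

function timeoutForBatch(batchSize: number): number {
  return BASE_TIMEOUT_MS + batchSize * PER_ITEM_TIMEOUT_MS;
}
```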
4. Schema Validation (Our Secret Weapon)
We added three layers:
AI-enforced JSON schema
Runtime validation with Zod
Business logic validation for sanity checks
This ensured we always got structured, clean data — crucial when batching many items together.
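To give a feel for the second layer, a Zod schema for a batch of contacts might look roughly like this (the field names are illustrative, not our exact shape):

```typescript
import { z } from "zod";

// One contact row inside a batched response.
const ContactSchema = z.object({
  company: z.string().min(1),
  role: z.string().min(1),
  name: z.string().min(1),
  email: z.string().email().optional(),
});

// The whole batch must be an array of valid contacts; anything malformed
// fails here instead of leaking into downstream business-logic checks.
const ContactBatchSchema = z.array(ContactSchema);

// Usage: validate the model's JSON output before trusting it.
// const contacts = ContactBatchSchema.parse(JSON.parse(rawModelOutput));
```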
The Results: Night and Day
For a typical query (5 signals → 15 companies → 45 roles):
| Metric | Before | After | Improvement |
|---|---|---|---|
| API Calls | 55 | 5 | 91% fewer |
| Time | 9 minutes | 52 seconds | 90% faster |
| Cost | $0.55 | $0.05 | 91% cheaper |
| Rate Limit Errors | Frequent | Rare | Much more reliable |
One minute instead of nine means the difference between a frustrated user and a delighted one. For us, it was a clear sign we were on the right track.
Lessons We Took With Us
1. The N+1 Problem Shows Up Everywhere
Even in AI orchestration pipelines, the same principles apply. Identify it early, batch aggressively.
2. Modern AI Handles Batches Far Better Than Expected
With careful prompting and structure, models like GPT-4o handle complex batch requests gracefully.
3. Always Build for Graceful Degradation
Resilience matters as much as speed. If batches fail, the system should still function.
4. Optimize Without Compromising Quality
We didn’t cut corners — we improved efficiency while keeping accuracy intact.
5. This Is Why Working at TribalScale Is Fun
We get to solve real-world problems with modern technology, and every challenge pushes us to innovate creatively. This project was a great reminder of why building with emerging AI tech is exciting — there’s always room to make things better.
Closing Thoughts
Batch processing didn’t just speed things up — it transformed the user experience. Our platform now feels fast, efficient, and reliable, and we achieved all of that without lowering the quality of results.
If you’re building AI-powered systems, take a close look at your request patterns. There might be massive performance gains hiding in plain sight — just like we found.
