How We Cut AI Agent Calls by 91% — And Made Our Platform Feel Instant Again

by Shubham Singh

A Quick Backstory

At TribalScale, we spend a lot of time building products on top of cutting-edge AI. It’s fun, it’s challenging, and honestly, it keeps you on your toes. Every week, something new shows up that forces you to stretch as an engineer — and that’s exactly the kind of environment where you grow fast.

This story is about one of those moments.

A few weeks ago, I was working on our AI-powered sales intelligence platform. The vision was simple: a system that helps sales teams instantly find high-intent companies, key decision-makers, and real contact data — automatically.

But somewhere along the way, our “instant” system decided to take… nine minutes to respond. Nine. Whole. Minutes.

That’s when we knew we had to roll up our sleeves.

The Problem We Ran Into

When a sales rep asked something like:
“Show me companies in the automotive industry adopting AI technology.”

Our system did what we asked it to do — just not in a smart way.

Here’s what was happening behind the scenes:

  • The AI generated buying signals

  • Then we asked the AI for companies matching each signal

  • Then for each company, we asked which roles to target

  • Then for each role, we asked for contact details

It looked something like this:

  • 5 buying signals → 5 calls

  • 15 companies → 15 calls

  • 45 roles → 45 calls

Total: 55+ AI agent calls for a single user query.

We effectively recreated the classic N+1 problem — the same one you see in databases and APIs, now happening inside an AI orchestration layer.
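In code, that flow looked roughly like the sketch below. This is a simplified illustration rather than our actual codebase; `askAgent` is a hypothetical helper standing in for a single Azure OpenAI call.

```typescript
// Hypothetical helper: one Azure OpenAI request per invocation,
// returning a parsed list of strings.
async function askAgent(prompt: string): Promise<string[]> {
  // ...call the model and parse a JSON array from its reply...
  return [];
}

// The N+1 shape: one call per signal, then per company, then per role.
async function naiveSearch(query: string): Promise<string[]> {
  const signals = await askAgent(`Buying signals for: ${query}`);
  const contacts: string[] = [];

  for (const signal of signals) {
    const companies = await askAgent(`Companies matching: ${signal}`);      // one call per signal
    for (const company of companies) {
      const roles = await askAgent(`Roles to target at: ${company}`);       // one call per company
      for (const role of roles) {
        contacts.push(...(await askAgent(`Contacts for ${role} at ${company}`))); // one call per role
      }
    }
  }
  return contacts; // 55+ model calls for a single user query
}
```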

And it hurt us in three ways:

1. It was slow

55 calls × ~10 seconds each = ~9 minutes of waiting.

2. It hit rate limits constantly

We were pushing Azure OpenAI well beyond reasonable throughput.

3. It was expensive

Even $0.01 per call becomes noticeable when multiplied at scale.

It was clear we needed a smarter approach.

The Turning Point

As a team, we explored every option — caching, parallelization, cheaper models, trimming data, you name it. Each helped on one axis but forced a compromise on another.

One thing about working at TribalScale: we don’t settle for “good enough.” We look for solutions that improve performance without sacrificing quality or integrity.

And that led us to the strategy that made everything click: batching.

The Solution: Batch Processing (Our Game-Changer)

Instead of asking the AI the same question 55 different times, we reorganized our prompts so we could send groups of items together.

Think of it like this: Instead of calling someone five times with five questions, you hold your questions, call once, and get everything in one go.

Here’s how it transformed each stage:

1. Company Extraction

  • Old: 5 calls

  • New: 1 call

  • 80% fewer calls

2. Role Identification

  • Old: 15 calls

  • New: 1 call

  • ~93% fewer calls

3. Contact Extraction

  • Old: 45 calls

  • New: 1–3 calls (batches of ~20)

  • 93% fewer calls

This change alone brought us from 55 calls → 5 calls.
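As a rough sketch, here is what one batched stage looks like. The prompt wording and the `callModel` helper are illustrative, not our production code; the point is that every signal rides in a single request and comes back as one structured JSON object.

```typescript
// Hypothetical wrapper around a single Azure OpenAI chat completion.
async function callModel(prompt: string): Promise<string> {
  // ...one request to the model, returning the raw completion text...
  return "{}";
}

// Old: one call per buying signal. New: every signal goes into one prompt,
// and the model answers with a JSON object keyed by signal.
async function extractCompaniesBatched(
  signals: string[]
): Promise<Record<string, string[]>> {
  const prompt = [
    "For EACH buying signal below, list companies that match it.",
    'Reply with JSON only, shaped like: { "<signal>": ["Company A", "Company B"] }',
    ...signals.map((s, i) => `${i + 1}. ${s}`),
  ].join("\n");

  const raw = await callModel(prompt); // 1 call instead of signals.length calls
  return JSON.parse(raw) as Record<string, string[]>;
}
```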

But batching alone wasn’t enough — we needed to make it reliable.

Making It Reliable: Fallbacks, Limits, and Validation

Real-world systems don’t live in ideal conditions. So we added safety nets:

1. Batch Size Limits

We cap batches to prevent overwhelming the model or exceeding context windows.
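A minimal sketch of the idea (the cap of 20 items is illustrative, not our exact production value):

```typescript
// Split a long list into batches no larger than maxBatchSize, so a single
// prompt never blows past the model's context window.
function chunk<T>(items: T[], maxBatchSize = 20): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += maxBatchSize) {
    batches.push(items.slice(i, i + maxBatchSize));
  }
  return batches;
}

// e.g. 45 roles with a cap of 20 → 3 batches → 3 calls instead of 45
```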

2. Automatic Fallbacks

If a batch fails (timeout, content too large, any model hiccup), we fall back to a sequential strategy automatically.
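Conceptually it looks like this sketch, where `processBatch` and `processOne` stand in for the real stage functions:

```typescript
// Try the batched path first; if anything goes wrong (timeout, oversized
// content, malformed output), degrade gracefully to one item per call.
async function withFallback<T, R>(
  items: T[],
  processBatch: (batch: T[]) => Promise<R[]>,
  processOne: (item: T) => Promise<R>
): Promise<R[]> {
  try {
    return await processBatch(items);
  } catch (err) {
    console.warn("Batch failed, falling back to sequential processing:", err);
    const results: R[] = [];
    for (const item of items) {
      results.push(await processOne(item));
    }
    return results;
  }
}
```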

3. Smart Timeout Scaling

Bigger batches get more time; smaller batches stay snappy.
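For example, something along these lines (the base, per-item, and ceiling values here are placeholders rather than our tuned numbers):

```typescript
// Scale the request timeout with batch size: a floor for small batches,
// extra headroom per batched item, and a hard ceiling so nothing hangs forever.
function timeoutForBatch(batchSize: number): number {
  const baseMs = 15_000;    // minimum for any call
  const perItemMs = 2_000;  // extra allowance per item in the batch
  const maxMs = 120_000;    // absolute ceiling
  return Math.min(baseMs + batchSize * perItemMs, maxMs);
}
```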

4. Schema Validation (Our Secret Weapon)

We added three layers:

  • AI-enforced JSON schema

  • Runtime validation with Zod

  • Business logic validation for sanity checks

This ensured we always got structured, clean data — crucial when batching many items together.
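As an illustration of the runtime layer, a Zod schema along these lines (field names simplified) rejects anything that drifts from the agreed structure before it reaches the rest of the pipeline:

```typescript
import { z } from "zod";

// The shape we ask the model to return for a batch of contact lookups.
const ContactSchema = z.object({
  name: z.string().min(1),
  title: z.string().min(1),
  company: z.string().min(1),
  email: z.string().email().optional(),
});

const BatchResultSchema = z.object({
  contacts: z.array(ContactSchema),
});

// Layer 2 (runtime validation): throws, and triggers the fallback path,
// if the model's JSON drifts from the schema we asked for.
function parseBatchResult(raw: string) {
  return BatchResultSchema.parse(JSON.parse(raw));
}
```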

The Results: Night and Day

For a typical query (5 signals → 15 companies → 45 roles):

  • API Calls: 55 → 5 (91% fewer)

  • Time: 9 minutes → 52 seconds (90% faster)

  • Cost: $0.55 → $0.05 (91% cheaper)

  • Rate Limit Errors: Frequent → Rare (much more reliable)

Under a minute instead of nine means the difference between a frustrated user and a delighted one. For us, it was a clear sign we were on the right track.

Lessons We Took With Us

1. The N+1 Problem Shows Up Everywhere

Even in AI orchestration pipelines, the same principles apply. Identify it early, batch aggressively.

2. Modern AI Handles Batches Far Better Than Expected

With careful prompting and structure, models like GPT-4o handle complex batch requests gracefully.

3. Always Build for Graceful Degradation

Resilience matters as much as speed. If batches fail, the system should still function.

4. Optimize Without Compromising Quality

We didn’t cut corners — we improved efficiency while keeping accuracy intact.

5. This Is Why Working at TribalScale Is Fun

We get to solve real-world problems with modern technology, and every challenge pushes us to innovate creatively. This project was a great reminder of why building with emerging AI tech is exciting — there’s always room to make things better.

Closing Thoughts

Batch processing didn’t just speed things up — it transformed the user experience. Our platform now feels fast, efficient, and reliable, and we achieved all of that without lowering the quality of results.

If you’re building AI-powered systems, take a close look at your request patterns. There might be massive performance gains hiding in plain sight — just like we found.

© 2025 TRIBALSCALE INC

💪 Developed by TribalScale Design Team
