A simple defensive pattern to prevent token limit failures when working with large API responses.

Introduction

I’ve recently been involved in a proof of concept (POC) where one of my colleagues kept hitting an error like this:

Error occurred while executing the Generative AI agent runtime client…
"message": "Input tokens exceed the configured limit of 272000 tokens. Your messages resulted in 1245073 tokens. Please reduce the length of the messages.",
"code": "context_length_exceeded"

The cause was straightforward to identify. The Business Object call before the LLM node returned a very large payload, and we were passing the entire response into the model. That eventually pushed us over the configured token limit (rule of thumb: ~4 characters per token).
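
At that rate, the 1,245,073 tokens in the error above correspond to roughly 5 MB of JSON. A rough character-based check is enough to catch this before the request fails. Here is a minimal sketch, assuming the ~4 characters/token heuristic (the model's real tokenizer may count differently):

function estimateTokens(payload) {
  // ~4 characters per token is a heuristic, not the model's actual tokenizer
  return Math.ceil(JSON.stringify(payload).length / 4);
}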

In our POC, we returned all fields from the Account object, and the API page size was 500.

Start with upstream optimisation

In many cases, the right fix is upstream:
– Select only the fields you need
– Reduce the page size and paginate results when fetching from Fusion or external sources (see the sketch below)
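
For Fusion REST endpoints, both optimisations are ordinary query parameters. The sketch below is illustrative only: the host, resource path, version, and field names are assumptions you would adapt to your own environment and object model.

// Illustrative Fusion REST URL: request only the fields the flow needs,
// in small pages. Host, path, and field names here are assumptions.
const url = 'https://your-fusion-host'
  + '/crmRestApi/resources/11.13.18.05/accounts'
  + '?fields=PartyId,PartyName,PartyNumber' // only the fields you need
  + '&onlyData=true'                        // omit link metadata
  + '&limit=50&offset=0';                   // small pages instead of 500 rows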

Sometimes, however, you can't predict the payload size ahead of time; pagination may not be available, or the API may be poorly documented.

It’s also important to be explicit here: LLMs won’t optimise this for you. Token limits are hard constraints—not soft suggestions. If you pass oversized payloads, the request won’t degrade gracefully; it will simply fail.

What we did: adding a payload control step before the LLM

As a stopgap—until we refined the API call via Business Objects—we added a simple pre-processing step using a code node before passing the payload into the LLM.

The goal was to:
– Keep the JSON valid
– Reduce the payload size to stay within token limits

The code is intentionally simple. We capped the total payload at 20,000 characters and built the array incrementally until adding another item would exceed the limit:

// Items returned by the upstream Business Object call
const sourceArray = $context.$nodes.SEARCHACCOUNTS.$output.items;

let result = [];

// Cap the serialised payload at 20,000 characters
// (~5,000 tokens at the ~4 characters/token rule of thumb)
const maxChars = 20000;

for (let i = 0; i < sourceArray.length; i++) {
  // Tentatively add the next item and measure the serialised size
  const candidate = [...result, sourceArray[i]];
  if (JSON.stringify(candidate).length < maxChars) {
    result = candidate;
  } else {
    // Adding this item would breach the cap, so stop here
    break;
  }
}

return result;

Important consideration: partial results

This approach may return a truncated result set.

If truncation occurs, it should be:
– Captured in your workflow
– Explicitly flagged to the user

This avoids silent data loss and keeps the experience transparent; the sketch below shows one way to do it.
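
Building on the code node above (same SEARCHACCOUNTS output and 20,000-character cap), this variant returns a truncation flag alongside the items:

const sourceArray = $context.$nodes.SEARCHACCOUNTS.$output.items;
const maxChars = 20000;
let result = [];

for (let i = 0; i < sourceArray.length; i++) {
  const candidate = [...result, sourceArray[i]];
  if (JSON.stringify(candidate).length >= maxChars) {
    break;
  }
  result = candidate;
}

// "truncated" lets a downstream node warn the user about partial results
return { items: result, truncated: result.length < sourceArray.length };

Downstream nodes then read items instead of the raw array and can branch on truncated to surface a warning to the user.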

Recommendation

If you have straightforward optimisation paths available (pagination, selecting only required fields, etc.), you should absolutely start there.

However, when working with APIs where payload size can vary or where upstream optimisation options are not always available, adding a lightweight payload control step like this can help stabilise your flow.

This is not a substitute for proper API design—it’s a defensive measure to keep things working while you iterate.

Wrap-up

LLMs are powerful tools—but they rely on well-structured, constrained inputs.

A small amount of defensive handling like this can make LLM-driven workflows far more reliable, helping you avoid hard failures while maintaining predictable behaviour.