When I speak to people about their approach to AI deployment at their businesses there are a few hesitations that everyone has. Here are the common ones:

"Doesn't it always hallucinate"
"How can I be sure that it's accurate"
"I don't know how to deploy it"
"Won't it take our jobs"

The future of deployment of AI is not via a chat interface as we know it. But built-out automation and systems that utilise proprietary data to do the jobs AI is much better than us at. Research, multi-source data ingestion and data analysis and processing. But if you're giving it the keys to decision making, how can you be sure it's the right one. The answer is Asynchronous AI.

But before we get to that, let's understand the issue.

01AI hallucinations

AI hallucinations - when large language models confidently provide false or fabricated information - have been a long-standing and well-documented issue. They range from subtle factual inaccuracies to completely invented events, citations, or reasoning. While they're often written with remarkable fluency and authority, their unreliability creates real limitations when AI is expected to perform tasks requiring consistency, reliability, and trust.

The issue becomes even more complicated when you're no longer operating within the friendly confines of a chat interface. While humans can manually detect and course-correct a hallucination mid-conversation in the UI, the story is different when you're using the AI via an API, where inputs and outputs are meant to be automated, invisible, and assumed to work.

02A test of consistency, output, and accuracy

I ran a simple but revealing test that evaluated outputs across three metrics: consistency, output, and accuracy. We prompted GPT 3.5 Turbo 100 times via the API with the same input and observed how stable the outputs were.

Each test escalated in complexity - the sort of question you'd ask a junior in your team to complete. In order to grade the consistency we looked at structural consistency, whether results were output in the requested format, and whether the facts were accurate.

AI Accuracy Test - Looking at SEO Complexity

On a one-off basis through the UI, this variation is manageable. You notice it. You click regenerate. You fix it. No problem.

But via the API, where automation depends on deterministic and predictable behaviour, this is a breaking issue. In production environments where human oversight isn't practical for every request, hallucination and inconsistency aren't just annoying - they're dangerous.

03The importance of thresholds

This leads us to the concept of thresholds - the invisible standards that dictate whether an AI response is "good enough" to be used. Think of thresholds as the AI's quality gate: how well does the output need to align with factuality, task specificity, or user tone before it's deemed acceptable?

Let's consider a playful but telling example. If you ask an AI: "Tell me a story about a mop." You might get three different levels of threshold:

Low Threshold

"Once there was a mop. It cleaned floors. The end." Technically accurate. Completely uninteresting. Functionally useless.

Mid Threshold

"The mop had dreams of being a dancer, twirling across the linoleum like Fred Astaire. But it was stuck in a janitor's closet... until one night..." Creative, engaging. A solid answer.

High Threshold

"In 1973, amidst the oil crisis, a factory in Detroit built a mop with an experimental polymer head that would later be considered revolutionary..." Original, researched, deeply structured.

Setting and maintaining the right threshold is critical. And to do that reliably, you need more than just one-shot AI output. You need an architecture that can evaluate, refine, and structure, autonomously.

04Enter asynchronous AI

Now imagine AI not as a monolithic black box that returns a string of text, but as a distributed asynchronous system, like a team of people with specialised roles:

The Briefer

Interprets the prompt and defines the goals.

The Executor

Actually does the work - writing, coding, summarising.

The Reviewer

Checks the output for quality, accuracy, and tone.

The Outputter

Packages the final result in the desired format.

This is asynchronous AI - where each "role" can be played by separate instances or phases of the model, running sequentially or in parallel, evaluating and improving each other's output.

It mimics the way high-performance teams work: distributing complexity, enabling specialisation, and introducing checks and balances. But balanced with differing levels of thresholds and standards to ensure desired deliverables are met.

And just like in a human team, this system doesn't assume perfection in the first draft, but rather, builds in refinement as a feature, not a patch.

05Why this matters: context and limitations

In a conversational UI, a lot of this happens invisibly. Context is preserved in your chat history. The model remembers your earlier preferences. It self-corrects, adds nuance, and even "learns" over the session (within limits). But that context - the glue holding everything together - doesn't exist in the same way via API.

When using the API, context windows become a hard constraint. Everything the model needs to understand has to be included in the payload: your prior prompts, any preferences, the response history, all of it. If you don't manage this carefully, the model responds like it has no memory - because it doesn't.

This is where asynchronous, multi-agent, team-like AI becomes not just helpful, but necessary. It allows you to simulate long-term memory, enforce standards, manage context, and execute multi-step reasoning - all without assuming the model will just "get it" from a single shot.

06Final thoughts: the path forward

The future of AI isn't about pushing harder on single-shot prompt engineering. It's about orchestrating AI like a team, thinking in systems, and designing workflows where multiple agents collaborate to meet quality thresholds.

Hallucinations, inconsistency, and brittle context limitations aren't just bugs - they're signs that we're still thinking too linearly. The solution isn't just better models. It's better architecture.

Asynchronous AI is that architecture. And it's how we'll go from clever answers to trustworthy systems.