How to Build an AI-First Company

Every agency I speak to is rolling out AI. Almost none of them are doing it well. The pattern is depressingly consistent: someone reads a LinkedIn post about agents, the leadership team panics, ChatGPT licences get bought for everyone, and three months later nobody can point to a single workflow that's measurably better.

The problem isn't the tools. The tools are remarkable. The problem is that AI transformation is a change management exercise dressed up as a software purchase, and most agencies are skipping the change management bit entirely.

I've now run this transformation process with agencies and in-house creative teams across the UK and Australia. The steps below are what actually works, in the order they actually work in. Skip any of them and you'll end up with the same result: enthusiastic team, scattered usage, no compounding value.

Here's the playbook.

01 Start with policy, not tools

I covered this in detail in a previous piece, but it's worth saying again because it's the step everyone wants to skip: before you roll out any AI tooling, write the policy.

Approved tools list. Traffic-light data classification. Three-tier review process. Governance structure. Two pages is enough to start. Without it, your team is making it up as they go, your client data is leaking into platforms nobody's audited, and your legal team is going to have a quiet panic in about six months when something surfaces.
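To make the first two items concrete, here is a minimal sketch of an approved tools list combined with a traffic-light classification. The tool names, the meaning given to each colour, and the `is_allowed` helper are all illustrative assumptions, not a recommended policy; your own two pages should define them.

```python
from enum import IntEnum


class DataClass(IntEnum):
    # Assumed meanings for illustration; your policy defines the real ones.
    GREEN = 1  # public or already-published material
    AMBER = 2  # internal material
    RED = 3    # client-confidential or personal data


# Hypothetical approved-tools list: the most sensitive class each tool may receive.
APPROVED_TOOLS = {
    "enterprise-assistant": DataClass.AMBER,
    "consumer-chatbot": DataClass.GREEN,
}


def is_allowed(tool: str, data: DataClass) -> bool:
    """Return True if the tool is approved for data of this classification."""
    ceiling = APPROVED_TOOLS.get(tool)
    if ceiling is None:
        return False  # unlisted tools are never approved
    return data <= ceiling
```

The useful property is the default: a tool that isn't on the list gets a "no" without anyone having to think about it.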

"AI policy isn't bureaucracy. It's the floor that lets everything else stand up."

The policy is what gives your team permission to experiment with confidence. It tells them what's safe, what's not, and who to ask when they're unsure. Without it, the most cautious people on your team simply won't use AI at all, and the least cautious will use it in ways that create real problems.

Get this written first. Then move on.

02 Train the team to see opportunities, not just use tools

The single biggest mistake I see in AI rollouts is confusing tool training with capability training. Showing your team how to type into ChatGPT is not training. It's a tutorial. Capability training is teaching them how to see AI opportunities in their own work.

There's a meaningful difference. A tutorial teaches someone to use a hammer. Capability training teaches them to spot which problems are nails. In a marketing team, that means:

Understanding how LLMs actually work

Not the technical details. The mental model. What they're good at, where they fail, why hallucinations happen, why context matters, why the same prompt produces different output on different days. Without this, your team will either over-trust the output or dismiss the tool entirely the first time it gets something wrong.

Understanding the platform landscape

Claude vs GPT vs Gemini. Where each one's strong. When to reach for which. What the enterprise versions give you that the consumer versions don't. This sounds basic, but most marketing teams treat all AI tools as interchangeable, and they aren't.

A prompting framework that actually sticks

I use RICCE (Role, Instruction, Context, Constraints, Examples) because it's simple enough to remember in a meeting and rigorous enough to produce reliable output. Whatever framework you choose, pick one and use it consistently. Ad-hoc prompting produces ad-hoc results.
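For teams that keep prompts in a shared library, the five RICCE parts can be captured as a small template. This is a sketch only: the `RiccePrompt` class, the section labels, and the rendering order are assumptions made for illustration, and the framework works just as well written out by hand.

```python
from dataclasses import dataclass, field


@dataclass
class RiccePrompt:
    """One prompt split into the five RICCE parts."""
    role: str
    instruction: str
    context: str
    constraints: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the non-empty parts into one prompt, in RICCE order."""
        sections = [
            ("Role", self.role),
            ("Instruction", self.instruction),
            ("Context", self.context),
            ("Constraints", "\n".join(f"- {c}" for c in self.constraints)),
            ("Examples", "\n\n".join(self.examples)),
        ]
        return "\n\n".join(f"{name}:\n{body}" for name, body in sections if body)


prompt = RiccePrompt(
    role="You are a senior copywriter at a marketing agency.",
    instruction="Write three subject lines for the email described below.",
    context="Spring sale email for a shoe retailer, aimed at existing customers.",
    constraints=["Under 50 characters", "No emojis"],
)
print(prompt.render())
```

Empty parts are skipped, so a prompt with no examples still renders cleanly.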

Six hours of structured training, delivered properly, gets a marketing team further than six months of self-directed dabbling. The team needs to leave the room able to look at any task in their workflow and ask: could AI do part of this, all of this, or none of this? That's the capability you're building.

03 Drive structured experimentation across the team

Once your team can see opportunities, you need to give them permission and structure to test them. This is where most rollouts collapse. The team comes back from training inspired, tries one or two things in the gaps between client deadlines, gets distracted, and the momentum dies.

"Inspired teams without structure produce nothing. Structured teams without inspiration produce mediocrity. You need both."

Structured experimentation means three things:

Time blocked specifically for it. Not "in your spare time." Spare time doesn't exist in a busy agency. Block two hours per week per person, and protect that time the way you'd protect a client deadline.
A shared place to capture experiments. A simple Notion page, a shared doc, a Slack channel. Anywhere your team can log what they tried, what worked, what didn't, and the prompt that did it. Otherwise every experiment lives in one person's head and dies when they go on holiday.
A bias toward starting, not perfecting. The first version of any AI workflow will be 60% as good as the manual version. That's fine. The point of the experiment is to find the workflows worth investing in, not to produce client-ready output on the first try.
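Whatever tool holds the shared log, it helps if every entry has the same shape. As a rough sketch, assuming one JSON line per experiment (the `ExperimentEntry` field names are hypothetical and simply mirror "what they tried, what worked, what didn't, and the prompt that did it"):

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExperimentEntry:
    owner: str
    task: str          # what was tried
    prompt: str        # the prompt that did it
    what_worked: str
    what_failed: str
    logged_on: str     # ISO date, e.g. "2024-01-15"


def to_log_line(entry: ExperimentEntry) -> str:
    """Serialise one experiment as a single JSON line for a shared log."""
    return json.dumps(asdict(entry), ensure_ascii=False)
```

A Notion table or a spreadsheet with the same six columns does the same job; the point is a consistent record that someone else can search.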

I tell teams to aim for ten failed experiments before they look for a successful one. It reframes failure as the goal of the exercise, which is what gets people experimenting at the volume needed to find the genuinely valuable use cases.

04 Build a testing framework that actually grades quality

Experiments are noise without a framework to evaluate them. "I tried it and it was pretty good" is not a useful data point. You need a way to grade AI output that's consistent across the team and tied to real business value.

The framework I use has three components:

Quality grade

How does the AI output compare to a human doing the same task? Worse, equivalent, or better. Be honest. Most AI output starts at "worse but faster," and that's fine for some tasks (research synthesis, first drafts, data extraction) and unacceptable for others (final client copy, strategic recommendations, sensitive comms).

Time saved

How long does the manual version take? How long does the AI-assisted version take, including the time spent reviewing, correcting, and re-prompting? The honest answer is often less impressive than the headline, especially in the early weeks. Track it anyway.

Confidence score

How confident are you that the AI version is reliable enough to use without heavy review? This is the most important metric and the one teams skip. A workflow that saves an hour but requires an hour of review hasn't saved anything. A workflow that saves twenty minutes and produces output you trust is genuinely valuable.

Score every experiment on these three. After a month you'll have a clear picture of which AI applications are worth building into permanent workflows and which are interesting but not yet ready. That picture is worth more than any vendor pitch.
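The three scores fit on a single record per experiment. The sketch below is one possible way to hold them, not a fixed method: the 1 to 5 confidence scale, the `worth_building` thresholds, and every field name are assumptions. The one piece of arithmetic taken directly from the framework is that review time counts against time saved.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    WORSE = "worse"
    EQUIVALENT = "equivalent"
    BETTER = "better"


@dataclass
class ExperimentScore:
    name: str
    quality: Quality
    manual_minutes: float
    ai_minutes: float      # prompting and generation
    review_minutes: float  # reviewing, correcting, re-prompting
    confidence: int        # assumed scale: 1 (needs heavy review) to 5 (trusted)

    @property
    def net_minutes_saved(self) -> float:
        """Manual time minus everything the AI-assisted version costs."""
        return self.manual_minutes - (self.ai_minutes + self.review_minutes)


def worth_building(score: ExperimentScore, min_saved: float = 15,
                   min_confidence: int = 4, accept_worse: bool = False) -> bool:
    """Hypothetical cut-off for promoting an experiment to a permanent workflow."""
    if score.quality is Quality.WORSE and not accept_worse:
        return False
    return score.net_minutes_saved >= min_saved and score.confidence >= min_confidence
```

With these numbers, a report that takes 90 minutes by hand, 30 with AI, and 60 to review nets zero minutes saved, which is the "saves an hour but requires an hour of review" case. Set `accept_worse=True` for tasks like first drafts where "worse but faster" is acceptable.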

05 Map the workflows that actually matter

You can't transform what you can't see. Most agencies have a vague sense of how work flows through the team, but no documented map of who does what, in what order, with what inputs, producing what outputs. That gap is fatal for AI transformation, because AI doesn't replace tasks, it changes workflows.

Workflow mapping is unglamorous and slow. It also separates the agencies that get AI right from the ones that don't. The exercise is straightforward:

Pick five high-volume processes. Pitch responses, content production, reporting, briefing, QA. Whatever your team does most often.
Map each one end-to-end. Every step, every handoff, every decision point. Who does it. How long it takes. What inputs it needs. What it produces.
Identify the AI-eligible steps. Where is AI obviously useful (research, summarisation, first drafts, data extraction)? Where is it obviously not (client relationships, strategic judgement, final approval)? Where is it ambiguous (analysis, recommendations, creative direction)?
Score each step using the framework from the previous section.

What comes out of this is a heat map of your operations. Some workflows will have clear AI-eligible steps that you can automate this quarter. Others will turn out to be mostly human work with one or two AI assists. A few will turn out to be candidates for full reinvention. All three are valuable findings.
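A spreadsheet is enough for this, but the logic behind the heat map is simple enough to sketch. In the version below, each workflow is ranked by the share of its time that sits in obviously AI-eligible steps. That ranking rule, the `Step` fields, and the example numbers are assumptions for illustration; a real map would also carry the scores from the previous section.

```python
from dataclasses import dataclass
from enum import Enum


class Eligibility(Enum):
    YES = "obviously useful"
    NO = "obviously not"
    MAYBE = "ambiguous"


@dataclass
class Step:
    name: str
    owner: str
    minutes: float
    eligibility: Eligibility


def eligible_share(steps: list[Step]) -> float:
    """Fraction of a workflow's time spent in obviously AI-eligible steps."""
    total = sum(s.minutes for s in steps)
    if total == 0:
        return 0.0
    eligible = sum(s.minutes for s in steps if s.eligibility is Eligibility.YES)
    return eligible / total


def heat_map(workflows: dict[str, list[Step]]) -> list[tuple[str, float]]:
    """Workflows sorted from most to least AI-eligible time."""
    shares = [(name, eligible_share(steps)) for name, steps in workflows.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical example data.
workflows = {
    "pitch response": [
        Step("background research", "strategist", 30, Eligibility.YES),
        Step("strategic approach", "strategy lead", 90, Eligibility.NO),
    ],
    "monthly reporting": [
        Step("pull and tidy data", "analyst", 60, Eligibility.YES),
        Step("write commentary", "account manager", 30, Eligibility.MAYBE),
        Step("client sign-off", "account director", 30, Eligibility.NO),
    ],
}
for name, share in heat_map(workflows):
    print(f"{name}: {share:.0%} of time in AI-eligible steps")
```

Ambiguous steps are deliberately left out of the eligible share here, so the ranking stays conservative until those steps have been tested.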

The agencies that do this properly build compounding advantages. The ones that skip it end up with a Slack channel full of clever prompts and no way to operationalise any of it.

06 Build your first automations: built-in features first, custom on top

Now you're ready to build. The temptation here is to leap straight to custom development. Resist it. The right sequence is built-in features first, then custom layers on top of those, and only then bespoke development.

Built-in features

Every major AI platform now ships with capabilities that, in my experience, cover around 70% of marketing automation needs out of the box. Claude has Projects, Skills, and Tools. ChatGPT has GPTs and its newer agentic features. Both integrate with the file systems, email clients, and calendars your team already lives in. Get fluent in these before you build anything custom. Most of what teams want to build, they don't actually need to build.

Custom layers on top

Where built-in features don't quite fit your workflow, build a thin custom layer that wraps them. A Claude Project loaded with your brand voice and approved sources. A custom Skill that runs your specific QA process. A simple internal tool that pipes your CRM data into a structured prompt. These are days of work, not months, and they reinforce the workflows you've already mapped rather than inventing new ones.
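The last of those examples, piping CRM data into a structured prompt, can be as small as the sketch below. It assumes the CRM record has already been exported as a dictionary; the field names (`company`, `contact_name`, `last_campaign`, `open_issues`) and the prompt wording are hypothetical, and sending the prompt to a model is left out entirely.

```python
from string import Template

# Illustrative prompt; the wording and placeholders are assumptions.
PROMPT_TEMPLATE = Template(
    "You are an account manager at a marketing agency.\n"
    "Draft a check-in email for the client below.\n\n"
    "Client: $company\n"
    "Contact: $contact_name\n"
    "Last campaign: $last_campaign\n"
    "Open issues: $open_issues\n\n"
    "Keep it under 150 words and do not promise delivery dates."
)


def build_prompt(record: dict) -> str:
    """Turn one exported CRM record into a structured prompt."""
    fields = {
        "company": record["company"],            # required
        "contact_name": record["contact_name"],  # required
        "last_campaign": record.get("last_campaign", "none on record"),
        "open_issues": "; ".join(record.get("open_issues", [])) or "none",
    }
    return PROMPT_TEMPLATE.substitute(fields)
```

Missing required fields raise a `KeyError` rather than producing a half-filled prompt, which is usually the safer failure for anything client-facing.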

Bespoke development last

True custom builds (agents, integrations, internal products) come last, and only for the workflows where the testing framework has proven both high time savings and high confidence. By the time you get here, you know exactly what you're building and why, and the build itself becomes the easy part.

"The agencies winning at AI aren't the ones with the most custom builds. They're the ones who built fewer things, on stronger foundations, that their team actually uses."

Putting it all together

The steps are sequential, but the cycle never stops. With the policy in place, you train, you experiment, you test, you map, you build. Then you train again on the new tools and capabilities, you experiment with what's just been released, you test, you map, you build. AI transformation isn't a project. It's an operating rhythm.

The agencies that establish that rhythm now will compound a meaningful advantage over the ones still buying licences and hoping for the best. The technology gap between the two camps is going to widen, not narrow, every quarter from here.

Start with the policy. Then build the rest.
