Automation in Insurance: Why AI Pilots Stall

Most brokerage AI pilots are not failing because the models are bad. They are failing because the work underneath the model is not structured for them.

BCG’s 2025 study put a number on it: only 7% of insurance organizations have scaled AI efforts. Deloitte, KPMG, and PwC reach the same conclusion from different angles, citing poor data foundations, legacy infrastructure, weak business and tech collaboration, and data silos. The pattern is consistent. Automation in insurance keeps hitting the same wall, and the wall is not the technology.

The wall is the operating model. Service work in brokerages is not measured or structured at the task level, which is why aggregate metrics hide variance, why pilots stall after the demo, and why automation projects keep producing thin productivity wins that do not compound. The fix is older than AI. It is the same operating discipline manufacturing adopted in the 1980s, banking adopted in the 1990s, and healthcare administration adopted in the 2000s. The new part is that the prerequisites for Six Sigma and the prerequisites for safe AI deployment are now the same prerequisites.

This post is the short version of the argument. The full working paper is linked at the end.

What Aggregate Metrics Hide

Ask any brokerage COO what KPIs they monitor and the answers come back consistent across platforms from $50M to $5B in gross written premium (GWP): GWP, total revenue, commission rate, service team headcount, revenue per employee, retention rate, book churn, new business written, and errors and omissions (E&O) claims frequency.

These metrics are necessary. They are not sufficient.

What they share is that they all describe the business at the aggregate level. They tell you whether the brokerage is growing, profitable, and retaining customers. They do not tell you whether the work inside the brokerage is being done efficiently, whether the right people are doing the right work, or where capacity is leaking. Six Sigma’s foundational insight is that variance is where the money is. Two service teams can deliver the same aggregate output, the same retention rate, and the same revenue per employee while having radically different unit economics underneath. Both look identical on the dashboard until you measure at the task level.
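To make the variance point concrete, here is a minimal sketch in Python. The two teams and their cycle times are invented for illustration and are not data from any brokerage. The only claim is that identical averages can hide very different spreads.

```python
from statistics import mean, pstdev

# Hypothetical cycle times in hours for the same task type on two service teams.
# These numbers are invented for illustration only.
team_a_hours = [4, 4, 4, 4, 4, 4]
team_b_hours = [1, 1, 1, 1, 1, 19]


def summarize(cycle_times):
    """Return the average, the spread, and the slowest instance for one team."""
    return {
        "mean": mean(cycle_times),
        "stdev": pstdev(cycle_times),
        "worst": max(cycle_times),
    }


summary_a = summarize(team_a_hours)
summary_b = summarize(team_b_hours)

# The aggregate view (the mean) is identical; the task-level view is not.
print(summary_a)
print(summary_b)
```

Both teams average the same cycle time, so a dashboard built on averages cannot tell them apart. Only the per-task spread shows that one team has an outlier worth investigating.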

This matters for automation in insurance because every automation decision is a unit-of-work decision. You cannot automate a vague queue. You cannot route work to AI if you do not know which tasks are happening, who owns them, what their cycle time is, or which ones actually require a license. The dashboard tells you the business is fine. The task layer tells you where the next dollar of margin lives.

Why Insurance AI Pilots Keep Stalling

The published reasons for AI failure in insurance look like five different problems. They are one problem.

Deloitte’s GenAI survey points to poor data foundations, legacy infrastructure, and weak business and tech collaboration. KPMG’s 2025 insurance AI research cites data silos, inconsistent formats, employee resistance, and cross-department alignment. PwC says GenAI’s biggest challenge in insurance is data infrastructure and availability, with data scattered, inaccessible, and not high-quality enough for AI to operate on. BCG’s scaling number ties it all together. Only 7%.

Look at those reasons together. Every one of them is a description of the same underlying state: service work in brokerages is not structured as discrete, measurable, routable tasks. The data is not in silos because the silos are bad. The data is in silos because the operating model never required the work to be lifted out of the silos into a shared task structure in the first place. Agency management systems (AMSs) were built as systems of record, not systems of work. Email, phone, portals, and documents are where the work actually happens, and none of those surfaces produce the task-level telemetry that either Six Sigma or AI needs to function.

The result is a pilot that looks impressive in a demo and quietly underperforms in production. The AI gets handed work that is partially defined, with unclear inputs, ambiguous owners, no measurable outcome, and no feedback loop. It does what it can with what it has, and what it has is not enough. The right response is not to swap models. It is to fix the operating layer underneath.

What Six Sigma Actually Demands

Six Sigma is not a software category. It is an operating discipline. Applied to insurance service operations, it makes five demands that look familiar to anyone who has tried to deploy AI in a regulated environment.

Task definition. Every unit of work has a clear input, output, owner, and done state. A certificate of insurance (COI) request is not one task. It is a sequence of tasks: intake, coverage verification, endorsement check, certificate generation, licensed review when required, delivery, and archive. Most brokerages route the entire sequence to a licensed account manager who does intake and certificate production just to reach the small subset of requests that actually need licensed review. That is not a Six Sigma operation. It is also not an operation AI can help with safely.

Task telemetry. Every task instance carries owner, timestamps, cycle time, and outcome. Without this layer, there is no before-and-after baseline, which means there is no measurable ROI on any improvement, automation included.

Root cause tagging. Defects are classified by staffing, carrier, system, customer, or complexity. “The AI got it wrong” is not a root cause. Neither is “the CSR was slow.” Improvement loops require structured cause attribution.

Authority boundaries. License, skill, and approval rules attach to each task type. This is the safety control for regulated work and the precondition for letting an AI agent execute anything without supervision.

Continuous calibration. SLA, routing, playbook, and QA rules improve with observed performance. The operating system gets better as the data accumulates.
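The first four demands above can be sketched as a small data model. This is a minimal illustration in Python, not a description of any vendor's implementation. The names `TaskType`, `Worker`, `TaskInstance`, and `can_execute` are hypothetical, and so is the choice to mark only licensed review as license-gated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class TaskType:
    """Task definition plus the authority rule attached to the task type."""
    name: str
    requires_license: bool  # hypothetical flag standing in for license rules


@dataclass(frozen=True)
class Worker:
    name: str
    licensed: bool
    is_ai: bool = False


@dataclass
class TaskInstance:
    """Task telemetry: owner, timestamps, outcome, and an optional root cause tag."""
    task_type: TaskType
    owner: Worker
    started_at: datetime
    finished_at: Optional[datetime] = None
    outcome: Optional[str] = None
    root_cause: Optional[str] = None  # e.g. "staffing", "carrier", "system"

    @property
    def cycle_time(self) -> Optional[timedelta]:
        """Elapsed time, or None while the task is still open."""
        if self.finished_at is None:
            return None
        return self.finished_at - self.started_at


def can_execute(worker: Worker, task_type: TaskType) -> bool:
    """Assumed rule: license-gated tasks go only to licensed humans."""
    if task_type.requires_license:
        return worker.licensed and not worker.is_ai
    return True


# The COI sequence from the text. Marking only licensed review as
# license-gated is an assumption made for this example.
COI_SEQUENCE = [
    TaskType("intake", requires_license=False),
    TaskType("coverage_verification", requires_license=False),
    TaskType("endorsement_check", requires_license=False),
    TaskType("certificate_generation", requires_license=False),
    TaskType("licensed_review", requires_license=True),
    TaskType("delivery", requires_license=False),
    TaskType("archive", requires_license=False),
]

ai_agent = Worker("ai-agent", licensed=False, is_ai=True)
account_manager = Worker("account-manager", licensed=True)

# Which steps could be routed to the AI agent under the assumed rule?
ai_allowed = [t.name for t in COI_SEQUENCE if can_execute(ai_agent, t)]
print(ai_allowed)

# One instrumented task instance with a measurable cycle time.
intake = TaskInstance(
    task_type=COI_SEQUENCE[0],
    owner=ai_agent,
    started_at=datetime(2025, 1, 6, 9, 0),
    finished_at=datetime(2025, 1, 6, 9, 12),
    outcome="completed",
)
print(intake.cycle_time)
```

Under these assumptions, six of the seven COI steps can be routed away from the licensed account manager, and every routed step leaves behind a timestamped record. Real license and approval rules would be more granular than a single boolean.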

These five demands are not new. Manufacturing standardized them by the late 1980s. Banking did the same with back-office operations in the 1990s. Healthcare administration followed in the 2000s. Insurance service operations is one of the last large white-collar process categories still largely outside this discipline.

The Same Prerequisites Make AI Deployable

Here is the part that surprises operators when they first see it on paper. The list above is also the list of conditions AI needs to work in regulated service operations.

AI cannot operate on a vague queue. It needs task definition. AI cannot prove its own ROI. It needs task telemetry. AI cannot self-improve in production without structured cause attribution. It needs root cause tagging. AI cannot be trusted to execute regulated work without license-aware rules. It needs authority boundaries. AI cannot calibrate against real performance without a feedback loop. It needs continuous calibration.
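As a rough illustration of what "proving ROI" and "structured cause attribution" require, the sketch below computes a before-and-after baseline from task records. The records, the field layout, and the `baseline` helper are all hypothetical, and the numbers are invented for the example.

```python
from collections import Counter
from statistics import median

# Hypothetical task records: (period, cycle_time_minutes, root_cause).
# root_cause is None when the task finished without a defect.
records = [
    ("before", 30, None),
    ("before", 45, "carrier"),
    ("before", 60, "system"),
    ("before", 40, "carrier"),
    ("after", 20, None),
    ("after", 25, None),
    ("after", 35, "carrier"),
    ("after", 15, None),
]


def baseline(records, period):
    """Summarize cycle time, defect rate, and defect causes for one period."""
    rows = [r for r in records if r[0] == period]
    times = [minutes for _, minutes, _ in rows]
    defects = [cause for _, _, cause in rows if cause is not None]
    return {
        "median_minutes": median(times),
        "defect_rate": len(defects) / len(rows),
        "causes": Counter(defects),
    }


before = baseline(records, "before")
after = baseline(records, "after")

print(before)
print(after)
```

Without the per-task timestamps and cause tags, neither summary could be computed, and any claimed improvement from automation would have nothing to be measured against.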

This is why we keep saying that automation in insurance is downstream of the operating model. The same task-level discipline that recovers hidden margin under Six Sigma is the prerequisite for safely deploying AI on top of regulated work. Brokers who instrument the operating layer first do not face a choice between Six Sigma and AI. They get both, from the same investment, because the underlying data structure was the bottleneck for both.

Why Now

Three things have changed in the last few years, and they have changed in the same direction.

The data infrastructure exists. Modern operating layers can sit above the AMS and capture work as structured tasks across email, phone, portals, and documents. The technical objection that used to end this conversation, “we cannot get the data out of the AMS,” does not hold anymore. There are 8 production-grade AMS integrations behind a working operating stack today, ours included.

The labor pressure closed the alternative. Liberty Mutual and Safeco’s 2025 workforce research across 1,242 agency employees found 51% of agency employees reporting burnout, 57% reporting mental and physical exhaustion, and 65% of employees in non-principal roles often feeling stressed. The “just hire more CSRs” answer was already expensive. It has now also become unreliable. Service capacity is no longer a hiring problem you can solve by spending more on payroll.

The AI question is unavoidable. Every COO is being asked by their CEO, board, or PE sponsor how AI will be deployed. The honest answer is that AI cannot be deployed safely or productively on unstructured service work, which means the real prerequisite is not picking a model. It is instrumenting the work. The brokerages that do that first will be the ones with a credible AI story 18 months from now. The ones that do not will be running pilots that never scale.

What This Looks Like in Practice

The proof point we keep coming back to is one acquired agency that went from a 17.9% to a 60%+ EBITDA margin in 12 months on the same book of business. Same customers. Same carriers. Same lines. The operating model changed. Service work was lifted into a task structure with license-aware routing, telemetry, and continuous calibration. Capacity was reallocated. Licensed staff stopped doing unlicensed work. The economics changed because the underlying operating model changed, not because the business found new revenue.

That is the shape of the opportunity. It is not an AI story on its own. It is an operating-discipline story that makes AI deployable as a second-order benefit.

If you would rather talk it through with us, start a conversation with COVU.

Insurance Operations Need an Operating Model, Not Another AI Pilot (A Six Sigma Approach)

Highlights