Why “Human in the Loop” Is Not Enough in GxP AI

Author

Omer Cimen

CEO & Co-Founder

“Human in the loop” has become one of the most repeated phrases in AI governance. It sounds reassuring. It suggests that if a person reviews the output, the risk is under control and the compliance box is ticked.

In GxP environments, that idea is too thin.

A human reviewer is important, sometimes essential. But in regulated work, human involvement by itself is not a control strategy. A person clicking approve at the end of an AI-assisted workflow does not automatically make the workflow defensible, validated, or inspection-ready. Regulators are moving toward a more demanding view, one that emphasizes context of use, risk, governance, documentation, performance assessment, and lifecycle management, not just the existence of a reviewer. (U.S. Food and Drug Administration)

That shift matters because AI use across regulated life sciences is clearly rising. FDA says it has seen a significant increase in drug application submissions using AI components across nonclinical, clinical, postmarketing, and manufacturing phases. At the same time, FDA and EMA are both developing more explicit governance expectations, including FDA’s 2025 draft guidance for AI in regulatory decision-making, the January 2026 FDA-EMA-HMA guiding principles for AI in drug development, and EMA work plans that point to updates to Annex 11 and work on a new Annex 22 for artificial intelligence. (U.S. Food and Drug Administration)

Why the Phrase Sounds Better Than It Controls

The problem with “human in the loop” is that it often describes a role in a workflow without describing the quality of the control. It tells you that a human touched the process. It does not tell you whether the human was qualified, whether the task was reviewable, whether the output was traceable, whether the risk was proportionate, or whether the workflow was designed to catch meaningful failure modes.

That is exactly why regulators are not stopping at human oversight as a standalone concept. FDA’s 2025 draft guidance on AI for regulatory decision-making says that when the context of use involves a human in the loop, the evaluation methods should consider the performance of the human-AI team rather than just the model in isolation. That is a much higher bar than “someone reviewed it.” It means the workflow itself must be shown to work. (U.S. Food and Drug Administration)

This is the point many organizations miss. Oversight is not just a person. Oversight is a designed system of accountability.

What Regulators Are Actually Signaling

The current regulatory pattern is remarkably consistent.

FDA’s January 2025 draft guidance is built around a risk-based credibility assessment framework tied to a specific context of use. It does not treat AI outputs as inherently trustworthy because a human looked at them. It asks organizations to define what the model is being used for, assess model risk, and determine what credibility activities are needed to show the output is fit for that use. (U.S. Food and Drug Administration)

The January 2026 guiding principles from FDA, EMA, and HMA reinforce that direction. Their ten principles include human-centric design, a risk-based approach, adherence to standards, clear context of use, multidisciplinary expertise, data governance and documentation, model design and development practices, risk-based performance assessment, lifecycle management, and clear essential information. That list is telling. Human-centric design is one principle among many. It is not the whole operating model. (U.S. Food and Drug Administration)

EMA’s own current planning points the same way. The EMA Inspectors Working Group work plan published in 2026 references updated guidance for Annex 11 and Annex 22, and the EMA Quality Innovation Group work plan specifically points to work on both GMP Annex 11 on Computerised Systems and GMP Annex 22 Artificial Intelligence. The direction of travel is obvious: AI governance in GMP settings is becoming more explicit, more structured, and less tolerant of hand-wavy control claims. (European Medicines Agency (EMA))

The Warning Letter That Made the Point Plainly

The Purolea Cosmetics Lab warning letter made this issue unusually concrete. FDA did not say AI could never be used. It said that if AI is used as an aid in document creation, the firm must review the AI-generated documents to ensure they are accurate and actually compliant with CGMP. It also stated that any AI output or recommendation used in CGMP activities must be reviewed and cleared by an authorized human representative of the quality unit. But the broader letter shows why a late-stage review alone is not enough: FDA also cited failures involving microbiological testing, supplier qualification, incoming component testing, and process validation. The message was not “add a reviewer.” The message was “you still need a functioning quality system.” (U.S. Food and Drug Administration)

That distinction is crucial. If the surrounding workflow is weak, human review can turn into ceremonial signoff. It becomes a decorative shield placed in front of a broken process.

Why Human Review Often Fails as a Sole Control

There are several reasons a human-in-the-loop step can fail when it is used as the primary safeguard.

First, the reviewer may not have the right expertise for the specific task. A generic approval step is weak protection when the underlying content involves process validation logic, data integrity controls, or complex risk assumptions.

Second, the reviewer may not have the right information. If prompts, source context, limitations, linked requirements, and supporting evidence are not visible, the reviewer is often checking polish rather than substance.

Third, the task may not be realistically reviewable at scale. A workflow that produces large volumes of AI-generated requirements, tests, or procedural text can quickly outpace meaningful human scrutiny. The human stays in the loop, but the loop becomes too fast and too crowded to function as a real control.

Fourth, the workflow may not be designed to assess team performance. FDA’s draft guidance is explicit that when humans and AI work together, the performance of the human-AI team should be evaluated. A reviewer who catches nothing because the interface hides uncertainty, context, or version history is not proof of safety. It is proof that the review design is weak. (U.S. Food and Drug Administration)
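One way to make that team-level evaluation concrete is a seeded-error challenge: plant known defects in a sample of AI-assisted outputs, route them through the normal review step, and measure what the reviewer actually catches. The Python sketch below is illustrative only. The ReviewedItem fields, the team_metrics function, and the sample data are hypothetical and are not drawn from any FDA document.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReviewedItem:
    item_id: str
    has_seeded_error: bool  # ground truth known to the evaluator, hidden from the reviewer
    reviewer_flagged: bool  # True if the reviewer rejected or queried the item


def team_metrics(items: List[ReviewedItem]) -> Dict[str, object]:
    """Summarize how the human review step performed on a challenge set."""
    seeded = [i for i in items if i.has_seeded_error]
    clean = [i for i in items if not i.has_seeded_error]
    caught = sum(i.reviewer_flagged for i in seeded)
    false_alarms = sum(i.reviewer_flagged for i in clean)
    return {
        # Share of planted defects the reviewer caught
        "catch_rate": caught / len(seeded) if seeded else None,
        # Share of clean items the reviewer flagged anyway
        "false_alarm_rate": false_alarms / len(clean) if clean else None,
        # Planted defects that passed review unnoticed
        "missed_ids": [i.item_id for i in seeded if not i.reviewer_flagged],
    }


# Hypothetical challenge set: four items with planted defects, four clean items
sample = [
    ReviewedItem("REQ-001", True, True),
    ReviewedItem("REQ-002", True, True),
    ReviewedItem("REQ-003", True, False),
    ReviewedItem("REQ-004", True, True),
    ReviewedItem("REQ-005", False, False),
    ReviewedItem("REQ-006", False, True),
    ReviewedItem("REQ-007", False, False),
    ReviewedItem("REQ-008", False, False),
]
metrics = team_metrics(sample)
print(metrics)
```

In this made-up sample the reviewer catches three of four planted defects and misses REQ-003. What counts as an acceptable catch rate is a risk decision for the organization, not something the code can answer.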

What “Enough” Actually Looks Like

If human presence is not enough, what is?

In GxP AI, a stronger control model usually includes at least six elements.

It starts with a clear context of use. Teams need to define exactly what the AI is doing. Is it drafting requirements, proposing risk answers, classifying deviations, generating test ideas, summarizing records, or making a recommendation that could affect quality decisions? FDA keeps returning to context of use because vague purpose statements produce vague controls. (U.S. Food and Drug Administration)

Then comes a risk-based approach. Higher-impact uses need stronger credibility activities, stricter review, better documentation, and more robust testing. This is now a central theme in both FDA’s draft guidance and the 2026 guiding principles. (U.S. Food and Drug Administration)

Then comes multidisciplinary ownership. FDA, EMA, and HMA explicitly call for multidisciplinary expertise. In practice, that means AI governance cannot sit with IT alone or quality alone. It needs input from process owners, validation, QA, data or technical specialists, and sometimes regulatory or clinical subject-matter experts depending on use. (U.S. Food and Drug Administration)

Then comes documentation and data governance. The 2026 guiding principles explicitly call out data governance and documentation. That is not clerical garnish. It is what makes the workflow explainable later, when someone asks how the output was produced, reviewed, accepted, and controlled. (U.S. Food and Drug Administration)

Then comes performance assessment. Not just model performance, but workflow performance. Does the human-AI pairing actually improve outcomes? Does the review step catch the right kinds of errors? Does the system behave reliably in the intended use environment? FDA’s draft guidance is clear that performance has to be evaluated in context. (U.S. Food and Drug Administration)

Finally, there is lifecycle management. Models drift, workflows change, prompts evolve, interfaces get redesigned, and operating contexts shift. The 2026 principles explicitly include lifecycle management because one-time signoff is not enough for a moving system. (U.S. Food and Drug Administration)
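The six elements above can be captured as a structured record for each AI use case, so that missing controls are visible before the workflow goes live. The following Python sketch is one possible shape, not a regulatory template. The AIUseCase fields, the control_gaps function, the risk labels, and the rule that at least two functions must own a use case are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class AIUseCase:
    name: str
    context_of_use: str = ""
    risk_level: Optional[str] = None  # assumed labels: "low", "medium", "high"
    owner_functions: Set[str] = field(default_factory=set)
    documentation_refs: List[str] = field(default_factory=list)
    performance_evidence: List[str] = field(default_factory=list)
    periodic_review_days: Optional[int] = None


def control_gaps(uc: AIUseCase) -> List[str]:
    """Return the control elements that are still undefined for this use case."""
    gaps = []
    if not uc.context_of_use.strip():
        gaps.append("context of use")
    if uc.risk_level not in {"low", "medium", "high"}:
        gaps.append("risk classification")
    if len(uc.owner_functions) < 2:  # assumed threshold for "multidisciplinary"
        gaps.append("multidisciplinary ownership")
    if not uc.documentation_refs:
        gaps.append("documentation and data governance")
    if not uc.performance_evidence:
        gaps.append("performance assessment")
    if uc.periodic_review_days is None:
        gaps.append("lifecycle management")
    return gaps


# Hypothetical use case that so far has only a purpose, a risk level, and one owner
draft = AIUseCase(
    name="Deviation classifier",
    context_of_use="Suggest a severity class for new deviations; QA makes the final call.",
    risk_level="high",
    owner_functions={"QA"},
)
gaps = control_gaps(draft)
print(gaps)
```

A check like this only confirms that each element has been addressed. Whether the content behind each field is adequate for the risk still requires human judgment.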

Why This Is a Validation Design Problem, Not Just a Policy Problem

A lot of organizations respond to GxP AI risk by writing a policy that says outputs must be reviewed by a human. That is a start, but it does not solve the harder problem.

The harder problem is designing validation and quality workflows so that human review is meaningful.

That means reviewers need the right context. It means AI-assisted artifacts need to be connected to requirements, risks, tests, deviations, evidence, and approvals. It means systems need traceability, auditability, and version control. It means the workflow needs to preserve who reviewed what, when, under which role, and against which standard. It means the quality unit is not parachuted in at the end like a reluctant theater critic.
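As a minimal sketch of what "who reviewed what, when, under which role, and against which standard" could look like in data, the Python example below ties a review decision to a hash of the exact content that was reviewed. Every name here (ReviewRecord, record_review, still_matches, the sample identifiers) is hypothetical, and a real GxP system would also need access control, electronic signature handling, and tamper-evident storage that this sketch does not attempt.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewRecord:
    artifact_id: str      # what was reviewed
    artifact_sha256: str  # fingerprint of the exact version reviewed
    reviewer: str         # who reviewed it
    role: str             # under which role
    standard: str         # against which standard or procedure
    decision: str         # "approved" or "rejected"
    reviewed_at: str      # when, as an ISO 8601 UTC timestamp


def _fingerprint(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def record_review(artifact_id: str, content: str, reviewer: str,
                  role: str, standard: str, decision: str) -> ReviewRecord:
    """Create an immutable record of one review decision."""
    if decision not in {"approved", "rejected"}:
        raise ValueError("decision must be 'approved' or 'rejected'")
    return ReviewRecord(
        artifact_id=artifact_id,
        artifact_sha256=_fingerprint(content),
        reviewer=reviewer,
        role=role,
        standard=standard,
        decision=decision,
        reviewed_at=datetime.now(timezone.utc).isoformat(),
    )


def still_matches(record: ReviewRecord, current_content: str) -> bool:
    """True only if the content is byte-for-byte what the reviewer saw."""
    return record.artifact_sha256 == _fingerprint(current_content)


# Hypothetical AI-drafted requirement and its review
text = "The system shall lock a batch record after QA approval."
rec = record_review("URS-017", text, "j.doe", "QA Reviewer", "SOP-QA-012", "approved")
print(still_matches(rec, text))
print(still_matches(rec, text + " Edited after approval."))
```

Binding the approval to a content hash means any later change to the artifact, whether made by a person or regenerated by the AI, no longer matches the approved version and can be routed back through review.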

This is where many teams begin to discover that AI governance is also an infrastructure question. The more AI participates in regulated work, the more the surrounding environment must support controlled creation, review, approval, revision, and ongoing oversight.

The Industry Is Moving Past Minimal Oversight Language

The broader market is already showing signs of this shift.

FDA is not retreating from AI. It is building frameworks for credibility and openly discussing AI use across the drug lifecycle. FDA, EMA, and HMA are aligning on principles that are broader than human oversight alone. EMA is moving toward more explicit AI-related GMP guidance. Even in adjacent FDA materials on medical devices and AI, the same logic appears: where there is a human in the loop, human factors and the performance of the human-AI team matter, not just the model’s standalone output. (U.S. Food and Drug Administration)

That should change how life sciences teams talk about readiness. “We keep a human in the loop” is becoming the beginning of the answer, not the answer itself.

What Life Sciences Teams Should Ask Instead

A better set of questions would look like this:

Can we define the AI’s context of use clearly?

Can we show why the level of control is proportionate to the risk?

Can we demonstrate that the human reviewer is qualified and equipped to perform the review meaningfully?

Can we evaluate the performance of the human-AI workflow, not just the model alone?

Can we trace AI-assisted outputs through approval, evidence, change, and ongoing governance?

Can we maintain that control as the model, workflow, and environment evolve?

Those questions are more demanding, but they are also more useful. They turn oversight from a slogan into an operating discipline.

Conclusion

Human in the loop still matters. In many GxP AI workflows, it will remain a necessary part of control.

But it is not enough.

Regulators are making that increasingly clear. FDA’s draft guidance emphasizes context of use and the performance of the human-AI team. The 2026 FDA-EMA-HMA principles emphasize risk, standards, documentation, expertise, performance, and lifecycle management alongside human-centric design. EMA’s work plans show that more explicit AI governance is coming into GMP guidance. The common thread is simple: a human reviewer is one ingredient in a defensible system, not a substitute for one. (U.S. Food and Drug Administration)

In regulated environments, the right question is not whether a human touched the workflow.

It is whether the workflow itself was designed to deserve trust.
