The Role of AI and Machine Learning in Zero Trust Security

Where AI and ML actually fit inside a zero trust program, what to be skeptical of, and how to govern security AI under the NIST AI Risk Management Framework.

By Pilotcore · 7 min read

AI and ML, applied to a zero trust security program, are statistical signals that feed the access decision engine: behavioural baselines, anomaly scores, classifier outputs, and policy recommendations derived from observed traffic. They do not replace the policy or the human review, and they fail badly when treated as an oracle.

For related context, see Zero Trust Assessment and AI and ML Pilot Project.

The Zero Trust architecture this builds on is defined in NIST SP 800-207. The trustworthy-AI principles below come from the NIST AI Risk Management Framework (AI RMF 1.0). For adversarial risk, see NIST AI 100-2 E2025 on Adversarial Machine Learning (published March 2025, corrected April 2025; replaces the E2023 edition), which catalogues the attack classes any production security model needs to consider.

Where AI and ML actually fit in zero trust

Marketing language conflates “AI-powered” with “magic.” The truthful picture is narrower and more useful: AI and ML supply a small number of specific signal types into a zero trust decision pipeline that still depends on policy, identity, and logging built by humans.

In our engagements, these are often the most defensible starting points.

User and Entity Behaviour Analytics (UEBA). Baselines normal behaviour per user, per device, per service account, and produces a per-request risk score that the access decision engine consumes. The right output is “this access request is unusually risky for this account, step it up to stronger re-auth”, not “block.” Use a baseline window long enough to capture normal business cycles on your own traffic, keep the reference model current, and give analysts a feedback loop when a flagged event turns out to be benign.
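
As a minimal sketch of the idea above, assume a single numeric feature per account (for example, megabytes downloaded per session) and a z-score against that account's own history. The function names, the feature, and the `step_up_at` threshold are hypothetical illustrations, not any vendor's scoring method; a production UEBA model uses many features and a richer baseline.

```python
from statistics import mean, pstdev


def risk_score(history: list[float], observed: float) -> float:
    """Standard deviations by which `observed` exceeds this account's baseline."""
    if len(history) < 2:
        return 0.0  # baseline too short: contribute no signal rather than guess
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any increase is unusual, anything else is not.
        return float("inf") if observed > mu else 0.0
    return max(0.0, (observed - mu) / sigma)


def decide(score: float, step_up_at: float = 3.0) -> str:
    """Map the score to an action. The model can escalate to step-up; it never blocks."""
    return "step_up" if score >= step_up_at else "allow"


# Hypothetical per-session download volumes for one account.
history = [10, 12, 11, 9, 10, 12, 11, 10]
normal = decide(risk_score(history, 11))
unusual = decide(risk_score(history, 40))
```

Keeping "block" out of the model's vocabulary is the design choice worth noting: a false positive costs one extra authentication prompt rather than a denied request.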

Policy generation from observed traffic. Modern micro-segmentation and SaaS-access-management platforms use clustering on observed traffic to suggest policy that humans then approve. This can compress policy-authoring work where telemetry quality is strong and application owners are available for review. The model proposes; the human approves; the system enforces.
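
The propose-approve-enforce split can be sketched as follows. This example uses plain frequency counting as a stand-in for the clustering a real platform would run, and the names `ProposedRule`, `propose_rules`, and `min_flows` are hypothetical. The point it illustrates is that every generated rule starts in a pending state and nothing is enforced until a human approves it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedRule:
    src: str
    dst: str
    port: int
    observed_flows: int
    status: str = "pending_review"  # a human approves before enforcement


def propose_rules(flows, min_flows=50):
    """Suggest allow rules for (src, dst, port) tuples seen at least `min_flows` times."""
    counts = Counter(flows)
    return [
        ProposedRule(src, dst, port, n)
        for (src, dst, port), n in counts.most_common()
        if n >= min_flows
    ]


# Hypothetical observed flows between workload groups.
flows = (
    [("web", "api", 443)] * 120
    + [("api", "db", 5432)] * 80
    + [("web", "db", 5432)] * 2  # rare path: not proposed, left for human investigation
)
rules = propose_rules(flows)
```

Rare flows are deliberately left out of the proposal rather than auto-denied, so the application owner can decide whether they are legitimate or a finding.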

Phishing and credential-theft detection. Email-security and identity-protection tools have used supervised classifiers for years. Recent credential-theft and adversary-in-the-middle phishing campaigns have pushed security teams toward controls that evaluate browser, identity, session, and credential-submission signals closer to the point of entry. Treat in-browser detection as one control to evaluate alongside phishing-resistant MFA, identity telemetry, and conditional access policy.

What to be skeptical of

The phrase “AI-powered zero trust” without specifics should be treated as a marketing claim, not a product capability. Before approving a tool, ask:

  • What signal does the model produce, and where does the access decision engine consume it?
  • What is the training data, and how is drift detected?
  • What is the false positive rate on the customer’s own traffic, after the baseline period?
  • What is the explainability path when the analyst gets paged on a flagged event?
  • What is the override path when the model is wrong?

Tools that cannot answer those five questions are typically AI-flavoured, not AI-powered. They are usually fine as one signal among many; they are dangerous as the primary access decision input.

Governing security AI under NIST AI RMF

The NIST AI Risk Management Framework defines four core functions for AI governance: GOVERN, MAP, MEASURE, MANAGE. NIST states the framework’s purpose plainly: it is intended to “improve the ability to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems” (NIST AI Risk Management Framework). For a security AI program, those four functions map cleanly to operational concerns:

GOVERN. Name the accountable individual. Define the policy under which the model can make or influence access decisions, what it can never decide unilaterally, and the audit trail required for every model-driven decision. Cover the same governance ground as any other authorising boundary inside your zero trust program.

MAP. Document the deployment context: which decision the model feeds, which user populations are subject to it, which data it consumes, which adversarial risks apply. NIST AI 100-2 catalogues the relevant attack classes (evasion, poisoning, privacy, abuse); your map should list which apply to your model and what compensating controls are in place.

MEASURE. Track the false positive rate, the false negative rate, the analyst override rate, and the model’s prediction distribution over time. Drift in any of these is the early warning. A model that worked well at deployment and is now producing 3x the alerts is either telling you something real about the threat environment or it has degraded; either way, you need to know.
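
A minimal sketch of those measurements, assuming analysts label alert outcomes so that confusion-matrix counts are available. Drift in the prediction distribution is approximated here with the population stability index (PSI) over histogram bins of model scores; PSI is one common choice among several, and the function names and example bins are hypothetical.

```python
import math


def error_rates(tp, fp, tn, fn, overrides):
    """Rates from labelled outcomes; `overrides` counts alerts analysts overrode."""
    alerts = tp + fp
    return {
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        "override_rate": overrides / alerts if alerts else 0.0,
    }


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two histograms over the same bins."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("histograms must use the same bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p = max(e / e_total, eps)  # share of scores in this bin at deployment
        q = max(a / a_total, eps)  # share of scores in this bin now
        total += (q - p) * math.log(q / p)
    return total


metrics = error_rates(tp=40, fp=10, tn=940, fn=10, overrides=5)
stable = psi([700, 200, 100], [700, 200, 100])
shifted = psi([700, 200, 100], [400, 300, 300])
```

The alerting threshold on PSI is a local decision; whatever value is chosen, the useful property is that the index is zero for an unchanged distribution and grows as scores move between bins.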

MANAGE. Have the playbook ready for model retraining, rollback to a previous version, and emergency disable. The hardest move in any AI program is shutting the model off when it is causing operational pain; the GOVERN-level decision rights must include the disable path.
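
The rollback and emergency-disable paths can be sketched as a small registry. `ModelRegistry` and its methods are hypothetical names for illustration; a real deployment would back this with a feature flag service or the vendor's own controls, and would write every state change to the audit trail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelRegistry:
    versions: List[str] = field(default_factory=list)
    enabled: bool = True

    def deploy(self, version: str) -> None:
        self.versions.append(version)

    def rollback(self) -> str:
        """Drop the current version and return the one now active."""
        if len(self.versions) < 2:
            raise RuntimeError("no previous version to roll back to")
        self.versions.pop()
        return self.versions[-1]

    def disable(self) -> None:
        """Emergency off switch: callers fall back to policy-only decisions."""
        self.enabled = False

    def active(self) -> Optional[str]:
        if not self.enabled or not self.versions:
            return None
        return self.versions[-1]


registry = ModelRegistry()
registry.deploy("v1")
registry.deploy("v2")
after_rollback = registry.rollback()
registry.disable()
after_disable = registry.active()
```

Returning `None` from `active()` forces every caller to handle the "no model" case explicitly, which is what makes the disable path usable under pressure.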

A realistic adoption sequence

Three steps that consistently outperform “deploy AI everywhere”:

1. Pick one decision the model will influence

Not “all access decisions.” One. The clearest first target is risk-based step-up authentication on SaaS access, because the cost of a wrong step-up (a single extra MFA prompt) is small and the signal is well-understood.

2. Run in observe-only mode against your traffic for at least 60 days

Pipe the model’s outputs into the SIEM without affecting access. Tune until the false positive rate is acceptable to the analyst team. This is where most programs underspend; the temptation to go live too early creates the alert fatigue that kills program credibility within a quarter.
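
Observe-only mode can be sketched as a wrapper that always records what the model would have done but only lets it affect the outcome when an `enforce` flag is set. The function name, log fields, and `step_up_at` threshold are hypothetical, and the standard `logging` module stands in for whatever forwarder ships events to the SIEM.

```python
import json
import logging

logger = logging.getLogger("model_shadow")


def evaluate_request(request_id, model_score, policy_decision,
                     enforce=False, step_up_at=0.8):
    """Return the effective decision; always log the model's would-be decision."""
    model_decision = "step_up" if model_score >= step_up_at else "allow"
    logger.info(json.dumps({
        "request_id": request_id,
        "model_score": model_score,
        "model_decision": model_decision,
        "policy_decision": policy_decision,
        "enforced": enforce,
    }))
    # The model may only tighten an allow; it never loosens a policy deny.
    if enforce and policy_decision == "allow":
        return model_decision
    return policy_decision


observed = evaluate_request("req-1", 0.95, "allow")                # observe-only
enforced = evaluate_request("req-2", 0.95, "allow", enforce=True)  # after tuning
denied = evaluate_request("req-3", 0.10, "deny", enforce=True)
```

Because the log line has the same shape in both modes, the false positive analysis done during the observe-only window carries over unchanged once enforcement starts.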

3. Enforce on a tight cohort, measure, expand

Same sequence as any other zero trust control rollout. Start with a known user population, measure the analyst experience, then expand. Each expansion needs a new measurement window.

Failure modes that matter

Alert fatigue. As a back-of-envelope illustration, a model with a 1% false positive rate on 1M events per day would produce 10,000 false alerts per day. The analyst team disables the integration within weeks. Tune for precision before recall.
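
The arithmetic behind that illustration, as a short sketch. The 10,000 figure treats essentially every event as benign; the example below also shows what happens to precision once a small malicious fraction is included. The malicious rate and detection rate used here are assumed values chosen for illustration, not measurements.

```python
def daily_alerts(events_per_day, malicious_rate, false_positive_rate, true_positive_rate):
    """Split one day's alerts into false and true, and compute precision."""
    malicious = events_per_day * malicious_rate
    benign = events_per_day - malicious
    false_alerts = benign * false_positive_rate
    true_alerts = malicious * true_positive_rate
    total = false_alerts + true_alerts
    precision = true_alerts / total if total else 0.0
    return false_alerts, true_alerts, precision


# The case in the text: 1M events, 1% false positive rate, all events benign.
all_benign = daily_alerts(1_000_000, 0.0, 0.01, 0.9)

# Assumed: 1 in 10,000 events is malicious and the model catches 90% of them.
with_attacks = daily_alerts(1_000_000, 0.0001, 0.01, 0.9)
```

Under those assumptions the model raises roughly 9,999 false alerts against 90 true ones, so fewer than 1 in 100 alerts is real. That ratio, not the headline false positive rate, is what the analyst team experiences.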

Model drift. The threat environment changes; the training data does not. A model trained on 2024 phishing patterns degrades against 2026 adversary-in-the-middle kits. Retraining schedule and drift detection are non-negotiable.

Opacity in audited environments. A regulated organisation may need to explain why a specific access decision was made or denied. A black-box classifier without an explainability layer is hard to defend in an audit, regardless of how accurate it is. Pick models and tools that support per-decision rationale extraction.

Single-source dependency. A program that uses one vendor’s model for the entire access decision is a single point of failure when the vendor’s classifier breaks, drifts, or is bypassed by a new attacker technique. Treat the AI signal as one input among many; keep the policy-based controls strong enough to stand without it.
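
One way to picture "one input among many" is a decision function in which policy and device posture decide on their own and the model score is optional. The function and parameter names are hypothetical, and the fallback shown (policy result stands when the model is unavailable) is an assumption; a stricter fallback such as always stepping up is an equally valid policy choice.

```python
from typing import Optional


def access_decision(policy_allows: bool, device_compliant: bool,
                    model_score: Optional[float], step_up_at: float = 0.8) -> str:
    # Policy-based controls are evaluated first and do not depend on the model.
    if not policy_allows or not device_compliant:
        return "deny"
    # Model outage or emergency disable: the policy result stands by itself.
    if model_score is None:
        return "allow"
    # The model can only add friction on top of an allow.
    return "step_up" if model_score >= step_up_at else "allow"


model_down = access_decision(True, True, None)
policy_deny = access_decision(False, True, 0.0)
risky = access_decision(True, True, 0.9)
```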

Adversarial pressure. Production security models are attacked. NIST AI 100-2 catalogues the techniques, including data poisoning (corrupting the training set), evasion (crafting inputs that bypass the classifier), and model extraction. Threat-model your model the same way you threat-model the rest of the stack.

What this means for zero trust roadmaps

The CISA Zero Trust Maturity Model (Version 2.0, April 2023) places its five pillars (Identity, Devices, Networks, Applications and Workloads, and Data) on a four-stage scale: Traditional, Initial, Advanced, and Optimal. CISA’s model describes more continuous, context-aware, and automated decisioning at higher maturity levels. AI and machine learning can help generate or interpret some of those signals, but AI is not itself a Zero Trust requirement.

A Zero Trust roadmap that takes AI and ML seriously will:

  • Pick the two or three decision surfaces where statistical signals genuinely improve the outcome.
  • Govern those models under NIST AI RMF.
  • Keep the human review path and the policy override path strong enough that an AI failure does not become a security incident.
  • Spend on observability and drift detection before spending on the next model.

That is a smaller and more practical use of AI than the marketing implies, and it produces durable improvement to the zero trust posture rather than a quarter of impressive demos.

How Pilotcore helps

Pilotcore designs and rolls out zero trust programs for regulated Canadian organisations, including the AI and ML signal integration described above. Nelson Ford, principal at Pilotcore and based in Ottawa, is a CISSP and CMMC Certified Professional, and works with technical teams on use-case selection, NIST AI RMF-aligned governance, vendor evaluation, and the staged enforcement rollout that keeps the analyst team functioning. Book a zero trust readiness conversation.

Frequently asked questions

How do AI and ML fit into a zero trust program?

AI and ML supply statistical signals into the zero trust decision pipeline: behavioural baselines, anomaly scores, classifier outputs, and policy recommendations derived from observed traffic. They do not replace the policy or the human review, and they fail badly when treated as an oracle.

What are the most useful AI use cases inside zero trust?

In our engagements, three use cases are often the most defensible starting points. User and Entity Behaviour Analytics (UEBA) feeds a per-request risk score into the access decision engine. Policy generation from observed traffic can compress policy-authoring work where telemetry quality is strong. Credential-theft controls can evaluate browser, identity, session, and credential-submission signals closer to the point of entry.

What is the NIST AI Risk Management Framework?

The NIST AI RMF defines four core functions for AI governance: GOVERN (accountability and policy), MAP (deployment context and adversarial risk), MEASURE (false positive rate, drift, override rate), and MANAGE (retraining, rollback, emergency disable). Security AI programs should govern under all four.

How do you evaluate an AI-powered zero trust vendor claim?

Ask five questions. What signal does the model produce, and where does the access decision engine consume it? What is the training data, and how is drift detected? What is the false positive rate on your traffic after the baseline period? What is the explainability path when an analyst gets paged? What is the override path when the model is wrong? Tools that cannot answer those questions are AI-flavoured rather than AI-powered.

What are the failure modes of security AI?

The main ones are alert fatigue from poorly tuned false positives, model drift as the threat environment changes, opacity in audited environments where decisions must be explained, single-source dependency on one vendor's classifier, and adversarial pressure from the data poisoning and evasion attacks catalogued in NIST AI 100-2.

Should AI replace human analysts in zero trust?

No. AI and ML should augment analyst decisions, not replace them. The right deployment pattern is to use the model as one input into the access decision pipeline with strong policy-based controls that stand alone if the model fails. Keep the human review and policy override paths strong enough to absorb an AI failure without it becoming a security incident.
