If you’ve ever had to answer, “Who changed that Azure Policy… and why?” you already know the problem with portal-driven governance:

  • Changes happen fast (usually under pressure).

  • The “why” gets lost (or lives in someone’s head).

  • Reproducing the same governance across tenants becomes a manual grind.

  • Drift quietly accumulates until an audit—or an incident—forces the cleanup.

This year, I decided to treat Azure governance like engineering, not administration.

I migrated manual Azure Policy “ClickOps” deployments into a fully traceable, PR-driven CI/CD model using GitHub and GitHub Actions, and I refactored and migrated 250+ custom policy definitions and their assignments in a single day. The repo now contains 267 policy definition JSON files (excluding assignments, exemptions, parameters, and tenant metadata), deployed consistently across half a dozen Microsoft Entra tenants.

This is the story of what changed, how it works, and why it matters.

The real issue wasn’t policy. It was delivery.

Azure Policy is powerful. But most organizations implement it like this:

  1. Someone exports a JSON.

  2. Someone else edits it.

  3. Someone applies it in the portal.

  4. Weeks later, nobody can explain the difference between “intended state” and “what’s actually running.”

That approach breaks down the moment you have:

  • multiple management groups,

  • multiple environments,

  • multiple tenants,

  • and real expectations around auditability and change control.

My goal captured the intent perfectly:

Policy-as-Code Meets Infra-as-Code: Automated. Accurate. Auditable

So I focused on one outcome: make governance changes behave like product changes, meaning reviewed, tested, traceable, and repeatable.

The target state: governance with an audit trail

The end state I wanted was simple:

Commit → PR → CI validation → controlled deployment → verified enforcement

Not “someone clicked apply.”

That meant building a real operating model:

  • a standardized repo structure that can scale across tenants

  • automation that limits blast radius by deploying only what changed

  • safe execution modes (including dry-run)

  • guardrails to prevent accidental assignment sprawl

The repo model: multi-tenant, category-driven, repeatable

At the core is a multi-tenant repository structure designed to be boring (in the best way):

  • Tenant-scoped layout (e.g., tenants/<tenant>/...)

  • Custom categories (e.g., tenants/<tenant>/custom/<category>/...)

  • Tenant metadata (e.g., tenant.json) to drive deterministic deployments

That sounds like “just folders,” but it solves a huge problem: it turns “tribal knowledge” into discoverable structure. New people can navigate governance without needing a guided tour.
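To make the layout concrete, here is a minimal sketch of how a script could walk that structure and list what each tenant contains. It is not the repo's actual tooling: the function name discover_tenants is hypothetical, and it assumes tenant.json sits directly under tenants/<tenant>/.

```python
import json
from pathlib import Path


def discover_tenants(repo_root: Path) -> dict[str, dict]:
    """Map each tenant folder name to its tenant.json contents and category names.

    Assumed layout (hypothetical details, based on the structure described above):
      tenants/<tenant>/tenant.json
      tenants/<tenant>/custom/<category>/...
    """
    tenants: dict[str, dict] = {}
    tenants_dir = repo_root / "tenants"
    if not tenants_dir.is_dir():
        return tenants

    for tenant_dir in sorted(p for p in tenants_dir.iterdir() if p.is_dir()):
        meta_file = tenant_dir / "tenant.json"
        if not meta_file.is_file():
            continue  # a folder without metadata is treated as not deployable

        custom_dir = tenant_dir / "custom"
        categories = (
            sorted(p.name for p in custom_dir.iterdir() if p.is_dir())
            if custom_dir.is_dir()
            else []
        )
        tenants[tenant_dir.name] = {
            "metadata": json.loads(meta_file.read_text(encoding="utf-8")),
            "categories": categories,
        }
    return tenants
```

Sorting the folder names keeps the output stable between runs, which is part of what makes a deployment driven by this listing deterministic.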

More importantly, the repo becomes the single source of truth:

  • policy definitions live as code

  • assignment intent is represented as exported metadata

  • changes become reviewable artifacts

The pipeline: plan + deploy, with blast-radius control

The CI/CD flow is intentionally split into two modes of thinking:

1) Plan: detect what changed, decide what to touch

On each push to main, the workflow calculates the change set using git diff, then builds a per-tenant deployment matrix. The system deploys only what changed, which makes governance updates feel like safe releases instead of “big bang” events.

That “changed-only deploy” capability was one of the biggest practical wins:

  • smaller deployments,

  • fewer surprises,

  • faster turnaround,

  • and significantly lower operational risk.
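The plan step can be sketched roughly as follows. This is an illustration under assumptions, not the workflow's real script: the function names are hypothetical, the commit range is whatever the pipeline supplies, and only JSON files under tenants/<tenant>/ are treated as deployable.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(base: str, head: str) -> list[str]:
    """List paths that differ between two commits, using git diff --name-only."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def build_tenant_matrix(changed_paths: list[str]) -> dict[str, list[str]]:
    """Group changed JSON files under tenants/<tenant>/ by tenant name."""
    matrix: dict[str, set[str]] = {}
    for raw in changed_paths:
        # git prints forward slashes; backslashes are normalized defensively.
        parts = PurePosixPath(raw.replace("\\", "/")).parts
        if len(parts) < 3 or parts[0] != "tenants":
            continue  # not inside a tenant folder
        if not parts[-1].lower().endswith(".json"):
            continue  # only JSON artifacts are considered deployable here
        matrix.setdefault(parts[1], set()).add("/".join(parts))
    # Sorted output keeps the resulting job matrix stable between runs.
    return {tenant: sorted(files) for tenant, files in sorted(matrix.items())}
```

A tenant with no changed files never appears in the matrix, so no deployment job is created for it. That is where the blast-radius reduction comes from.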

2) Deploy: execute safely, with controls

The same pipeline also supports controlled manual runs with workflow inputs like:

  • dryRun (default: true)
    Lets you validate behavior and output without actually applying changes.

  • deployAll (default: false)
    Enables a full resync when you need it (rare, but necessary during migrations or large refactors).

This gave me a safe default posture: validate first, deploy intentionally.
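As a rough sketch of how those two inputs could interact, the snippet below models them as options with the same defaults. The names RunOptions and plan_run are hypothetical, and the real pipeline may wire the inputs differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunOptions:
    dry_run: bool = True      # mirrors the dryRun input (default: true)
    deploy_all: bool = False  # mirrors the deployAll input (default: false)


def plan_run(all_files, changed_files, options=RunOptions()):
    """Return the actions a run would take, one line per file."""
    if options.deploy_all:
        selected = sorted(all_files)  # full resync
    else:
        # Changed-only: ignore changed paths that no longer exist in the repo.
        selected = sorted(set(changed_files) & set(all_files))
    verb = "WOULD DEPLOY" if options.dry_run else "DEPLOY"
    return [f"{verb} {path}" for path in selected]
```

Because both defaults are the conservative choice, a run started with no inputs changes nothing and touches only the changed files in its report.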

Guardrails that prevent “oops governance”

Automation without guardrails just scales mistakes faster. So I added explicit controls that prevent the most common failure mode: accidental assignment creation or reassignment.

Two defaults matter a lot:

  • allowCreate = false
    Assignments won’t be created unless explicitly allowed.

  • requirePolicyMatch = true
    Prevents assignments from silently “pointing” at the wrong policy definition.

Those two switches reduce the chance of a casual change turning into a broad governance event. That’s the difference between “CI/CD exists” and “CI/CD is safe.”
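One way to picture the two guardrails is as a small decision function. This is a sketch under assumptions: the function name and the flattened dictionary shape are hypothetical, and only policyDefinitionId is a real assignment property.

```python
def decide_assignment_action(desired, existing,
                             allow_create=False, require_policy_match=True):
    """Return 'create', 'update', or 'skip:<reason>' for one assignment.

    desired and existing are dicts with a 'policyDefinitionId' key;
    existing is None when the assignment is not deployed yet.
    """
    if existing is None:
        # allowCreate = false: new assignments need an explicit opt-in.
        return "create" if allow_create else "skip:create-not-allowed"

    # Azure resource IDs are case-insensitive, so compare them lowercased.
    same_policy = (
        desired["policyDefinitionId"].lower()
        == existing["policyDefinitionId"].lower()
    )
    if require_policy_match and not same_policy:
        # requirePolicyMatch = true: refuse to repoint an assignment silently.
        return "skip:policy-mismatch"
    return "update"
```

Returning a reason instead of raising an error lets a dry run report every skipped assignment in one pass.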

Secure automation: per-tenant auth without long-lived secrets

Because this operates across multiple Entra tenants, authentication had to be repeatable and secure.

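So I implemented tenant-specific authentication using OpenID Connect (OIDC) with federated credentials, so each workflow run exchanges a short-lived token instead of reading a stored secret. The goal was clear: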

  • no long-lived secrets

  • a modern identity model

  • permission scope that aligns to the deployment target (management group governance)

This keeps the security lifecycle tied to the workflow identity model, not a pile of stored credentials.

The migration day: 250+ policies moved from portal history to git history

Here’s the headline:

I successfully refactored and migrated 250+ custom Azure Policy definitions and their assignments in a single day, and validated that:

  • the definitions land correctly at the management group scope

  • the assignments apply as intended

  • enforcement behavior matches expectations

  • the process is repeatable across tenants

The repo now holds 267 policy definition JSON files (excluding assignments, exemptions, parameters, and tenant metadata).

That’s not just “more policies.” It’s a governance delivery engine that can keep shipping reliably.

The engineering work that made it real (not theoretical)

This wasn’t “write a workflow YAML and call it done.”

Stabilizing the system required real-world fixes—exactly the kind you only discover when you run pipelines against production-scale artifacts:

  • path normalization quirks (especially cross-platform runners)

  • strict scripting behavior that breaks under edge cases

  • literal-path file operations

  • normalization issues in fields like nonComplianceMessages

  • deterministic assignment update behavior

Those fixes matter because they turn a nice-looking prototype into something you can trust at scale.
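To illustrate the nonComplianceMessages item, here is a minimal sketch of normalizing that field before comparing repo intent with deployed state. The exact issues hit in this project are not spelled out above, so the rules here (treating a missing list as empty, trimming whitespace, dropping empty reference IDs, sorting) are assumptions, and both function names are hypothetical.

```python
def normalize_messages(messages):
    """Return nonComplianceMessages in a canonical form for comparison."""
    normalized = []
    for item in messages or []:  # None and [] are treated the same
        text = (item.get("message") or "").strip()
        if not text:
            continue  # drop entries with no usable message
        entry = {"message": text}
        ref = item.get("policyDefinitionReferenceId")
        if ref:  # omit the key when it is missing, null, or empty
            entry["policyDefinitionReferenceId"] = ref
        normalized.append(entry)
    return sorted(
        normalized,
        key=lambda e: (e.get("policyDefinitionReferenceId", ""), e["message"]),
    )


def needs_update(desired, deployed):
    """True only when the normalized message lists actually differ."""
    return normalize_messages(desired) != normalize_messages(deployed)
```

Comparing normalized values means a cosmetic difference, such as a trailing space or a null reference ID, does not trigger an assignment update on every run.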

Why leadership should care (the value add)

This wasn’t built to be “cool automation.” It was built to remove risk and toil.

Operational excellence: less toil, faster change

Governance updates are now:

  • repeatable

  • smaller in scope

  • faster to ship

  • less dependent on specific people knowing “the portal steps”

Auditability: every change is traceable

Instead of “we think this was applied last month,” we now have:

  • commit history

  • PR context

  • and workflow runs as evidence

That’s a material improvement in change control and audit readiness.

Reduced drift: intent stays aligned with reality

When policy state lives in a repo and deployments are deterministic, drift has nowhere to hide. Governance becomes consistent across tenants and management groups by default.

Scales beyond the original target

The original goal included “10 Azure Policies delivered as code and validated in CI.”

This work didn’t just exceed that number. It created the mechanism that makes delivering hundreds of policies normal—and makes adding the next hundred routine.

What I’d recommend if you want to replicate this model

If you’re trying to move from ClickOps to Policy-as-Code, focus on these principles first:

  1. Make the repo structure obvious
    If people can’t find things, they’ll bypass the process.

  2. Deploy only what changed
    It’s the easiest way to reduce blast radius and speed up releases.

  3. Ship with a safe default
    Dry-run should be the natural first step, not an afterthought.

  4. Protect assignments like production infrastructure
    Put guardrails in place so CI/CD can’t accidentally expand scope.

  5. Treat multi-tenancy like a first-class requirement
    If it works in one tenant but can’t scale, it’s still manual—just slower.

Closing thoughts: governance should feel like engineering

This project changed one fundamental thing: Governance is now an engineering system.

Policies move through a disciplined lifecycle—planned, reviewed, deployed, and traceable across multiple tenants, with safety controls built in. That’s what “Automated. Accurate. Auditable.” looks like in practice.

Want help moving Azure Policy out of the portal?
Use my Practical IT Intake form and include your scope (MG/sub/RG), tenant count, and any links/screenshots. I’ll point you to the safest next step. 👉 Practical IT Intake
