Hot take: private endpoints are boring. DNS is the drama. A private endpoint is just a private IP address behind a name. If the name resolves incorrectly, everything breaks, and the ticket lands on “the network team” by default.
Most private endpoint failures I’ve seen come down to the same root cause: nobody can answer, in one sentence, who owns DNS for this service and this client path.
Operator rule: if you can’t point to an owner for zone, forwarding, and lifecycle, you don’t have a private endpoint design yet. You have a hope-and-pray deployment.
What “private endpoint is down” usually means
When someone says “private endpoint is failing,” the symptom is almost always one of these:
· The hostname returns NXDOMAIN (no record).
· The hostname resolves to the public IP, so the client hits the public endpoint and gets blocked by firewall rules.
· The hostname resolves to an old private IP (stale record, wrong zone, or cached answer).
· Different clients resolve different answers, depending on which DNS server they ask (split brain).
Notice what’s missing: the private endpoint itself. The endpoint does what it does. The name is where things go sideways.
The real failure mode: “DNS ownership” is a blank box
In most enterprises, private endpoints cross at least three teams: platform, network, and the app or service owner. Each team is doing something reasonable in their own lane. The trouble is the handoff points.
Here are the common “blank box” questions that create outages:
· Who owns the Private DNS zone (or the equivalent split-brain zone on-prem)?
· Who owns the VNet links to that zone, including hub-spoke patterns and shared services VNets?
· Who owns conditional forwarding from on-prem DNS to Azure, and who tests it after changes?
· Who owns the resolver chain inside Azure (custom DNS VMs, firewall DNS proxy, Azure DNS Private Resolver, or other)?
· Who owns the record lifecycle when endpoints are recreated, renamed, or moved?
If those answers are fuzzy, you can deploy private endpoints all day and still get random breakage. DNS will keep routing around your org chart.

Make ownership concrete: the 4-layer DNS model
To stop the churn, treat DNS like part of the private endpoint product. I use a simple model with four layers. Each layer needs an explicit owner.
| Layer | What it is | What can go wrong | Owner (pick one team) |
| --- | --- | --- | --- |
| 1) Name | The FQDN the client uses (for example: `<service>.vault.azure.net`). | App uses the wrong hostname, or multiple names point to different endpoints. | Service/App owner |
| 2) Zone | Where the private A record lives (Azure Private DNS zone or on-prem split zone). | No record, wrong record, duplicate zones, stale records after redeploy. | Platform team/DNS team |
| 3) Resolver path | How queries travel from client to the authoritative answer (forwarders, conditional rules, DNS proxy). | Queries never reach Azure, forwarding points to the wrong place, recursion blocked. | Network/DNS team |
| 4) Client configuration | Which DNS server the client actually uses (DHCP options, static settings, VPN, dev boxes). | Some clients bypass the intended path and get public answers. | Network/Desktop engineering |
You can argue about the exact team names, but the pattern holds: if nobody owns a layer, the layer becomes tribal knowledge. That’s when private endpoints “mysteriously” fail after unrelated changes.
A practical RACI that actually reduces tickets
Here’s a baseline RACI you can steal. Adjust roles to match your org, but keep the ownership unambiguous.
| Task | App owner | Platform/DNS | Network/DNS | Security |
| --- | --- | --- | --- | --- |
| Create private endpoint resource | C | R/A | I | C |
| Create or attach Private DNS zone | I | R/A | C | C |
| Link zone to VNets (hub/spoke) | I | R | R/A | C |
| Configure conditional forwarding (on-prem → Azure) | I | C | R/A | I |
| Own resolver chain in Azure (DNS proxy/resolver) | I | C | R/A | C |
| Validation tests from each network segment | R | R | R | C |
| Record lifecycle (redeploy, rename, delete) | C | R/A | C | I |
| Monitoring, alerting, incident response | C | R | R | R/A |
If you adopt only one thing from this post, adopt this: every private endpoint request should include the RACI and the resolver path diagram. If those aren’t present, the request is incomplete.
The 15-minute triage checklist
When someone pings you with “private endpoint is broken,” run this quick loop before anyone starts changing things:
1. Confirm the exact FQDN the client is using. Copy it from the app config or connection string.
2. From the failing client, run a lookup (nslookup or dig) and record: DNS server queried, answer returned, and TTL.
3. Identify the client’s DNS path: which resolver does it actually hit (VPN, VNet DNS setting, DHCP, static)?
4. Check the authoritative zone: does the private A record exist, and is it the expected private IP?
5. Verify zone linkage or split-zone distribution: is the zone linked to the right VNets, and do on-prem resolvers have the right conditional rule?
6. Confirm the network path to the private IP (route, NSG, firewall) only after DNS is proven correct.

That order matters. Most teams start with routing or firewall rules because those feel concrete. But if DNS is wrong, you’re debugging the wrong destination.
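Step 2 of the checklist can be wrapped in a few lines if you want something scriptable. This stdlib sketch uses the OS resolver, so it sees the same path (and cache) the failing client does; unlike `nslookup` or `dig`, it cannot report which DNS server answered or the TTL, so you still want the real tools for those fields. `lookup` is a hypothetical helper, not a standard API:

```python
import socket
from ipaddress import ip_address

def lookup(fqdn: str) -> tuple[str, list[str]]:
    """Resolve via the client's default path; classify the answer set."""
    try:
        infos = socket.getaddrinfo(fqdn, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:              # no record reachable from this path
        return ("nxdomain", [])
    ips = sorted({info[4][0] for info in infos})
    kind = "private" if all(ip_address(ip).is_private for ip in ips) else "public"
    return (kind, ips)
```

Run it from the failing client, not from your own workstation: the whole point of step 3 is that different clients may sit on different resolver paths.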
Common enterprise patterns that create split-brain DNS
If you’re in a hub-spoke or hybrid environment, these show up constantly:
· Multiple private DNS zones for the same suffix across subscriptions or tenants, each with different records.
· Zone exists, but it is linked only to the spoke VNet, not the hub or shared services VNet where your resolvers live.
· On-prem conditional forwarding points to a DNS forwarder that can’t resolve Azure private zones (or can’t reach Azure).
· A “temporary” DNS override or hosts file entry becomes permanent and outlives the endpoint it pointed to.
· Caching hides changes. You fix the record, but clients keep the old answer until TTL expires or caches are cleared.
None of these are exotic. They’re what happens when DNS is treated like plumbing instead of part of the platform contract.
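One way to make the split-brain case concrete: collect the answer set from each resolver (for example with `dig @<resolver> <fqdn>`) and compare the sets. The comparison itself is one line; the resolver names and IPs below are purely illustrative:

```python
def split_brain(answers: dict[str, frozenset[str]]) -> bool:
    """True when different resolvers return different answer sets for the same name."""
    return len(set(answers.values())) > 1

# Answers as you'd collect them per resolver/segment (illustrative values):
observed = {
    "onprem-dns":   frozenset({"40.113.0.10"}),  # public IP: forwarding never reached Azure
    "hub-resolver": frozenset({"10.20.0.5"}),    # the private endpoint IP
}
```

A `True` here tells you to stop debugging the endpoint and start walking the resolver path.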
Prevention: treat DNS like a product, not a one-time task
Here’s what “good” looks like operationally:
· A single documented DNS design per landing zone: which zones, which resolvers, which forwarding rules.
· A request form that captures: service, FQDN, client locations, expected private IP range, and the DNS owner.
· A standard validation script, run from each network segment before go-live and after any DNS change.
· A lifecycle rule: when a private endpoint is deleted or recreated, the DNS record updates are owned and tracked.
· Monitoring that alerts when a critical name resolves to public, resolves to NXDOMAIN, or resolves to an unexpected private IP.
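The per-segment validation script in that list can be sketched as a small check runner. The resolver is injected so the same checks run unchanged from any segment (wire it to your real lookup mechanism); the `CHECKS` contents are illustrative, not from a real environment:

```python
from collections.abc import Callable
from ipaddress import ip_address, ip_network

# (fqdn, expected private range) pairs from the request form -- illustrative values.
CHECKS = [
    ("myvault.vault.azure.net", "10.20.0.0/24"),
]

def validate(checks, resolve: Callable[[str], list[str]]) -> list[str]:
    """Run the checks through the given resolver; return human-readable failures."""
    failures = []
    for fqdn, cidr in checks:
        net = ip_network(cidr)
        ips = resolve(fqdn)
        if not ips:
            failures.append(f"{fqdn}: NXDOMAIN")
        elif not all(ip_address(ip) in net for ip in ips):
            failures.append(f"{fqdn}: resolved {ips}, expected {cidr}")
    return failures
```

An empty result is your go-live signal; anything else maps straight back onto the alert conditions above (NXDOMAIN, public, unexpected private IP).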
Private endpoints work best when they’re boring. If you want boring, make DNS ownership explicit, testable, and repeatable.
Closing thought
When a private endpoint fails, the fix is rarely “turn the knob harder.” It’s usually a conversation that should have happened up front.