Hot take: "Private endpoints fail for one reason: DNS ownership is unclear."
If you have ever stared at a Private Endpoint that shows "Approved" and "Succeeded" while your app times out, you already know the punchline. The network path is rarely the first problem. Name resolution is.
A Private Endpoint is a private IP on a NIC in your VNet. Everything that follows depends on one thing: the client must resolve the service FQDN to that private IP, from the network where the client sits. When nobody clearly owns that resolution path, teams chase NSGs, routes, certificates, firewalls, and "Azure is down" theories for hours.
TL;DR
· Treat Private Endpoint troubleshooting as a DNS problem first, a networking problem second.
· From the same client that is failing, prove what DNS server it uses and what IP it gets back.
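· If the answer is a public IP, you are not on the private path. If the answer is NXDOMAIN or SERVFAIL, a resolver in your chain holds the privatelink zone but has no record for this resource, or the chain cannot reach the zone at all.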
· Make one team the owner of Private DNS zones and VNet links. Without that, you will keep re-learning the same outage.
The mental model (30 seconds)
Private Link does not change your application. It changes where the name points.
· Your app calls a normal public FQDN (for example, a Key Vault, Storage account, SQL server, ACR, or web app).
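· DNS is expected to return a private IP in your VNet for that FQDN (split-horizon DNS). Public DNS answers the name with a CNAME to a privatelink alias (for example, myvault.vault.azure.net → myvault.privatelink.vaultcore.azure.net), and a Private DNS zone visible to the client answers that alias with the private IP.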
· That private IP belongs to the Private Endpoint NIC in a specific subnet.
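· If DNS returns a public IP, traffic goes to the public endpoint. If your firewall blocks it, you see timeouts. If the service has public network access disabled, the request is usually rejected instead (often with a 403). If everything allows it, you might not notice until a data exfiltration review or policy finding flags it.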
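To make the split-horizon idea concrete, here is a toy Python model (not an Azure API): a client's DNS view either includes a privatelink zone or it does not. The zone name follows Key Vault's convention; myvault, 10.1.2.4, and the 203.0.113.10 stand-in public address are hypothetical, and real public chains have more hops. It also shows the default behavior when a zone is visible but lacks the record: NXDOMAIN, not a fallback to the public answer.

```python
# Toy model of Private Link split-horizon resolution (illustration only).
# All names and IPs are hypothetical; 203.0.113.10 stands in for a public IP.
PUBLIC_DNS = {
    # Public DNS points the service name at a privatelink alias...
    "myvault.vault.azure.net": ("CNAME", "myvault.privatelink.vaultcore.azure.net"),
    # ...which, seen from the internet, ends at a public address.
    "myvault.privatelink.vaultcore.azure.net": ("A", "203.0.113.10"),
}


def resolve(name: str, private_zones: dict[str, dict[str, str]], max_hops: int = 8) -> str:
    """Resolve `name` for a client whose DNS view includes `private_zones`.

    private_zones maps a zone name to {fqdn: private IP}. Returns an IP or "NXDOMAIN".
    """
    for _ in range(max_hops):
        # A private zone visible to the client is authoritative for names under it.
        zone = next((z for z in private_zones if name == z or name.endswith("." + z)), None)
        if zone is not None:
            # No fallback in this model: a missing record means NXDOMAIN.
            return private_zones[zone].get(name, "NXDOMAIN")
        record = PUBLIC_DNS.get(name)
        if record is None:
            return "NXDOMAIN"
        rtype, value = record
        if rtype == "A":
            return value
        name = value  # CNAME: keep following the chain
    raise RuntimeError("CNAME chain too long")


# resolve("myvault.vault.azure.net", {})  -> "203.0.113.10" (no zone in view: public path)
# resolve("myvault.vault.azure.net",
#         {"privatelink.vaultcore.azure.net": {"myvault.privatelink.vaultcore.azure.net": "10.1.2.4"}})
#                                          -> "10.1.2.4"
# resolve("myvault.vault.azure.net", {"privatelink.vaultcore.azure.net": {}})  -> "NXDOMAIN"
```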
So the first question is always: "What IP does this name resolve to from the failing client?"
Private Endpoint DNS checklist (in order)
Run these checks in sequence. Do not skip ahead. Each step is meant to eliminate a full class of failure.
1. Identify the exact name the client is using
Write down the FQDN the client actually connects to (not the one you assume it uses). Then capture:
· Service type and sub-resource (blob vs file vs vault vs sqlServer, etc.)
· Environment and region
· Where the client runs (VNet, subnet, on-prem site)
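If the name lives in configuration, extract it rather than retyping it. A minimal Python sketch, assuming the value is either a URL or an ADO.NET-style connection string; fqdn_from_connection is a hypothetical helper, and other formats (storage connection strings built from AccountName and EndpointSuffix, for instance) are not handled.

```python
from urllib.parse import urlsplit


def fqdn_from_connection(value: str) -> str:
    """Return the host name a client will actually resolve.

    Handles two assumed shapes:
    - URLs such as https://myvault.vault.azure.net/
    - ADO.NET-style strings such as Server=tcp:myserver.database.windows.net,1433;...
    """
    value = value.strip()
    if "://" in value:
        host = urlsplit(value).hostname  # lowercased, port and path removed
        if host:
            return host
    for part in value.split(";"):
        key, sep, val = part.partition("=")
        if sep and key.strip().lower() in ("server", "data source", "address", "addr"):
            host = val.strip()
            if host.lower().startswith("tcp:"):
                host = host[4:]  # drop the protocol prefix
            return host.split(",")[0].strip().lower()  # drop a ",1433" port suffix
    raise ValueError(f"no host found in {value!r}")
```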
2. Confirm the Private Endpoint is actually connected
In Azure, verify that:
· The connection state is Approved
· The provisioning state is Succeeded
· The NIC exists and has a private IP
· You are looking at the correct subscription and resource group
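With the Azure CLI, az network private-endpoint show --name <pe-name> --resource-group <rg> returns the connection state, provisioning state, and attached NIC as JSON. Run az account show first to confirm which subscription you are querying.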
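To turn the output of az network private-endpoint show -o json into a pass/fail answer, a minimal Python sketch. It assumes the field names the CLI prints for a Private Endpoint (provisioningState, privateLinkServiceConnections, manualPrivateLinkServiceConnections, networkInterfaces); verify them against your CLI version. private_endpoint_problems is a hypothetical helper.

```python
def private_endpoint_problems(pe: dict) -> list[str]:
    """List problems in a Private Endpoint's JSON; an empty list means it looks healthy.

    Assumes the field names printed by `az network private-endpoint show -o json`.
    """
    problems = []
    state = pe.get("provisioningState")
    if state != "Succeeded":
        problems.append(f"provisioningState is {state!r}")
    # Connections that need manual approval are reported in a separate list.
    connections = (pe.get("privateLinkServiceConnections") or []) + (
        pe.get("manualPrivateLinkServiceConnections") or []
    )
    if not connections:
        problems.append("no private link service connections")
    for conn in connections:
        status = (conn.get("privateLinkServiceConnectionState") or {}).get("status")
        if status != "Approved":
            problems.append(f"connection {conn.get('name')!r} is {status!r}")
    if not pe.get("networkInterfaces"):
        problems.append("no network interface attached")
    return problems
```

Feed it the dict from json.loads on the CLI output. An empty list means the endpoint object itself is probably not the problem, and the next suspect is DNS.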
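3. Prove which DNS server the client is using
If the client uses a custom DNS server, Azure Private DNS will not help unless that server can resolve the privatelink zones, typically by forwarding them to a resolver in a linked VNet.
On Windows, ipconfig /all (or Get-DnsClientServerAddress in PowerShell) lists the DNS servers configured on each adapter. A VM using Azure-provided DNS shows 168.63.129.16.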
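For Linux clients, a minimal Python sketch that reads resolv.conf-style text and says what the preferred nameserver implies for Private Link. The helper names are hypothetical. On hosts running systemd-resolved, /etc/resolv.conf usually points at the 127.0.0.53 stub, and the real upstream servers are in /run/systemd/resolve/resolv.conf or in the output of resolvectl status.

```python
AZURE_PROVIDED_DNS = "168.63.129.16"  # Azure platform DNS (virtual public IP)


def nameservers(resolv_conf: str) -> list[str]:
    """Return nameserver entries from resolv.conf-style text, in file order."""
    servers = []
    for line in resolv_conf.splitlines():
        for comment_char in "#;":  # both start comments in resolv.conf
            line = line.split(comment_char, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def describe(servers: list[str]) -> str:
    """Say what the first (preferred) nameserver implies for Private Link."""
    if not servers:
        return "no nameserver entries"
    first = servers[0]
    if first.startswith("127."):
        return "local stub resolver: check its upstream servers (for example with resolvectl status)"
    if first == AZURE_PROVIDED_DNS:
        return "Azure-provided DNS: the privatelink zone must be linked to this VNet"
    return f"custom DNS {first}: it must forward privatelink queries to something that can resolve them"
```

Usage: describe(nameservers(open("/etc/resolv.conf").read())), run on the failing client.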
4. From the failing client, resolve the FQDN and classify the answer
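This is the fork in the road. The response usually lands in one of three buckets:
· Public IP address returned: split-horizon is not happening for this client
· Private IP address returned (usually RFC 1918): if it matches the Private Endpoint NIC IP, DNS is likely correct, so move to network path validation
· NXDOMAIN or SERVFAIL: a resolver in the chain holds the privatelink zone but has no record for this resource, or the chain cannot reach the zone at all
Expected: on Windows, Resolve-DnsName <fqdn> should show a CNAME to the privatelink name and an A record with the Private Endpoint IP. On Linux, dig <fqdn> shows the same chain.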
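The same classification as a minimal Python sketch. It resolves through getaddrinfo, the OS path most applications use, so run it on the failing client. The function names, the expected IP, and the bucket wording are illustrative, and the IPv4-only lookup is a simplifying assumption.

```python
import ipaddress
import socket
from typing import Optional


def resolve_ipv4(fqdn: str) -> list[str]:
    """Ask the OS resolver for IPv4 addresses; empty list on failure.

    getaddrinfo does not reliably distinguish NXDOMAIN from SERVFAIL or a timeout,
    so use dig, nslookup, or Resolve-DnsName when you need the exact response code.
    """
    try:
        infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def classify(addresses: Optional[list[str]], expected_ip: Optional[str] = None) -> str:
    """Put a resolution result into one of the three buckets."""
    if not addresses:
        return "no answer: check the zone's records and the DNS forwarding chain"
    if not all(ipaddress.ip_address(a).is_private for a in addresses):
        return "public: this client is not on the private path"
    if expected_ip is not None and expected_ip not in addresses:
        return "private but unexpected: stale or wrong A record"
    return "private: DNS looks right, move on to the network path"
```

For example, classify(resolve_ipv4("myvault.vault.azure.net"), expected_ip="10.1.2.4"), where both values are hypothetical.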
5. Validate the Private DNS zone exists for the service
Most Azure services use a zone that starts with privatelink. The exact zone name depends on the service.
What to verify:
· A Private DNS zone exists for the service (for example, privatelink.vaultcore.azure.net for Key Vault).
· The zone contains an A record for your resource name.
· If you use multiple private endpoints or regions, confirm the A record points at the IP of the endpoint this client is meant to use.
With the Azure CLI, az network private-dns record-set a list --resource-group <rg> --zone-name privatelink.vaultcore.azure.net lists the A records and their IPs (swap in the zone for your service).
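To know which zone and record name to look for, map the public FQDN to its privatelink form. A minimal Python sketch covering a handful of public-cloud services (an assumed, partial table; check Microsoft's Private Endpoint DNS zone list for the rest), plus a helper that pulls the IPs for that record out of az network private-dns record-set a list -o json output. The shape of that output (name, aRecords[].ipv4Address) is also an assumption to verify.

```python
# Public-cloud suffix -> privatelink zone for a few common services (partial, assumed list).
PRIVATELINK_ZONES = {
    "vault.azure.net": "privatelink.vaultcore.azure.net",  # Key Vault
    "blob.core.windows.net": "privatelink.blob.core.windows.net",
    "file.core.windows.net": "privatelink.file.core.windows.net",
    "database.windows.net": "privatelink.database.windows.net",  # SQL (sqlServer)
    "azurecr.io": "privatelink.azurecr.io",
    "azurewebsites.net": "privatelink.azurewebsites.net",
}


def expected_private_record(fqdn: str) -> tuple[str, str]:
    """Return (zone name, A record name) the zone should contain for fqdn."""
    fqdn = fqdn.lower().rstrip(".")
    # Longest suffix first so a more specific suffix always wins.
    for suffix in sorted(PRIVATELINK_ZONES, key=len, reverse=True):
        if fqdn.endswith("." + suffix):
            return PRIVATELINK_ZONES[suffix], fqdn[: -len(suffix) - 1]
    raise KeyError(f"no privatelink zone mapping for {fqdn}")


def record_ips(record_sets: list[dict], label: str) -> list[str]:
    """IPs for `label` from record-set list JSON (assumed shape); empty if absent."""
    for rs in record_sets:
        if rs.get("name", "").lower() == label:
            return [a["ipv4Address"] for a in rs.get("aRecords", [])]
    return []
```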
6. Confirm the Private DNS zone is linked to the right VNet(s)
A correct zone with a correct record still fails if the VNet link is missing. In hub-and-spoke, this is where most outages live.
What to verify:
· The zone has a virtual network link to the VNet where the client lives (not just the VNet where the Private Endpoint lives).
· If you centralize DNS in a hub VNet, the spokes still need a resolution path to that zone: either link the zone to each spoke that uses Azure-provided DNS, or point the spokes at a DNS server or Private Resolver inbound endpoint in a linked VNet.
· Auto-registration is typically not required for Private Endpoint A records. Record creation is a separate concern from VNet linking.
With the Azure CLI, az network private-dns link vnet list --resource-group <rg> --zone-name <zone> lists the zone's VNet links and the VNet each one points to.
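A minimal Python sketch that compares a zone's links against the VNets that need them. It assumes the JSON shape from az network private-dns link vnet list -o json, where each link carries virtualNetwork.id; unlinked_vnets is a hypothetical helper. Pass the VNets whose DNS queries actually reach Azure DNS: the client's VNet if it uses Azure-provided DNS, or the VNet hosting your DNS servers or resolver if it forwards.

```python
def unlinked_vnets(links: list[dict], needed_vnet_ids: list[str]) -> list[str]:
    """Return the needed VNet resource IDs that have no link to the zone.

    Azure resource IDs are case-insensitive, so comparisons are lowercased.
    """
    linked = {
        (link.get("virtualNetwork") or {}).get("id", "").lower()
        for link in links
    }
    return [vnet for vnet in needed_vnet_ids if vnet.lower() not in linked]
```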
7. If you have custom DNS, validate the forwarding chain
Custom DNS is not wrong, but it turns this into an ownership problem. Private DNS zones do not magically appear in on-prem DNS.
Common working patterns:
· Azure Private DNS + Azure DNS Private Resolver: on-prem (or custom DNS) forwards privatelink zones to the resolver inbound endpoint.
· Azure Firewall DNS proxy (or another DNS proxy) in the hub: spokes use the hub as their DNS server, and the hub resolves privatelink zones through Azure Private DNS.
· No custom DNS for Azure workloads: Azure VMs use Azure-provided DNS, and on-prem resolution is handled separately.
Red flags:
· On-prem DNS conditional forwarder points to 168.63.129.16. That IP is not reachable from on-prem.
· Forwarders exist for some privatelink zones, but not all. Teams add them one outage at a time.
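· Multiple DNS servers answer differently (split brain). Which answer a client gets depends on which server it happens to query.
Questions to answer: which DNS server does the failing client query first, where does that server send queries for privatelink zones, and does the final hop reach a resolver that can read Azure Private DNS (a Private Resolver inbound endpoint or a DNS server in a linked VNet)?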
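If you can export your conditional forwarders as a zone-to-targets table, two of those red flags are easy to check mechanically. A minimal Python sketch; the table format and forwarder_red_flags are hypothetical, and required_zones should be whatever list your standard says must be forwarded.

```python
AZURE_PLATFORM_DNS = "168.63.129.16"  # only reachable from inside Azure VNets


def forwarder_red_flags(forwarders: dict[str, list[str]],
                        required_zones: list[str],
                        on_prem: bool) -> list[str]:
    """Check a conditional-forwarder table (zone -> target IPs) against required zones."""
    table = {zone.lower().rstrip("."): targets for zone, targets in forwarders.items()}
    flags = []
    for zone in required_zones:
        targets = table.get(zone.lower().rstrip("."))
        if not targets:
            flags.append(f"{zone}: no conditional forwarder")
        elif on_prem and AZURE_PLATFORM_DNS in targets:
            flags.append(f"{zone}: forwards to {AZURE_PLATFORM_DNS}, unreachable from on-prem")
    return flags
```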
8. Confirm the Private Endpoint DNS zone group (the part IaC often misses)
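If you create private endpoints with IaC, make sure you also create the DNS zone group (the privateDnsZoneGroups child resource of the Private Endpoint). The zone group is what manages the A record in the Private DNS zone. Without it, the zone might exist, but the record will not, unless something else (such as a policy or a manual step) creates it.
Portal: open the Private Endpoint and check DNS configuration under Settings for an attached Private DNS zone. With the Azure CLI, az network private-endpoint dns-zone-group list --endpoint-name <pe-name> --resource-group <rg> shows the same information.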
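For IaC pipelines, the same check can run after deployment. A minimal Python sketch, assuming az network private-endpoint dns-zone-group list -o json returns groups with privateDnsZoneConfigs[].privateDnsZoneId (a resource ID ending in /privateDnsZones/<zone>); zone_group_covers is a hypothetical helper.

```python
def zone_group_covers(zone_groups: list[dict], zone_name: str) -> bool:
    """True if any DNS zone group on the endpoint points at `zone_name`.

    Resource IDs are compared case-insensitively.
    """
    suffix = "/privatednszones/" + zone_name.lower()
    return any(
        (cfg.get("privateDnsZoneId") or "").lower().endswith(suffix)
        for group in zone_groups
        for cfg in group.get("privateDnsZoneConfigs") or []
    )
```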
9. Flush caches only after the records are correct
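DNS caching can keep a bad answer alive longer than you think, including negative answers such as NXDOMAIN. Fix the source of truth first, then flush.
On Windows, ipconfig /flushdns (or Clear-DnsClientCache in PowerShell) clears the local client cache. Caches on intermediate DNS servers, and in some application runtimes, are separate and need their own flush or restart.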
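After the flush, confirm the client now sees the right answer instead of assuming it. A minimal Python sketch that re-resolves through the OS resolver until the expected IP appears; wait_for_ip, its defaults, and the injectable resolve parameter (there for testing) are hypothetical. Some runtimes, such as the JVM, keep their own DNS cache, so an application can still hold the old answer after this check passes.

```python
import socket
import time


def wait_for_ip(fqdn, expected_ip, attempts=10, delay_s=5.0, resolve=None):
    """Re-resolve fqdn until expected_ip appears; True on success, False if attempts run out."""
    if resolve is None:
        def resolve(name):
            # Default: the OS resolver, so the check sees the same caches as most apps.
            try:
                return {info[4][0] for info in socket.getaddrinfo(name, None, family=socket.AF_INET)}
            except socket.gaierror:
                return set()
    for attempt in range(attempts):
        if expected_ip in resolve(fqdn):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give TTLs and upstream caches time to expire
    return False
```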
10. If DNS is correct, validate the network path quickly
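Once the FQDN resolves to the Private Endpoint IP, failures are usually NSG, UDR, firewall, or proxy issues. Keep this step quick.
On Windows, Test-NetConnection <fqdn> -Port 443 (or the service's port, such as 1433 for SQL) shows the address it connected to and whether the TCP test succeeded.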
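The same TCP test from Python, for Linux hosts or containers without PowerShell. A minimal sketch; tcp_check is a hypothetical helper. It proves only that a TCP handshake completes, not that TLS or authentication works.

```python
import socket


def tcp_check(host: str, port: int, timeout_s: float = 3.0):
    """Try a TCP connection; return (True, peer IP) on success or (False, error text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            return True, sock.getpeername()[0]
    except OSError as exc:  # covers timeouts, refusals, unreachable networks, resolution errors
        return False, str(exc)
```

For example, tcp_check("10.1.2.4", 443) with your own IP and port. A timeout here, after DNS returned the right private IP, points at routing or security rather than DNS.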
If you cannot connect to the private IP, stop blaming DNS. You have a routing or security problem.
Fast failure signatures (what the symptom usually means)
These patterns show up constantly. When you see one, you can usually skip half the debate.
| Symptom | Most likely cause | Next check |
| --- | --- | --- |
| Resolves to a public IP | The client's DNS does not see the Private DNS zone, or there is no split-horizon path from the client's DNS | Check the client DNS server, conditional forwarders, and VNet links |
| NXDOMAIN / SERVFAIL | A resolver in the chain holds the privatelink zone but no record for this resource, or forwarders are missing or unreachable | Verify the A record exists in the zone and that the client's DNS chain reaches it |
| Resolves to a private IP but the connection times out | NSG, UDR, firewall, or proxy blocking the path to the Private Endpoint subnet | Run Test-NetConnection against the private IP and port; review UDRs and NSGs on the client path |
| Works in an Azure VNet but not on-prem | On-prem DNS cannot resolve privatelink zones, or forwarding stops at the edge | Forward to an Azure DNS Private Resolver inbound endpoint and validate the conditional forwarders |
| Intermittent: sometimes private, sometimes public | Multiple DNS resolvers or caches with inconsistent records | Trace which DNS server answered and standardize on one owner and source of truth |
DNS ownership: the part that prevents repeat incidents
Most organizations treat Private Endpoints as a platform feature and DNS as a networking feature. That split is fine, until no one owns the glue.
If you want fewer outages, pick an owner and make it explicit:
· Who creates and maintains Private DNS zones for Private Link services?
· Who owns VNet links (hub, spokes, and shared services VNets)?
· Who approves new privatelink zones and conditional forwarders in custom DNS?
· Who validates the end-to-end path during onboarding (before production traffic relies on it)?
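A simple RACI template you can copy (R = Responsible, A = Accountable, C = Consulted, I = Informed; each activity should have exactly one A):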
| Activity | Platform team | Network/DNS team | Security | App team |
| --- | --- | --- | --- | --- |
| Create Private Endpoint | A/R | C | C | C |
| Create Private DNS zone (privatelink.*) | A/R | C | C | I |
| Link zone to VNets | A/R | C | C | I |
| Conditional forwarders / resolver inbound endpoints | C | A/R | C | I |
| Validation tests from each network | A/R | C | C | R |
| Change control for DNS records/links | A/R | R | C | I |
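Because the whole point is one owner, a quick lint on the matrix helps keep it honest as rows are added. A minimal Python sketch; the dictionary format and raci_problems are hypothetical.

```python
def raci_problems(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Flag activities that do not have exactly one accountable (A) team.

    matrix maps activity -> {team: role}, where role is e.g. "A/R", "R", "C", "I".
    """
    problems = []
    for activity, roles in matrix.items():
        accountable = [team for team, role in roles.items() if "A" in role.upper().split("/")]
        if len(accountable) != 1:
            problems.append(f"{activity}: {len(accountable)} accountable teams {accountable}")
    return problems
```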
Copy/paste runbook snippet (keep it with the service onboarding)
If you build paved roads, make DNS validation a required gate. This is a short block you can drop into a runbook or change ticket.
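Private Endpoint DNS validation (required)
· FQDN the client uses, plus service, sub-resource, environment, and region
· Private Endpoint is Approved and Succeeded; record its NIC private IP
· DNS server(s) each client network actually uses
· From each client network, the FQDN resolves to that private IP (not a public IP, not NXDOMAIN or SERVFAIL)
· The privatelink zone holds the A record, and the DNS zone group is attached
· The zone is linked to every VNet that needs it, and custom or on-prem DNS forwards to a resolver that can read it
· A TCP connection to the private IP on the service port succeeds from each client network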
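If the gate needs to be automated, a minimal Python sketch that runs the resolve-then-connect checks from one client network. validate_private_endpoint is hypothetical, the default port 443 is an assumption (SQL uses 1433), and it only checks IPv4.

```python
import ipaddress
import socket


def validate_private_endpoint(fqdn: str, expected_ip: str, port: int = 443,
                              timeout_s: float = 3.0) -> list[str]:
    """Return gate failures for one client network; an empty list means pass."""
    try:
        infos = socket.getaddrinfo(fqdn, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [f"{fqdn}: resolution failed ({exc})"]
    ips = sorted({info[4][0] for info in infos})
    if expected_ip not in ips:
        kind = "private" if all(ipaddress.ip_address(ip).is_private for ip in ips) else "public"
        # DNS is wrong, so a network test would only add noise.
        return [f"{fqdn}: resolved to {kind} {ips}, expected {expected_ip}"]
    try:
        socket.create_connection((expected_ip, port), timeout=timeout_s).close()
    except OSError as exc:
        return [f"{expected_ip}:{port}: TCP connect failed ({exc})"]
    return []
```

Run it from each network that must reach the endpoint (a spoke VM, an on-prem jump host) and fail the change if any run returns a non-empty list.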
Closing thought
Private Endpoints are reliable when DNS is boring. If you want it boring, treat DNS as part of the platform, not an afterthought. Make ownership clear, bake validation into onboarding, and keep the checklist close.
