Hot take: "Private endpoints fail for one reason: DNS ownership is unclear."

If you have ever stared at a Private Endpoint that shows "Approved" and "Succeeded" while your app times out, you already know the punchline. The network path is rarely the first problem. Name resolution is.

A Private Endpoint is a private IP on a NIC. Everything that follows depends on one thing: the client must resolve the service FQDN to that private IP, from the network where the client sits. When that ownership is fuzzy, teams chase NSGs, routes, certificates, firewalls, and "Azure is down" theories for hours.

TL;DR

·        Treat Private Endpoint troubleshooting as a DNS problem first, a networking problem second.

·        From the same client that is failing, prove what DNS server it uses and what IP it gets back.

·        If the answer is public, you are not on the private path. If the answer is NXDOMAIN, the private zone is missing or not reachable.

·        Make one team the owner of Private DNS zones and VNet links. Without that, you will keep re-learning the same outage.

The mental model (30 seconds)

Private Link does not change your application. It changes where the name points.

·        Your app calls a normal public FQDN (for example, a Key Vault, Storage account, SQL server, ACR, or web app).

·        DNS is expected to return a private IP in your VNet for that FQDN (split-horizon DNS).

·        That private IP belongs to the Private Endpoint NIC in a specific subnet.

·        If DNS returns a public IP, traffic goes to the public endpoint. If your firewall blocks it, you see timeouts. If it is allowed, you might not notice until data exfiltration or policy findings show up.

So the first question is always: "What IP does this name resolve to from the failing client?"

Private Endpoint DNS checklist (in order)

Run these checks in sequence. Do not skip ahead. Each step is meant to eliminate a full class of failure.

1.      Identify the exact name the client is using

·        Write down the FQDN the client connects to (not what you think it uses). Then capture:

·        Service type and sub-resource (blob vs file vs vault vs sqlServer, etc.)

·        Environment and region

·        From where the client runs (VNet, subnet, on-prem site)

Windows:
  Resolve-DnsName <fqdn>
  nslookup <fqdn>

Linux/macOS:
  dig +short <fqdn>
  nslookup <fqdn>

 

2.      Confirm the Private Endpoint is actually connected

·        In Azure, verify the Private Endpoint:

·        Connection state is Approved

·        Provisioning state is Succeeded

·        The NIC exists and has a private IP

·        You are looking at the correct subscription and resource group

Azure CLI (example):
  az network private-endpoint show -g <rg> -n <peName> --query "{state:provisioningState, conn:privateLinkServiceConnections[0].privateLinkServiceConnectionState.status, ip:customDnsConfigs[0].ipAddresses[0], subnet:subnet.id}" -o jsonc

Portal:
  Private Endpoint -> DNS configuration (capture the private IP)
  Private Endpoint -> Network Interface (verify NIC and IP)
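If you would rather check these fields in a script than by eye, here is a minimal sketch that reads the JSON printed by `az network private-endpoint show -o json`. The field names (privateLinkServiceConnections, manualPrivateLinkServiceConnections, customDnsConfigs) are what I would expect in that output; treat them as assumptions and confirm against your own. The sample document and the function name are made up for illustration.

```python
import json


def summarize_private_endpoint(pe: dict) -> dict:
    """Pull the step 2 fields out of a Private Endpoint JSON document."""
    # Auto-approved and manually approved connections live in separate lists.
    connections = (pe.get("privateLinkServiceConnections") or []) + (
        pe.get("manualPrivateLinkServiceConnections") or []
    )
    states = [
        (c.get("privateLinkServiceConnectionState") or {}).get("status")
        for c in connections
    ]
    ips = [
        ip
        for cfg in pe.get("customDnsConfigs") or []
        for ip in cfg.get("ipAddresses") or []
    ]
    return {
        "provisioned": pe.get("provisioningState") == "Succeeded",
        "approved": bool(states) and all(s == "Approved" for s in states),
        "private_ips": ips,
    }


# Hypothetical sample, trimmed to the fields used above.
sample = json.loads("""
{
  "provisioningState": "Succeeded",
  "privateLinkServiceConnections": [
    {"privateLinkServiceConnectionState": {"status": "Approved"}}
  ],
  "manualPrivateLinkServiceConnections": [],
  "customDnsConfigs": [
    {"fqdn": "example.vault.azure.net", "ipAddresses": ["10.1.2.4"]}
  ]
}
""")
print(summarize_private_endpoint(sample))
```

If `private_ips` comes back empty, read the IP from the NIC instead, as the Portal steps above do.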

 

3.      Prove which DNS server the client is using

·        If the client uses a custom DNS server, Azure Private DNS will not help unless that server can resolve the privatelink zones.

Windows:
  ipconfig /all
  Get-DnsClientServerAddress

Linux:
  cat /etc/resolv.conf
  resolvectl status  (systemd)

Question to answer:
  "Which DNS server answered the query?"
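On Linux, a rough way to answer that question in a script is to parse the resolver config and label each server. This is a sketch, assuming a standard /etc/resolv.conf layout; the function names are mine. 168.63.129.16 is the Azure-provided DNS address, and 127.0.0.53 is the local systemd-resolved stub, which hides the real upstream.

```python
AZURE_PROVIDED_DNS = "168.63.129.16"  # Azure platform DNS virtual IP
SYSTEMD_STUB = "127.0.0.53"           # systemd-resolved local stub listener


def parse_nameservers(resolv_conf_text: str) -> list[str]:
    """Return nameserver IPs in the order they appear in resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which start with '#' or ';'.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def describe(server: str) -> str:
    """Label a DNS server so the ownership question is obvious."""
    if server == AZURE_PROVIDED_DNS:
        return "Azure-provided DNS"
    if server == SYSTEMD_STUB:
        return "local stub (check resolvectl status for the real upstream)"
    return "custom DNS (must be able to resolve privatelink zones)"


# Made-up sample; on a real host, read the text from /etc/resolv.conf.
sample = """# generated by the network manager
nameserver 10.0.0.4
nameserver 168.63.129.16
"""
for server in parse_nameservers(sample):
    print(server, "->", describe(server))
```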

 

4.      From the failing client, resolve the FQDN and classify the answer

·        This is the fork in the road. The response usually lands in one of three buckets:

·        Public IP address returned: split-horizon is not happening for this client

·        Private IP address returned (RFC1918): DNS is likely correct, move to network path validation

·        NXDOMAIN or SERVFAIL: the zone or its A record is missing, the zone is not linked, or it is not reachable through your DNS chain

Expected:
  <fqdn> -> <private IP of the Private Endpoint NIC>

If you see a public IP:
  You are on the public path, even if a Private Endpoint exists.
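The three buckets can be sketched as a small classifier. This is an illustration, not a tool from the article: the function names are mine, and "private" here means strictly RFC1918, matching the bucket above. Anything that is not fully private is reported as public, because at least some traffic would leave the private path.

```python
import ipaddress
import socket

RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify_answer(addresses: list[str]) -> str:
    """Return 'private', 'public', or 'no-answer' for a list of resolved IPs."""
    if not addresses:
        return "no-answer"
    parsed = [ipaddress.ip_address(a) for a in addresses]
    if all(any(ip in net for net in RFC1918) for ip in parsed):
        return "private"
    return "public"


def resolve(fqdn: str) -> list[str]:
    """Resolve with the OS resolver, so the answer matches what apps on this host see."""
    try:
        infos = socket.getaddrinfo(fqdn, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []  # NXDOMAIN, SERVFAIL, and timeouts all land here
    return sorted({info[4][0] for info in infos})


print(classify_answer(["10.1.2.4"]))     # private
print(classify_answer(["52.239.1.1"]))   # public
print(classify_answer([]))               # no-answer
```

On the failing client, `classify_answer(resolve("<fqdn>"))` gives you the bucket in one call. Note that the OS resolver does not distinguish NXDOMAIN from SERVFAIL; use dig or Resolve-DnsName when you need that detail.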

 

5.      Validate the Private DNS zone exists for the service

·        Most Azure services use a zone that starts with privatelink. The exact zone name depends on the service.

·        What to verify:

·        A Private DNS zone exists for the service (for example, privatelink.vaultcore.azure.net for Key Vault).

·        The zone contains an A record for your resource name.

·        If you use multiple private endpoints or regions, confirm the record set is correct for the specific resource.

Azure CLI (example):
  az network private-dns zone list -g <dnsRg> -o table
  az network private-dns record-set a list -g <dnsRg> -z <privatelink-zone> -o table

Portal:
  Private DNS zones -> <zone> -> Record sets
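To know which zone and record name to look for, you can derive them from the FQDN. The sketch below covers only the services mentioned earlier in this post, for the Azure public cloud; the map is deliberately partial, the function name is mine, and you should confirm zone names against the official Private Endpoint DNS documentation. Some services need more than one record (ACR data endpoints, for example), which this does not model.

```python
# Partial map of public DNS suffix -> Private DNS zone name (Azure public cloud).
ZONE_BY_SUFFIX = {
    ".vault.azure.net": "privatelink.vaultcore.azure.net",
    ".blob.core.windows.net": "privatelink.blob.core.windows.net",
    ".file.core.windows.net": "privatelink.file.core.windows.net",
    ".database.windows.net": "privatelink.database.windows.net",
    ".azurecr.io": "privatelink.azurecr.io",
    ".azurewebsites.net": "privatelink.azurewebsites.net",
}


def expected_record(fqdn: str):
    """Return (zone, record_name) for a known suffix, or None if unknown."""
    name = fqdn.lower().rstrip(".")
    for suffix, zone in ZONE_BY_SUFFIX.items():
        if name.endswith(suffix):
            return zone, name[: -len(suffix)]
    return None


print(expected_record("myvault.vault.azure.net"))
print(expected_record("unknown.example.com"))
```

The returned pair tells you which zone to pass to `-z` in the CLI commands above and which A record name should appear in its record sets.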

 

6.      Confirm the Private DNS zone is linked to the right VNet(s)

·        A correct zone with a correct record still fails if the VNet link is missing. In hub-and-spoke, this is where most outages live.

·        What to verify:

·        The zone has a virtual network link to the VNet where the client lives (not just the VNet where the Private Endpoint lives).

·        If you centralize DNS in a hub VNet, the spokes still need a DNS resolution path to that zone (via Azure-provided DNS, Private Resolver, or your own forwarding).

·        Auto-registration is typically not required for Private Endpoint A records. Record creation is a separate concern from VNet linking.

Azure CLI (example):
  az network private-dns link vnet list -g <dnsRg> -z <privatelink-zone> -o table

Sanity check:
  If the client is in Spoke-02, the zone must be reachable from Spoke-02.
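The sanity check can be scripted against the link list. A minimal sketch, assuming you saved the output of the command above with `-o json` and that each link carries a `virtualNetwork.id` field (confirm against your own output). The subscription, resource group, and VNet names are invented.

```python
def linked_vnet_ids(links: list[dict]) -> set[str]:
    """Collect the VNet resource IDs a zone is linked to, lowercased."""
    ids = {
        ((link.get("virtualNetwork") or {}).get("id") or "").lower()
        for link in links
    }
    return ids - {""}


def is_linked(links: list[dict], vnet_id: str) -> bool:
    """Azure resource IDs are case-insensitive, so compare in lowercase."""
    return vnet_id.lower() in linked_vnet_ids(links)


# Hypothetical link list for one privatelink zone.
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"
links = [
    {
        "name": "hub-link",
        "registrationEnabled": False,
        "virtualNetwork": {
            "id": SUB + "/resourceGroups/net-rg/providers/Microsoft.Network/virtualNetworks/Hub"
        },
    }
]
spoke02 = SUB + "/resourceGroups/net-rg/providers/Microsoft.Network/virtualNetworks/Spoke-02"

print(is_linked(links, spoke02))  # False: Spoke-02 has no direct link
```

A False here is not automatically an outage. If Spoke-02 sends its queries to a DNS server in the hub, the hub link is what matters; in that case, run the check against the hub VNet ID.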

 

7.      If you have custom DNS, validate the forwarding chain

·        Custom DNS is not wrong, but it turns this into an ownership problem. Private DNS zones do not magically appear in on-prem DNS.

·        Common working patterns:

·        Azure Private DNS + Azure DNS Private Resolver: on-prem (or custom DNS) forwards privatelink zones to the resolver inbound endpoint.

·        Azure Firewall DNS proxy (or another DNS proxy) in the hub: spokes use the hub DNS, and the hub can resolve privatelink zones via Azure Private DNS.

·        No custom DNS for Azure workloads: Azure VMs use Azure-provided DNS, and on-prem resolution is handled separately.

·        Red flags:

·        On-prem DNS conditional forwarder points to 168.63.129.16. That IP is not reachable from on-prem.

·        Forwarders exist for some privatelink zones, but not all. Teams add them one outage at a time.

·        Multiple DNS servers answer differently (split brain). Which answer the client gets depends on which server it happens to query.

Questions to answer:
  - Where does <fqdn> resolution happen for this client?
  - If the query leaves the client, which DNS server handles it next?
  - Where do privatelink.* queries get forwarded?

Tip:
  Capture packets or enable DNS query logging on your DNS server if the path is unclear.
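One cheap way to spot the split-brain red flag is to ask each DNS server in the chain the same question (for example with `nslookup <fqdn> <server>` or `dig @<server> <fqdn>`) and compare the answers. The sketch below only does the comparison; how you collect the answers is up to you, and the server IPs and function names are made up.

```python
def group_by_answer(answers: dict[str, set[str]]) -> dict[frozenset, list[str]]:
    """Group DNS servers by the set of IPs each returned for one FQDN."""
    groups: dict[frozenset, list[str]] = {}
    for server, ips in answers.items():
        groups.setdefault(frozenset(ips), []).append(server)
    return groups


def is_consistent(answers: dict[str, set[str]]) -> bool:
    """True when every server returned the same answer set."""
    return len(group_by_answer(answers)) <= 1


# Hypothetical results for one FQDN from three DNS servers.
answers = {
    "10.0.0.4": {"10.1.2.4"},       # hub DNS: private answer
    "10.0.0.5": {"10.1.2.4"},       # second hub DNS: private answer
    "192.168.10.2": {"52.239.1.1"}, # on-prem DNS: public answer
}

print(is_consistent(answers))  # False
for ips, servers in group_by_answer(answers).items():
    print(sorted(ips), "<-", servers)
```

Any server that lands in its own group is where the forwarding chain breaks, and usually where the ownership conversation needs to start.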

 

8.      Confirm the Private Endpoint DNS zone group (the part IaC often misses)

·        If you create private endpoints with IaC, make sure you also attach the Private DNS zone group. Without it, nothing creates or updates the A record for you: the zone might exist, but the record will not unless you manage it yourself.

Portal:
  Private Endpoint -> DNS configuration -> Private DNS zone group

Azure CLI (example):
  az network private-endpoint dns-zone-group list -g <rg> --endpoint-name <peName> -o table
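To turn that command into a pass/fail check, you can summarize its JSON output. This sketch assumes the `-o json` output is a list of zone groups, each with `privateDnsZoneConfigs` entries carrying `privateDnsZoneId` and `recordSets`; those field names are my reading of the output shape, so verify them on your side. The sample data and function name are invented.

```python
def zone_group_summary(zone_groups: list[dict]) -> dict[str, list[tuple]]:
    """Map each attached zone name to the (fqdn, ips) records it manages."""
    summary: dict[str, list[tuple]] = {}
    for group in zone_groups:
        for cfg in group.get("privateDnsZoneConfigs") or []:
            # The zone name is the last segment of the zone resource ID.
            zone = (cfg.get("privateDnsZoneId") or "").rsplit("/", 1)[-1]
            records = summary.setdefault(zone, [])
            for rs in cfg.get("recordSets") or []:
                records.append((rs.get("fqdn"), rs.get("ipAddresses") or []))
    return summary


# Hypothetical output for one Private Endpoint.
sample = [
    {
        "name": "default",
        "privateDnsZoneConfigs": [
            {
                "privateDnsZoneId": "/subscriptions/00000000-0000-0000-0000-000000000000"
                "/resourceGroups/dns-rg/providers/Microsoft.Network"
                "/privateDnsZones/privatelink.vaultcore.azure.net",
                "recordSets": [
                    {
                        "fqdn": "example.privatelink.vaultcore.azure.net",
                        "ipAddresses": ["10.1.2.4"],
                    }
                ],
            }
        ],
    }
]

print(zone_group_summary(sample))
print(zone_group_summary([]))  # {} means no zone group is attached
```

An empty result is the IaC miss described above: the endpoint exists, but nothing is wiring its IP into the zone.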

 

9.      Flush caches only after the records are correct

·        DNS caching can keep a bad answer alive longer than you think. Fix the source of truth first, then flush.

Windows:
  ipconfig /flushdns
  Clear-DnsClientCache

Linux:
  sudo resolvectl flush-caches  (older systems: sudo systemd-resolve --flush-caches, or restart systemd-resolved)

Browser/app:
  Some apps cache DNS in-process. Restart the app or the host.

 

10.   If DNS is correct, validate the network path quickly

·        Once the FQDN resolves to the Private Endpoint IP, failures are usually NSG, UDR, firewall, or proxy. Keep it fast:

Windows:
  Test-NetConnection <privateIP> -Port 443
  tracert <privateIP>

Linux:
  nc -vz <privateIP> 443
  traceroute <privateIP>

Azure:
  Use Network Watcher Connection troubleshoot if you have it enabled.

If you cannot connect to the private IP, stop blaming DNS. You have a routing or security problem.
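When neither Test-NetConnection nor nc is available on the client (a slim container, for instance), the same TCP check is a few lines of Python. A minimal sketch; the function name and the 3-second default timeout are my choices, not something the platform prescribes.

```python
import socket


def tcp_check(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


# Usage on the failing client (replace with the Private Endpoint IP):
#   tcp_check("<privateIP>", 443)
```

A timeout and a refusal both return False here. If you need to tell them apart (dropped by an NSG or firewall versus actively rejected), catch `socket.timeout` and `ConnectionRefusedError` separately.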

Fast failure signatures (what the symptom usually means)

These patterns show up constantly. When you see one, you can usually skip half the debate.

| Symptom | Most likely cause | Next check |
| --- | --- | --- |
| Resolves to a public IP | Private DNS zone not used by that client, or no split-horizon path from client DNS | Check client DNS server, conditional forwarders, and VNet links |
| NXDOMAIN / SERVFAIL | Zone missing, not linked, or forwarders not configured | Verify Private DNS zone exists and is reachable from the client DNS chain |
| Resolves to private IP but connection times out | NSG, UDR, firewall, proxy, or wrong subnet reachability | Test-NetConnection to the private IP and review UDR/NSG on client subnet |
| Works in Azure VNet but not on-prem | On-prem cannot resolve privatelink zones, or forwarding stops at the edge | Use Azure Private Resolver inbound and validate conditional forwarding |
| Intermittent: sometimes private, sometimes public | Multiple DNS resolvers or caches with inconsistent records | Trace which DNS server answered and standardize ownership/source of truth |

DNS ownership: the part that prevents repeat incidents

Most organizations treat Private Endpoints as a platform feature and DNS as a networking feature. That split is fine, until no one owns the glue.

·        If you want fewer outages, pick an owner and make it explicit:

·        Who creates and maintains Private DNS zones for Private Link services?

·        Who owns VNet links (hub, spokes, and shared services VNets)?

·        Who approves new privatelink zones and conditional forwarders in custom DNS?

·        Who validates the end-to-end path during onboarding (before production traffic relies on it)?

A simple RACI template you can copy:

| Activity | Platform team | Network/DNS team | Security | App team |
| --- | --- | --- | --- | --- |
| Create Private Endpoint | R | C | C | C |
| Create Private DNS zone (privatelink.*) | A/R | C | C | I |
| Link zone to VNets | A/R | C | C | I |
| Conditional forwarders / resolver inbound endpoints | C | A/R | C | I |
| Validation tests from each network | A/R | C | C | R |
| Change control for DNS records/links | A/R | A/R | C | I |

Copy/paste runbook snippet (keep it with the service onboarding)

If you build paved roads, make DNS validation a required gate. This is a short block you can drop into a runbook or change ticket.

Private Endpoint DNS validation (required)

1) From the target client network, resolve the service FQDN:
   - Expected result: private IP of the Private Endpoint NIC
   - Record the DNS server that answered

2) If the answer is public or NXDOMAIN:
   - Stop. Fix DNS before testing connectivity.

3) If the answer is private, test port 443 to the private IP:
   - If it fails: troubleshoot NSG/UDR/firewall
   - If it passes: proceed with application testing

Evidence to capture:
- Resolve-DnsName output (or dig output)
- Private Endpoint NIC private IP
- Private DNS zone + record set (screenshot or CLI output)
- VNet link list for the privatelink zone
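If your change tickets run automated checks, the three runbook steps reduce to a small decision function. This is a sketch of the gate logic only; the function name and return strings are mine, and it is stricter than an RFC1918 check because it compares against the exact Private Endpoint NIC IP.

```python
from typing import Optional


def dns_gate(
    resolved: list[str],
    expected_private_ip: str,
    port_open: Optional[bool] = None,
) -> str:
    """Apply the runbook: DNS must be right before connectivity is tested."""
    # Steps 1 and 2: the only acceptable answer is the Private Endpoint NIC IP.
    if set(resolved) != {expected_private_ip}:
        return "STOP: fix DNS before testing connectivity"
    # Step 3: DNS is correct, so the port test decides the rest.
    if port_open is None:
        return "DNS OK: test port 443 to the private IP"
    if not port_open:
        return "DNS OK, port test failed: troubleshoot NSG/UDR/firewall"
    return "PASS: proceed with application testing"


print(dns_gate(["52.239.1.1"], "10.1.2.4"))
print(dns_gate([], "10.1.2.4"))
print(dns_gate(["10.1.2.4"], "10.1.2.4", port_open=False))
print(dns_gate(["10.1.2.4"], "10.1.2.4", port_open=True))
```

Attach the returned string to the ticket alongside the evidence listed above, so a reviewer can see which step the validation stopped at.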

Closing thought

Private Endpoints are reliable when DNS is boring. If you want it boring, treat DNS as part of the platform, not an afterthought. Make ownership clear, bake validation into onboarding, and keep the checklist close.
