GlobalProtect w/ Domain-based split-tunneling - how does this work, technically?

Pretty much the title. In a setup where domain-based split-tunneling is enabled, how does the device/OS determine for each packet on which interface (physical vs tunnel) to send it?

I understand the desired effect from a user perspective, but from a low-level standpoint I don’t see a way to achieve that (reliably).

Like others said, it’s based on GlobalProtect listening to DNS requests at the OS level and comparing them to the config to see if traffic to that domain should go through the tunnel or not.

The part I don’t see in replies above (sorry if I missed it) is that the mechanism used to force an IP one way or another is a Windows/macOS interface/route “bind”. Not sure what API that is, but I think it’s part of OS routing where the OS can bind an IP/route to a particular NIC. I bet there’s some PowerShell or bash command that does it, but I’ve never looked. For example, GP sees a DNS request for “Microsoft.com”, sees that this domain is excluded from the tunnel, and “binds” the returned IP to your home network physical adapter.

Each of these IP/NIC binds is given a timeout value within GlobalProtect, I think 60 or 120 seconds, and if a DNS request for that domain isn’t seen in that time, the bind will expire and be deleted. That’s the big problem we ran into: some websites we wanted to exclude are written completely in JavaScript or other dynamic languages that never ever send another DNS request until you fully refresh the page. So after 120 seconds, the TCP traffic will start being directed through the tunnel unexpectedly, the firewall will say “I don’t know what this mid-session TCP traffic is” and reject it, and the browser will just hang until you refresh the page. But by then the user has already lost their work.
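To make that failure mode concrete, here is a toy Python model of the expiry behavior. It is only a sketch of what is described above, not GlobalProtect's actual code: the names `ExclusionCache` and `FakeClock` are made up for illustration, and the 120-second timeout is just the figure guessed at above.

```python
import time


class ExclusionCache:
    """Toy model of per-IP split-tunnel entries that expire unless refreshed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._expires_at = {}  # ip -> absolute expiry time

    def on_dns_answer(self, ip):
        # Every matching DNS answer (re)starts the timer for that IP.
        self._expires_at[ip] = self.clock() + self.ttl_seconds

    def route_for(self, ip):
        expiry = self._expires_at.get(ip)
        if expiry is not None and self.clock() < expiry:
            return "physical"
        # Entry missing or stale: drop it and fall back to the tunnel.
        self._expires_at.pop(ip, None)
        return "tunnel"


class FakeClock:
    """Hypothetical manual clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
cache = ExclusionCache(ttl_seconds=120, clock=clock)

cache.on_dns_answer("52.202.62.233")   # page load triggers a DNS lookup
clock.now = 60
at_60 = cache.route_for("52.202.62.233")    # still inside the timeout
clock.now = 121
at_121 = cache.route_for("52.202.62.233")   # no new lookup was seen
print(at_60, at_121)
```

In this model a long-lived TCP session that started on the physical adapter is suddenly classified as tunnel traffic once the entry ages out, which matches the mid-session hang described above.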

This was also a problem with some CDN-hosted sites (Amazon, Akamai, etc) where an IP of a domain changes constantly. We found GP couldn’t keep up with the constant changes (GP waiting for a DNS reply comes at a slight latency cost), and user experience was incredibly inconsistent. Sometimes this would happen during Zoom or Teams meetings.

At this point we’re only doing route-based exclusions for MS O365 and it works like a champ. We have one or two domain exclusions that we verified have relatively static IPs and regular webpage behavior and they do work ok.

When the client connects to GlobalProtect, it downloads the config. In this config, it has a list of the domains you want split-tunneled. That list is downloaded from the gateway configuration and brought onto the local machine. When domain-based split-tunneling is enabled, any DNS query that matches the split-tunnel is then re-directed to the local adapter via next-hop L3 gateway from the GP client. Anything that does not match the split-tunnel, proceeds as normal, through the tunnel.

It’s extremely important to know that the domain-based split-tunneling only affects HTTP/S traffic. Any other traffic will seemingly ignore the domain based split-tunnel. Ex: doing a traceroute will show the traffic from that domain going over the tunnel; however HTTP/S traffic will not.

I’ve wondered this as well.

Consider how a browser accesses a URL. For example http://www.acme.com/coyotesolutions

  1. The browser takes the hostname portion of the URL (www.acme.com) and does a DNS lookup. DNS returns an IP address (let’s ignore CNAMES).

  2. The browser opens a tcp connection to port 80 (assume http for now) to that IP address.

  3. The browser sends the http request specifying the full URL (http://www.acme.com/coyotesolutions)

Here’s the problem: Only the browser knows the actual URL up until the connection is established to the web server. In theory, the GP client can’t get at that information until AFTER the connection has been established and the tunnel/direct decision has already been made. The matter is complicated even further if https is used. In that case, the http request containing the URL sent from the browser to the server is encrypted. The GP client couldn’t get at it even if it tried.
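The three steps above can be sketched in Python to show which piece of the URL each layer actually gets to see. This is a simplified illustration using only the standard library; it performs no lookup or connection, and the default-port table is my own shorthand.

```python
from urllib.parse import urlsplit

url = "http://www.acme.com/coyotesolutions"
parts = urlsplit(url)

# Step 1: only the hostname is handed to the DNS resolver.
hostname = parts.hostname

# Step 2: only an IP address and a port are handed to the TCP stack.
port = parts.port or {"http": 80, "https": 443}[parts.scheme]

# Step 3: the path only appears inside the already-established connection
# (and is encrypted when the scheme is https).
request = f"GET {parts.path} HTTP/1.1\r\nHost: {hostname}\r\n\r\n"

print("resolver sees:  ", hostname)
print("TCP stack sees: ", "<resolved IP>", port)
print("server sees:    ", request.splitlines()[0])
```

Anything sitting below the browser can therefore work with the hostname (at DNS time) and the destination IP (at connect time), but not with the path.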

My assumption is that there is something in the windows API and coded into browsers that handles this. Perhaps browsers use a special DNS lookup API where instead of just passing the host name they pass the whole URL. Interested applications (like security products or GP client) can hook this API call and manipulate or use the data accordingly.

Or perhaps browsers don’t have their own http/https routines but instead use the OS-provided API for doing such. Those OS API routines would allow the GP client to hook in and get the information.

I’ve configured it today for a client, works for RDP traffic as well. PAN-OS 10.0.5.

Excellent, thanks a lot. That’s the sort of caveat I was looking for. How did you get that info about these inner workings? Also do you know if there is a command to see the content of the cache you mention (or even just the name of the cache so I can look it up)?

I used to think it was HTTP/S only as well but SSH also works…

When domain-based split-tunneling is enabled, any DNS query that matches the split-tunnel is then re-directed to the local adapter via next-hop L3 gateway from the GP client. Anything that does not match the split-tunnel, proceeds as normal, through the tunnel.

What is this redirection based on? The destination of IP packet, or the content of the DNS request itself? And why do we want this redirection to happen, in the context of domain-based split-tunneling?

It’s extremely important to know that the domain-based split-tunneling only affects HTTP/S traffic. Any other traffic will seemingly ignore the domain based split-tunnel. Ex: doing a traceroute will show the traffic from that domain going over the tunnel; however HTTP/S traffic will not.

I would be curious to know the reason behind this limitation, but beyond that my question is: how does the device determine that a given TCP SYN (which contains no information about domain) matches the domain-based split-tunneling and must therefore be redirected?

Hi, side question. It seems the split tunneling is used for “local breakout” of client traffic, so that it traverses the user’s ISP instead of being sent to Prisma Cloud, which would then treat and route the traffic as “internet or SaaS” traffic?

So, if I have users in Mexico trying to access a Government website, I would split tunnel that traffic so it leaves their home on their ISP vs. coming from Prisma. Is that accurate?

Yeah that’s the sort of thing I imagined too. Now what bothers me is that it is a pretty complex process and seems like a hell to troubleshoot if we don’t even know the general algorithm. I have some use cases where this feature would be useful, but for now I am reluctant to implement it because of that (moreover it requires a subscription so it’s even harder to justify if we can’t even know for sure how it works).

DNS has nothing to do with http, it’s at a completely different level and outside any protocol built on top of IP. The network driver is only told to establish a connection to an IP, and GP will use its “sniffing” of the DNS lookups to connect that IP to the tunnel/no-tunnel config and act accordingly.

I’ve seen discussions about the driver “binding” routes to specific IP’s to the GP network interface, which would cause the OS to use the GP tunnel for all traffic to those IPs.

Take a scenario for www.acme.com - if DNS returns 4 IP’s (hosted on a good CDN), then GP has to remember that requests to those 4 IP’s should be treated based on the GP config for *.acme.com
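A rough Python sketch of that bookkeeping, assuming a wildcard pattern list and an in-memory IP table. The names `TUNNEL_PATTERNS`, `on_dns_answer` and `policy_for_connection` are hypothetical, and the addresses come from the reserved documentation ranges; how GP really stores this is not something I know.

```python
from fnmatch import fnmatchcase

# Hypothetical config: domain patterns whose traffic should use the tunnel.
TUNNEL_PATTERNS = ["*.acme.com"]

ip_policy = {}  # destination IP -> "tunnel" or "direct"


def on_dns_answer(hostname, ips):
    """Record a policy for every address returned for this hostname."""
    name = hostname.lower().rstrip(".")
    matched = any(fnmatchcase(name, pattern) for pattern in TUNNEL_PATTERNS)
    for ip in ips:
        ip_policy[ip] = "tunnel" if matched else "direct"


def policy_for_connection(dest_ip):
    # A TCP SYN carries no domain name, so the only lookup key is the IP.
    return ip_policy.get(dest_ip, "default")


# One lookup, four answers: all four IPs must be remembered.
cdn_ips = ["203.0.113.10", "203.0.113.11", "203.0.113.12", "203.0.113.13"]
on_dns_answer("www.acme.com", cdn_ips)

print([policy_for_connection(ip) for ip in cdn_ips])
print(policy_for_connection("198.51.100.7"))  # never seen in a DNS answer
```

The last line is the interesting case: a connection to an IP that was never seen in a sniffed DNS answer (for example because the browser used its own cache) has nothing to match against and falls back to whatever the default is.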

I don’t know if the GP driver intercepts every connection to make the decision to tunnel or not, or if it uses a “bind” approach within the OS to wire requests to GP (or to the default NIC).

The path you are accessing, even at an HTTP protocol level has nothing to do with the connections. Connections are entirely based on the IP addresses used to establish the TCP/IP connection.

Browsers like Chrome do have an internal cache of “domain → IP” which reduces excess lookups, and it can be viewed under chrome://net-internals/#dns
This is only used to establish the connection, which GP should be routing to your local NIC or to its own virtual NIC. HTTP traffic including methods, headers, paths, are completely independent of establishing a connection, and are layered within the connection.

My dilemma is our GP config is set up to route certain domains to the GP tunnel, for the purpose of supporting reliable IP addresses to third party systems or firewall-protected sites. If your GP infra uses fixed IP’s (ours does), then what you want is for all traffic to *.acme.com to be tunneled 100% so the target servers can “allow” your traffic past any protections. From our experience, this is very inconsistent, and our trusted traffic winds up coming from our engineer’s ISP’s, and is not trusted.

The mac / GP relationship has been problematic in the past, often requiring creation of a “network connection” to get it to work at all, but with bad side effects like Chrome getting “network interrupted” errors frequently. If we denied creation of a “network connection” by GP, most things would work fine for Chrome, but split-tunneling would be 100% broken, sending all traffic to the ISP.

Recently, however, the GP drivers for macOS themselves seem to be stable, but the split-tunneling behavior is still extremely brittle.

Most of my knowledge of how this works comes from just watching GlobalProtect’s logs and experimenting. Nothing TAC in its current state will teach you. :) I’m not sure exactly how GP is doing this binding, but I’ve done some searching this morning and guess that it’s directly in C++ using some TCP socket bind() function. I doubt there’s a way to directly inspect GP’s memory to see active binds.

However, there are a few steps I use to verify that split tunnel by domain is working:

  1. Set the GP logging level to dump (GP client > Settings > Troubleshooting > at the bottom you change the ‘Debug’ dropdown to ‘Dump’)
  2. Open a browser with developer tools open, and open your domain-excluded webpage. In developer tools, find the IP of the page root ‘/’. For this case I’m using zoom.us, and the IP is 52.202.62.233.
  3. Open C:\Program Files\Palo Alto Networks\GlobalProtect\PanGPS.log in a good text editor (Sublime Text or np++). The dump logs fill this up fast, so you may need to also open PanGPS.log.old.
  4. In the text editor, search for the domain, you should see various associated log lines:
    • “Received DNS request for zoom.us”
    • “Domain name zoom.us matches exclude single domain”
    • “SP added an exclude ip 52.202.62.233 … ttl 18 for domain zoom.us”
    • “ST,remote ip address is 52.202.62.233, port=0, bind local address is 192.168.0.20” (this is my physical adapter’s IP)
  5. I mentioned there’s no way to directly view the binding cache, but you can flip over to PowerShell and run the command “Get-NetTCPConnection -LocalAddress 192.168.0.20”, and you should see an entry referencing the zoom.us IP with a local address entry of your physical NIC:
    • LocalAddress LocalPort RemoteAddress RemotePort
    • 192.168.0.20 1850 52.202.62.233 443
  6. Now, if you don’t refresh the page, eventually the TTL will expire, and you’ll see log lines like this:
    • “ST,ExpireTTLTask called, remove this now because the counter is 0”
    • “ST,ExpireTTLTask called, cookie ip address is 52.202.62.233, port 0”
  7. Turn GP logging level back to ‘Debug’ to stop the flood!
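If searching the dump log by hand gets tedious, the same check can be scripted. This is a minimal Python sketch that relies only on substring matching against the messages quoted in the steps above; the exact wording and layout of PanGPS.log may differ between GlobalProtect versions, and the function name `split_tunnel_lines` is my own.

```python
# Substrings taken from the log lines quoted in the steps above.
MARKERS = (
    "Received DNS request for",
    "matches exclude",
    "SP added an exclude ip",
    "bind local address is",
    "ExpireTTLTask called",
)


def split_tunnel_lines(lines, domain=None, ip=None):
    """Yield split-tunnel log lines that mention the given domain or IP."""
    wanted = [w for w in (domain, ip) if w]
    for line in lines:
        if not any(marker in line for marker in MARKERS):
            continue
        # With no filter given, every split-tunnel line is returned.
        if not wanted or any(w in line for w in wanted):
            yield line.rstrip("\n")


# Sample input reproduced from the messages quoted above (no timestamps).
sample = [
    "Received DNS request for zoom.us",
    "Domain name zoom.us matches exclude single domain",
    "ST,remote ip address is 52.202.62.233, port=0, bind local address is 192.168.0.20",
    "ST,ExpireTTLTask called, cookie ip address is 52.202.62.233, port 0",
    "unrelated line",
]

hits = list(split_tunnel_lines(sample, domain="zoom.us", ip="52.202.62.233"))
for hit in hits:
    print(hit)
```

Against a real client you would pass an open file instead of `sample`, for example `open(r"C:\Program Files\Palo Alto Networks\GlobalProtect\PanGPS.log", errors="replace")`, and repeat for PanGPS.log.old if the log has rolled over.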

I hope that’s useful. I do wish there was a nice way to view the active DNS exclusion binds! If someone out there knows, please stumble on this thread…

What is this redirection based on? The destination of IP packet, or the content of the DNS request itself? And why do we want this redirection to happen, in the context of domain-based split-tunneling?

When a client attempts to hit a domain, a DNS query must be made. When this DNS query matches the list, the redirection is invoked, utilizing the return IP from the DNS query, is my understanding. I’m not sure I understand the last sentence there? The whole point of split-tunneling is to redirect traffic around the tunnel, not inside it. Domain-based is key when dealing with Microsoft (re: Office365), whose IP blocks are massive and dynamic. Split-tunneling based on IP is just not feasible (or at least be prepared to split-tunnel a LOT more than you want to).

I can’t speak to the limitation. I only know this through my own experience and troubleshooting with TAC. I never really dug into or cared about the “why”. I was just happy to finally have it working as it was a long and bumpy road to get this working back then (though part of that was most likely due to improper testing since I didn’t know of the limitation.)

Very useful indeed, thanks again! I was wondering if the client debug log would mention such operations, turns out it does, that’s great to know.

This client log is really a gold mine in terms of troubleshooting, I end up using it more and more. I wish there was a way to have it centralised. Bothering the users to fetch and send their log archive is always cumbersome.

https://live.paloaltonetworks.com/t5/globalprotect-articles/troubleshoot-split-tunnel-domain-amp-applications-and-exclude/ta-p/321075

When a client attempts to hit a domain, a DNS query must be made. When this DNS query matches the list, the redirection is invoked, utilizing the return IP from the DNS query, is my understanding.

Are you talking about redirection of the DNS query, or redirection of something else? In your first answer it seemed like you mean the former (and in that case I don’t understand why we would need that), but now with your new answer I am not sure.

No problem, happy hunting!

I agree, it would be amazing to have a button within the firewall to collect logs from a connected client.

I’m not a Palo engineer, so I could be mistaken, but I’m talking about the payload/data traffic. The DNS query should not be redirected…I don’t think. Under the notion that only HTTP/S traffic is, the DNS traffic should traverse the tunnel. It just uses the data from that query to match the domains, resolve the IP and force the TCP session to be established via the local adapter, not the tunnel.

It’s a lot simpler than that. The feature works with TCP/UDP sockets, that’s why ICMP doesn’t adhere to it.
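For anyone wondering what a socket-level “bind” looks like, here is a small Python sketch of pinning a TCP connection’s source address before connecting. It is an illustration of the general technique only; nothing in this thread confirms that GP does exactly this, and whether a source-address bind alone selects the outgoing interface depends on the OS routing behavior. The loopback address stands in for the physical adapter’s IP (192.168.0.20 in the log example earlier) so the snippet runs on any machine.

```python
import socket

# Assumption: 127.0.0.1 stands in for the physical adapter's address.
LOCAL_ADDRESS = "127.0.0.1"

# A throwaway local listener to connect to (port 0 = OS picks a free port).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Binding before connect() fixes the source address of the connection;
# port 0 again lets the OS choose the source port.
client.bind((LOCAL_ADDRESS, 0))
client.connect(server_addr)

conn, peer = server.accept()
source_ip, source_port = client.getsockname()
print("local address of the connection:", source_ip)
print("address seen by the server:     ", peer[0])

conn.close()
client.close()
server.close()
```

This is the same local/remote address pairing that shows up in the “bind local address is …” log line and in the Get-NetTCPConnection output earlier in the thread, and it only exists for socket-based TCP/UDP traffic.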