Hoping someone can help with this bizarre network problem

UPDATE:

Never did solve this, but I want to thank everyone for their responses and suggestions. We aren’t 100% sure WHY, but it seems that the Dreamwall’s WAN port was blocking any incoming (or maybe outgoing) DNS requests. All attempts to allow this traffic using firewall rules were unsuccessful. We decided that two days with the site nearly down and our folks barely able to work was plenty, so we reverted to the original config and removed the Dreamwall from the equation.

Wish the r/Ubiquiti sub weren’t locked at the moment; I could really use their input. :frowning:

Got a really weird issue and hoping someone might be able to help. Apologies, this is a little long and involved and I appreciate you for reading! :slight_smile:

Overview: We use AT&T as our WAN provider. We have 30 sites all over the country, each using local cable or fiber internet. Connected to that is an AT&T VPN Gateway device (ANIRA U115), and connected to that are our local network devices. The ANIRA device actually serves as the DHCP server for the LAN, and it routes traffic bound for other sites to the WAN and internet traffic to the internet.

Yesterday, we tried to deploy a Ubiquiti Dreamwall appliance in one of these locations. We had AT&T make changes to the ANIRA device, specifically to the DHCP range it provided. Our Dreamwall appliance would handle all local network services, including DHCP for the local LAN, while the ANIRA still handled the WAN routing to the rest of our locations. In the end, the changes removed the DHCP server from the AT&T device, changed its LAN IP and added routes matching the new IP addresses.

So, the connection looks like this (top to bottom):

- Internet IN: Comcast cable internet, provides a DHCP address.
- AT&T VPN Gateway: the WAN port of this device is connected to the Comcast device and accepts a DHCP address to get online. The LAN side of this device is configured with a LAN IP of 10.0.40.254.
- Ubiquiti Dreamwall: the WAN port is configured with a static IP of 10.0.40.10 and connects to the LAN port of the AT&T VPN Gateway. The LAN IP of this device is 10.10.40.1; it is also the DHCP server for the local LAN subnet 10.10.40.0/24, providing addresses .10 - .50.
- Local devices (10.10.40.x): PCs, printers and such.
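Since pings work upstream but not downstream, one thing worth checking is the return path. Below is a minimal sketch of the longest-prefix-match lookup a router like the AT&T gateway performs. The route table is hypothetical: only the 10.10.40.0/24 → 10.0.40.10 entry reflects the routes described above, and the /24 masks and next-hop labels are assumptions for illustration.

```python
import ipaddress

# Hypothetical route table for the AT&T VPN Gateway after the change.
# Only the 10.10.40.0/24 -> 10.0.40.10 entry mirrors the post; the rest is assumed.
ROUTES = [
    (ipaddress.ip_network("10.0.40.0/24"), "connected (LAN port)"),
    (ipaddress.ip_network("10.10.40.0/24"), "10.0.40.10"),
    (ipaddress.ip_network("10.10.250.0/24"), "AT&T WAN (datacenter)"),
    (ipaddress.ip_network("0.0.0.0/0"), "Comcast (internet)"),
]


def next_hop(dst: str) -> str:
    """Return the next hop for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    # The most specific (longest) matching prefix wins.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop
```

In this model, `next_hop("10.10.40.20")` returns `"10.0.40.10"`. If the 10.10.40.0/24 entry were missing on the gateway (or anywhere on the AT&T side), the same lookup would fall through to the default route, and replies to 10.10.40.x would never reach the Dreamwall.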

This is where it gets weird:

- ALL devices on the 10.10.40.0 network can access the Internet with NO ISSUES. Email/Outlook, Teams, OneDrive and web browsing all work with no problems. This shows me that traffic is passing through the Dreamwall and the AT&T Gateway in both directions.
- All devices on the 10.10.40.0 network can ping the 10.0.40.254 IP address of the AT&T VPN Gateway.
- All devices on the 10.10.40.0 network can ping devices on the WAN by IP ONLY. (We have a DNS server in our datacenter, 10.10.250.10, that is configured as the DNS server for devices on the 10.10.40.0 network.) A PC at 10.10.40.20 can ping 10.10.250.10 with no issues, but cannot ping it by server name and cannot resolve names via DNS.
- Devices on 10.10.40.0 cannot reach any domain resources. I am assuming this is because internal DNS is not working/communicating with the 10.10.40.0 network.
- Devices on the 10.10.250.0 network can ping the 10.0.40.254 IP of the AT&T VPN Gateway and even reach its web interface.
- Devices on the 10.10.250.0 network cannot ping 10.0.40.10 or ANYTHING inside the 10.10.40.0 network.
- And the most bizarre of all: the web interface on the AT&T VPN Gateway has connectivity tools. From there I can ping and tracert from 10.0.40.254 to the 10.10.250.0 network with no problem, but CANNOT ping 10.0.40.10, which it’s directly connected to.

Have been pulling my hair out for the last 12 hours on this. AT&T techs have created the correct routes in their network, and I have attempted to create the proper routes and firewall rules on my Ubiquiti Dreamwall. I have made sure that both the AT&T gateway and the Dreamwall have the same subnet mask. The fact that these two devices, connected by a 3 ft Ethernet cable, cannot communicate fully is bizarre. Pings work UPSTREAM but not DOWNSTREAM.
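Matching masks is necessary but not sufficient: both addresses must also land in the same network under that mask. A quick check with Python's standard `ipaddress` module; the /24 and /29 masks below are assumptions for illustration, since the post doesn't say which mask is in use.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True if both addresses fall in the same network under the given mask.

    `mask` may be dotted (255.255.255.0) or a prefix length (24).
    """
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b
```

With an assumed /24, `same_subnet("10.0.40.254", "10.0.40.10", "255.255.255.0")` is `True`. With a /29 on both sides (same mask, but a smaller network), the gateway lands in 10.0.40.248/29 and the Dreamwall in 10.0.40.8/29, so the check returns `False` even though the masks match.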

If anyone has any ideas or suggestions, please share. I am out of ideas. :frowning: Thanks for reading.


What DNS servers are you issuing via DHCP?

As said above, check the firewall policy for destination port 53 traffic to your DCs and 1.1.1.1.

You probably need an explicit allow statement:

Source IP range - your LAN subnets
Source port - any

Destination IP - DC IPs and 1.1.1.1
Destination port - 53
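A minimal model of which packets the suggested allow rule should match, using the LAN subnet and DNS server IPs mentioned in this thread. This is not Dreamwall/UniFi rule syntax, just an illustration of the rule's logic. DNS uses both UDP and TCP port 53, so a real rule should cover both protocols; the model below ignores protocol.

```python
import ipaddress
from dataclasses import dataclass

# Values taken from the thread; the packet model itself is illustrative.
LAN_SUBNETS = [ipaddress.ip_network("10.10.40.0/24")]
DNS_DESTINATIONS = {
    ipaddress.ip_address(a) for a in ("10.10.250.10", "10.10.250.11", "1.1.1.1")
}
DNS_PORT = 53


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def allow_dns(pkt: Packet) -> bool:
    """Source in a LAN subnet, any source port, destination is a DNS server on port 53."""
    src = ipaddress.ip_address(pkt.src_ip)
    dst = ipaddress.ip_address(pkt.dst_ip)
    from_lan = any(src in net for net in LAN_SUBNETS)
    # Source port is "any", so it is deliberately not checked.
    return from_lan and dst in DNS_DESTINATIONS and pkt.dst_port == DNS_PORT
```

For example, a query from 10.10.40.20 (ephemeral source port) to 10.10.250.10:53 matches, while the same client reaching 10.10.250.10 on port 80 does not.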

You might take a look at your routing tables. See if something’s being jacked up on the endpoints or on the network kit.

Edit: Also check any VLAN configs against your ports.

PDNTSPA…

I don’t think it’s DNS…I think you’ve got confused packets…IMHO based on very little evidence and a big bag of overconfidence.

May want to look at AD Sites & Services if you have two different subnets. By default your workstation will attempt to find the DC in your current subnet, unless you have specified that DC in another site and set the parameters.

Also - are you able to pass DNS traffic across your VPN?

If the workstation is getting DHCP and it’s handing out two DNS addresses, primary is internal DNS and secondary is your external (1.1.1.1) - the request could be failing on the primary and rolling to secondary, which will allow you to resolve an external WAN address.

I have both internal and external being issued via DHCP:
10.10.250.10 and 10.10.250.11 (both internal DNS servers that use forwarders to resolve external addresses), as well as 1.1.1.1.
Clients inside the network can resolve external addresses just fine, just not internal addresses. It’s almost as if the WAN port of the Dreamwall is a one-way valve for any traffic that isn’t internet, blocking ingress of internal traffic.
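A simplified model of the failover theory above: the client tries each DNS server in order and moves on only when a server doesn't answer. `simulated_query` is hypothetical and assumes port 53 traffic to the internal servers is being dropped. The real Windows DNS client has its own timers and retry order, so treat this as an illustration, not its actual algorithm. 203.0.113.10 is a documentation placeholder address.

```python
from typing import Callable, Optional, Sequence, Tuple

# Sentinel meaning "the server never answered" (as opposed to "no such name").
TIMEOUT = object()


def resolve_in_order(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str], object],
) -> Tuple[Optional[str], object]:
    """Try servers in order, moving on only when one times out.

    Any real answer, including "no such name" (None), ends the search.
    """
    for server in servers:
        result = query(server, name)
        if result is TIMEOUT:
            continue
        return server, result
    return None, TIMEOUT


def simulated_query(server: str, name: str) -> object:
    # Hypothetical: port 53 traffic to the internal DNS servers is dropped.
    if server.startswith("10.10.250."):
        return TIMEOUT
    # 1.1.1.1 only knows public names; internal names get "no such name" (None).
    return {"google.com": "203.0.113.10"}.get(name)
```

With the DHCP order from this thread (`["10.10.250.10", "10.10.250.11", "1.1.1.1"]`), `google.com` resolves via 1.1.1.1, while an internal name also ends up at 1.1.1.1 and gets "no such name". That matches the symptom: external names work, internal names don't.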

I did. Odd thing is DNS works for internet traffic. I can resolve google.com easily, it’s just internal DNS traffic that isn’t passing.

The firewall settings on this device are, well, basic at best, and I am having difficulty determining whether I need LAN Out or Internet Out rules (or In rules, for that matter). I am continuing to experiment with this. Thanks for the suggestions!

I had the AT&T guys look at this all day yesterday and, from what several sets of eyes can see, everything looks correct. We are all continuing to troubleshoot, though.

In which order are the DNS server IP addresses being issued? Normally you list the DNS servers that can resolve your AD domain first, and then the others… Another thing to check would be where port 53 traffic is being allowed to…

Are the DC servers on the same site as your 40 subnet? Or at a different site?

In theory if you run a trace route you should see where things bomb.

I’d get some Wiresharks/snoops in to see if you can get your fingers on the packets.

If I enter the switch looking for X but never exit then something is jacked with that switch config. You might just refresh the config to see if it just picked up some cruft. Also patch all your firmware.

I know the answer is always DNS but I’m not feeling it. Again, armchair network guy that doesn’t do this for a living any more. I’m probably wrong. It sounds like you got smart folks to help already.

Internal DNS listed first in order. HMM, will check port 53. Thank you.

DC/DNS server is on another subnet in a different site (or datacenter, actually) on the WAN.

Can you not ping your DCs at all, by host name or IP from the affected network?

And FW policy is on the dreamwall, not the ATT box, right?

I can ping the DC 10.10.250.10 from 10.10.40.20 by IP, just not by server name. That server cannot ping the client on 10.10.40.20.

What DNS suffix is your local DHCP server giving out? Is it the correct one for your Windows domain or is it something else?

If your domain is corp.example.com, and your local DHCP server is giving out local, then attempting to resolve “dc01” will cause your local devices to look for “dc01.local” and not “dc01.corp.example.com”.
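A simplified sketch of how a resolver turns a short name into the name it actually queries, using the DHCP-supplied suffix. `candidate_names` is a hypothetical helper; real clients also consult a suffix search list and other settings, and treating every dotted name as already qualified is a simplification.

```python
from typing import List, Sequence


def candidate_names(name: str, suffixes: Sequence[str]) -> List[str]:
    """Names a resolver would try for `name` (simplified model)."""
    if name.endswith("."):
        # Trailing dot: already fully qualified, use as-is.
        return [name[:-1]]
    if "." in name:
        # Simplification: treat any dotted name as already qualified.
        return [name]
    # Single-label name: append each suffix in order.
    return [f"{name}.{suffix}" for suffix in suffixes]
```

With the suffix set to `local`, `dc01` only ever becomes `dc01.local`; with `corp.example.com` it becomes `dc01.corp.example.com`, which is the name the domain's DNS actually holds in this example.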

That looks and smells like a DNS server connection problem. When you do an nslookup from one of the machines in the 10.10.40.0/24 network against the 10.10.250.10 DNS server, does it actually connect? Try something like nslookup www.cnn.com 10.10.250.10 to force the DNS client to use that server as its DNS server, and see what you get…
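The same test can be scripted without nslookup. This sketch builds a minimal DNS query (type A, recursion desired) by hand, following the RFC 1035 message layout, and sends it over UDP port 53 to one specific server. It only tests UDP, not TCP 53; the function names are hypothetical, and the 2-second timeout is an arbitrary choice.

```python
import socket
import struct
from typing import Optional

DNS_PORT = 53


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (0x0100 = RD bit set), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def query_server(name: str, server: str, timeout: float = 2.0) -> Optional[bytes]:
    """Send the query over UDP/53 to one server; return the raw reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, DNS_PORT))
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
    return reply


def rcode(reply: bytes) -> int:
    """Response code from the header's low 4 bits: 0 = NOERROR, 3 = NXDOMAIN."""
    return reply[3] & 0x0F
```

Run from a 10.10.40.x client, `query_server("www.cnn.com", "10.10.250.10")` returning `None` (a timeout) while a ping to the same IP succeeds would suggest port 53 traffic is being dropped somewhere on the path; a reply with `rcode(reply) == 0` means the server answered.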

Giving out domain.local. Tried pinging servername and servername.domain.local as well. No joy. :frowning: