OK, I know this will draw a significant amount of hate from all of the NAT haters, and 99.9% of the time I would agree with them. However, our business is unique, so that is the first thing I am going to lay out for the sake of the discussion that will follow.
What we do: We do real-time video communication.
Who we do it for: Medical Institutions.
How we deliver it: Via private MPLS from the client site to our call centers. At the client site we ride their infrastructure.
Hopefully the issue becomes immediately clear. If it does not, let me help out. I own my network, the MPLS links, and the CPE router. I do not own, control, influence, or have any visibility into the client infrastructure. In most cases the answer would be: who cares, push it to the gateway, NAT it, and be done with it. However, first, real-time communications using SIP don’t natively like NAT (but I have that issue fixed... I think), and second, these systems are not simple point-to-point communications. Instead they are ClientX to server, server to ClientY, and ClientY to ClientX communications. The solutions should be pretty obvious:
- Client-based VPN, making them all part of my controlled network (@joshGants favorite). But let's take a swipe at this. It means one more thing to break on a Windows-based system, and it does not help when my systems are SIP-based Polycom, Tandberg, Lifesize, etc. hardware. Also, I think allowing a vendor to run fully encrypted communications over your network that you have no way to inspect is just dumb. Imagine what my medical clients' security departments think.
- Procure a range of public IP addresses large enough to support my current and future servers and video endpoints, then static NAT all the client systems into a routed space on my network. This is pretty much what we are doing right now, but due to the lack of NAT support in the Cisco Unified Presence client (CUPC) we have to know the return routes for all of our clients (which does not scale). The big issue with this approach is that we have clients who will not route network addresses that are not internal to their network. If they have to route outside, it means you have to NAT to an internal address first and then have it NAT to a routable address on my network. That is not really an issue for servers, but a constantly growing number of video endpoints in our call centers is another issue altogether. As a side note on this solution, I have not tried to get a substantial block of IPv4 addresses recently, but from what I hear there are none to give out (thoughts?).
- NAT, NAT, NAT. So we arrive at the core issue in this whole mess. We have to NAT. We have to NAT incoming and outgoing traffic. There was a point in time when I considered doing this at my network core, but after an hour of @ioshints time (I think I still owe him a beer) it was clear that the only really scalable way to do this was at the CPE side of things. So here we find ourselves building out the test environment for our new Video Call Center platform (more on that in another post), and we set up our lab as shown in this diagram.
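To make the static NAT option above a bit more concrete, here is a minimal Python sketch of one-to-one translation by host offset. Everything in it is an assumption for illustration: the client names, the overlapping 10.1.1.0/24 inside subnet, and the outside blocks (taken from the IPv4 documentation ranges) are hypothetical and are not our real addressing.

```python
import ipaddress

def static_nat(addr, inside_net, outside_net):
    """Translate addr one-to-one from inside_net into outside_net by host offset."""
    inside = ipaddress.ip_network(inside_net)
    outside = ipaddress.ip_network(outside_net)
    if inside.prefixlen != outside.prefixlen:
        raise ValueError("one-to-one static NAT needs equally sized networks")
    ip = ipaddress.ip_address(addr)
    if ip not in inside:
        raise ValueError(f"{ip} is not in {inside}")
    # Keep the host part, swap the network part.
    offset = int(ip) - int(inside.network_address)
    return str(outside.network_address + offset)

# Hypothetical clients that both use the same private subnet internally.
CLIENTS = {
    "client_a": ("10.1.1.0/24", "198.51.100.0/24"),
    "client_b": ("10.1.1.0/24", "203.0.113.0/24"),
}

def translate(client, addr):
    """Look up the client's (inside, outside) pair and translate addr."""
    inside, outside = CLIENTS[client]
    return static_nat(addr, inside, outside)

print(translate("client_a", "10.1.1.20"))
print(translate("client_b", "10.1.1.20"))
```

The sketch also shows why this option eats address space: every client endpoint consumes exactly one address in the routed space, so the block has to grow with the number of endpoints.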
The only problem is that, as we currently have this configured, it is not NAT’ing as we think it should be. So after banging our heads on it for a few days and pushing our project further behind, we did what any glutton for punishment does and called TAC. So what was the sage advice of TAC? "Um, well, you can’t do that." My response to TAC was, "Why not?" TAC says that because the NAT addresses are always changing, there is no way to link the traffic. That makes tons of sense if we are talking about classic communications in which Client X talks to the server and, at some point later, after the translation has timed out and been cleared from the table, Client Y tries to talk to Client X on its now defunct translated address. But that is not what is happening here. The translation from the initial communication to the server will still be active and should allow the clients to communicate.
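The timing argument above can be sketched as a toy translation table with an idle timeout. This is only a model of my reasoning, not of what the router actually does: the `NatTable` class, the pool address, the port numbering, and the 300 second timeout are all made up for illustration, and the table is assumed to accept inbound traffic from any peer once a mapping exists (endpoint-independent mapping and filtering, in RFC 4787 terms). Whether a given NAT implementation behaves that way is exactly the open question.

```python
class NatTable:
    """Toy dynamic NAT table: inside (ip, port) <-> outside (ip, port)."""

    def __init__(self, pool_ip, idle_timeout):
        self.pool_ip = pool_ip
        self.idle_timeout = idle_timeout
        self.next_port = 20000          # hypothetical starting port
        self.by_inside = {}             # inside -> (outside, last_used)
        self.by_outside = {}            # outside -> inside

    def _expire(self, now):
        # Drop every translation that has been idle longer than the timeout.
        for inside, (outside, last_used) in list(self.by_inside.items()):
            if now - last_used > self.idle_timeout:
                del self.by_inside[inside]
                del self.by_outside[outside]

    def outbound(self, inside, now):
        """Inside host sends traffic out: create or refresh its translation."""
        self._expire(now)
        if inside in self.by_inside:
            outside, _ = self.by_inside[inside]
        else:
            outside = (self.pool_ip, self.next_port)
            self.next_port += 1
            self.by_outside[outside] = inside
        self.by_inside[inside] = (outside, now)
        return outside

    def inbound(self, outside, now):
        """Traffic arrives for a translated address: return the inside host or None."""
        self._expire(now)
        inside = self.by_outside.get(outside)
        if inside is not None:
            self.by_inside[inside] = (outside, now)  # traffic keeps the entry alive
        return inside

nat = NatTable("198.51.100.1", idle_timeout=300)
client_x = ("10.1.1.20", 5060)

mapped_x = nat.outbound(client_x, now=0)   # Client X talks to the server
soon = nat.inbound(mapped_x, now=10)       # Client Y calls X while the entry is live
late = nat.inbound(mapped_x, now=1000)     # TAC's case: entry idle past the timeout

print(mapped_x, soon, late)
```

In this model the call placed shortly after Client X contacts the server finds a live translation, while the much later attempt finds nothing, which is the "classic" case TAC described.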
So there you have it. My NAT hell. As you see in the diagram, we have some loopbacks that were added to try to mimic our currently deployed router configs. We think that this may be dorking things up, so as soon as I press publish on this baby I am going to go reconfigure the router with no loopback and add another router upstream to mimic a client's core network. I will be publishing an update. As well, if anyone can come up with something that I am flat out missing that would simplify this, I will gladly reward them with a $100 Amazon gift card. The reality of our business is that it has proven to be very technically challenging. Pretty much not a week has gone by in the past 18 months in which I have not learned something and had to stretch my mind.
Hopefully this will generate some great commentary around NAT and service provider implementations. Plus, I am sure that there will be a few "well, if we were on IPv6 this would not be a problem" comments. I agree, but we are not. Have fun.
@cloudtoad has a design in a blog post for virtualized bi-directional NAT with overload based on the Cisco ASR.