The answer to this question is obvious: VoIP is an element of Communication, and firewalls and network address translators (NATs) are elements of Separation (NAT tends towards Obfuscation, but it amounts to the same thing). These are opposing forces: Separation constricts Communication and Communication pierces Separation. It’s like yin and yang, day and night, law and chaos. Can a leopard change his spots? Can a firewall be welcoming?
Oh, you mean at the technical level? Right.
None shall pass
There’s an age-old yarn from the early days of the Internet:
Tech Support: “May I have your phone number, sir?”
Customer: “I don’t give out my phone number!”
Tech Support: “All right. How may I help you, sir?”
Customer: “How much for your Internet service?”
Tech Support gives him the prices.
Customer: “If I own the software why do you keep charging for it?”
Tech Support: “Well, sir, the software is free, but you are charged for being online.”
Customer: “YOU CONNECT YOUR COMPUTER TO THE PHONE LINE?!?”
Tech Support: “Well, sir, you do use a modem to dial online.”
Customer: “I WILL NEVER HOOK MY COMPUTER TO MY PHONE!!!!” (click)
This actually makes sense: no connection, no problem. The phone is for phone calls, the computer is for software, like this “Internet” thing. Firewalls exist to block anything unfamiliar that tries to enter a private network. That includes VoIP, since the Internet was not originally designed for voice (unlike the plain old telephone system). Configuring firewalls and NATs to allow VoIP communication is a problem both for the home user, who lacks the technical expertise, and for the system administrator, who lacks the time to maintain many devices from various vendors.
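To make the "unfamiliar" part concrete, here is a toy model of a stateful firewall in Python. It is my own illustration, not any vendor's implementation, and the addresses and ports are made up. Inside hosts may start conversations, and outside traffic is admitted only if it answers one of them. That is why an incoming call, or media arriving on a port nobody asked for, gets dropped. Real NATs also rewrite addresses and differ in how strictly they match replies. This sketch models only the filtering.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Flow:
    """A 5-tuple identifying one direction of a conversation."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str  # "udp" or "tcp"

    def reversed(self) -> "Flow":
        # The same conversation seen from the other end.
        return Flow(self.dst_ip, self.dst_port, self.src_ip, self.src_port, self.proto)


class StatefulFirewall:
    """Toy filter: traffic from the inside is always allowed and opens a pinhole;
    traffic from the outside is allowed only if it answers such a flow."""

    def __init__(self) -> None:
        self._pinholes: Set[Flow] = set()

    def outbound(self, flow: Flow) -> bool:
        # Remember the reply direction so the far end can answer.
        self._pinholes.add(flow.reversed())
        return True

    def inbound(self, flow: Flow) -> bool:
        # Anything unsolicited ("unfamiliar") is dropped.
        return flow in self._pinholes


if __name__ == "__main__":
    fw = StatefulFirewall()
    sip = Flow("10.0.0.5", 5060, "203.0.113.10", 5060, "udp")  # INVITE goes out
    fw.outbound(sip)
    print(fw.inbound(sip.reversed()))  # True: the proxy's reply is expected
    rtp = Flow("203.0.113.20", 40000, "10.0.0.5", 49170, "udp")  # media arrives
    print(fw.inbound(rtp))  # False: nobody inside asked for this flow
```

The signaling gets through because the phone started it. The media stream does not, because it comes from an address and port the firewall has never seen.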
The dynamic nature of VoIP protocols
VoIP protocols (and some others, such as FTP) use one layer of the protocol to set up another. VoIP is usually split into at least two layers: signaling and media. Signaling negotiates capabilities and opens the media channels dynamically, on ports that are not known in advance. So even once the initial connection gets through, more connections still have to be opened. Sometimes a protocol reuses the connection it already has for other purposes (e.g. interleaving media over the RTSP control connection). At other times one side is forced to open all the connections, on the assumption that the other side is on the public network. Of course, if firewalls and NATs could follow the signaling, parse it, and open the way for the dynamic ports that carry the real-time media, they could rewrite addresses and open pinholes themselves. This, however, is not an easy task: it would make them more complex, more expensive, and more vulnerable. Another problem lies in the variety of VoIP protocols (RTSP, SIP, H.323, etc., each in several versions). Added security mechanisms make it worse, because they would force the firewall to take part in them as an intermediate node.
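What would "following the signaling" involve? Here is a simplified Python sketch of the first step a SIP-aware firewall (an application-level gateway) has to take. It reads the SDP body (RFC 4566) carried in a SIP INVITE to find the address and port the media will use. The function and type names are my own. A real gateway would also have to handle RTCP ports, disabled streams (port 0), re-INVITEs, and rewriting the addresses when it performs NAT.

```python
from typing import List, NamedTuple, Optional


class MediaEndpoint(NamedTuple):
    media: str    # e.g. "audio" or "video"
    address: str  # where the peer should send RTP (from the c= line)
    port: int     # RTP port (from the m= line)
    proto: str    # transport, e.g. "RTP/AVP"


def media_endpoints(sdp: str) -> List[MediaEndpoint]:
    """Extract the RTP destinations announced in an SDP body (simplified)."""
    session_addr: Optional[str] = None
    sections = []  # one dict per m= section, in order
    for raw in sdp.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # not an SDP "<type>=<value>" line
        kind, value = line[0], line[2:]
        if kind == "m":
            # m=<media> <port>[/<count>] <proto> <fmt> ...
            media, port, proto = value.split()[:3]
            sections.append({"media": media, "port": int(port.split("/")[0]),
                             "proto": proto, "addr": None})
        elif kind == "c":
            # c=<nettype> <addrtype> <address>[/<ttl>...]
            addr = value.split()[2].split("/")[0]
            if sections:
                sections[-1]["addr"] = addr  # media-level c= wins
            else:
                session_addr = addr          # session-level default
    return [MediaEndpoint(s["media"], s["addr"] or session_addr, s["port"], s["proto"])
            for s in sections]


if __name__ == "__main__":
    offer = (
        "v=0\r\n"
        "o=alice 2890844526 2890844526 IN IP4 10.0.0.5\r\n"
        "s=-\r\n"
        "c=IN IP4 10.0.0.5\r\n"
        "t=0 0\r\n"
        "m=audio 49170 RTP/AVP 0\r\n"
        "a=rtpmap:0 PCMU/8000\r\n"
    )
    for ep in media_endpoints(offer):
        print(ep)  # MediaEndpoint(media='audio', address='10.0.0.5', port=49170, proto='RTP/AVP')
```

The private address 10.0.0.5 in the offer is the crux. A NAT that does not rewrite it hands the far end an address it cannot reach.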
Protocols usually find another way across firewalls, which at times requires a third party on the public network to translate addresses, synchronize the endpoints, and even relay traffic between them (e.g. Skype).
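A standard example of such a third party is a STUN server (RFC 5389), though Skype uses its own proprietary mechanism. The endpoint sends a Binding request out through the NAT, and the server replies with the public address and port it saw the request come from. Below is a minimal Python sketch that builds that request and decodes the XOR-MAPPED-ADDRESS attribute in the reply. It handles IPv4 only, and the function names are my own. Relaying, the last resort, is handled by a separate protocol, TURN.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442  # fixed value from RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id: Optional[bytes] = None) -> bytes:
    """20-byte STUN header with no attributes: type, length, cookie, transaction ID."""
    txn_id = txn_id if txn_id is not None else os.urandom(12)
    if len(txn_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id


def parse_mapped_address(msg: bytes) -> Tuple[str, int]:
    """Return the public (IPv4 address, port) a Binding success response reports."""
    msg_type, length, cookie = struct.unpack("!HHI", msg[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[pos:pos + 4])
        value = msg[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and attr_len >= 8 and value[1] == 0x01:
            # Port is XORed with the cookie's top 16 bits, the address with the whole cookie.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # values are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute")
```

In practice you would `sendto()` the request over the same UDP socket that will carry the media (STUN's default port is 3478). Then check that the reply's transaction ID matches and pass the reply to `parse_mapped_address`. The endpoint can put the resulting public address into its SDP in place of the private one.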
Conceptual problem
Maybe the biggest problem is the conceptual one I mentioned at the beginning of this post: firewalls and NATs are considered elements of separation, not of communication. Some work has been done on letting an application inside a firewall communicate with the firewall itself. The application asks it to forward incoming connections from the network to its listening port and to provide public addresses. Most notably, the Internet Gateway Device Protocol (based on UPnP) is supported by some routers and firewalls and is used by several VoIP solutions (MSN Messenger, for instance). I also found a paper (PDF) about using RSVP (RFC 2205) for this purpose, and Diameter could be used just as well. Either would address a weakness of UPnP, which does not implement authentication. Firewall support is the main problem with such a solution. However, VoIP usage is growing, and firewall vendors will have to acknowledge it and provide the means to support it. Another option is to use a separate network device that receives (authenticated) requests from network components and configures the firewall accordingly, as in the work done by the IETF Middlebox Communication (midcom) working group. I find it strange that VoIP standardization bodies have not adopted such a solution as the preferred mode of operation, since it could potentially solve the problem completely. Not just for VoIP, by the way, but for unified services and whatever the future has in store for us.
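Here is what IGD looks like on the wire. The Python sketch below builds the UPnP IGD `AddPortMapping` SOAP call on the `WANIPConnection:1` service, asking the gateway to forward a public UDP port to a phone's media port. The helper names and default values are my own assumptions. The control URL has to be discovered first, via SSDP and the device description, which is not shown. Nothing in the request identifies or authenticates the caller, so any host on the LAN can send it. That is the authentication gap mentioned above.

```python
import urllib.request
from typing import Dict, Tuple
from xml.sax.saxutils import escape

SERVICE = "urn:schemas-upnp-org:service:WANIPConnection:1"


def add_port_mapping_request(external_port: int, internal_client: str,
                             internal_port: int, protocol: str = "UDP",
                             description: str = "VoIP media",
                             lease_seconds: int = 0) -> Tuple[Dict[str, str], bytes]:
    """Build the HTTP headers and SOAP body for the IGD AddPortMapping action."""
    args = [
        ("NewRemoteHost", ""),                     # empty: accept any remote host
        ("NewExternalPort", str(external_port)),   # port opened on the public side
        ("NewProtocol", protocol),                 # "UDP" or "TCP"
        ("NewInternalPort", str(internal_port)),
        ("NewInternalClient", internal_client),    # LAN address of the VoIP endpoint
        ("NewEnabled", "1"),
        ("NewPortMappingDescription", description),
        ("NewLeaseDuration", str(lease_seconds)),  # 0: no expiry in IGD v1
    ]
    arg_xml = "".join(f"<{name}>{escape(value)}</{name}>" for name, value in args)
    body = (
        '<?xml version="1.0"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        f'<s:Body><u:AddPortMapping xmlns:u="{SERVICE}">{arg_xml}'
        '</u:AddPortMapping></s:Body></s:Envelope>'
    )
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPAction": f'"{SERVICE}#AddPortMapping"',
    }
    return headers, body.encode("utf-8")


def send_add_port_mapping(control_url: str, **kwargs) -> int:
    """POST the request to the gateway's control URL (hypothetical, found via SSDP)."""
    headers, body = add_port_mapping_request(**kwargs)
    req = urllib.request.Request(control_url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

The firewall opens a hole only because an inside application asked it to. A gateway that refuses, or does not speak IGD at all, leaves the application back where it started.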