Tsahi Levent-Levi

Server Side Interworking Sucks

December 7th, 2009

If I had only known how much noise my Google Jingle/SIP post would make, I would have written it sooner. But then again, you can never really know which posts will get people talking.

I had multiple discussions about this subject the day the post was published – on my blog as well as on Twitter. It seems I left something out, and I feel it’s necessary to say it here: server-side interworking SUCKS.

Interworking?

Interworking means taking two protocols that solve a similar problem and getting them to work properly with one another. Think of it as trying to connect two puzzle pieces that don’t belong together: the best way to make it happen is to smack them real hard until they squeeze into place.

A gateway is usually the means to provide interworking. You place an external server somewhere in the network that acts as an intermediary. It speaks both “languages” and translates from one to the other.
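To make the “translator” idea concrete, here is a minimal sketch of the signalling side of such a gateway, using Jingle and SIP as the two “languages”. The table is a deliberately simplified, illustrative subset, and the names `JINGLE_TO_SIP`, `SIP_TO_JINGLE` and `translate` are made up for this example; a real gateway also has to translate session descriptions, addresses, error cases and timing.

```python
# Illustrative only: a simplified subset of how a signalling gateway might
# map Jingle actions to SIP messages. Real gateways handle far more cases.
JINGLE_TO_SIP = {
    "session-initiate": "INVITE",
    "session-accept": "200 OK",
    "session-terminate": "BYE",
}

# Reverse table for the SIP-to-Jingle direction.
SIP_TO_JINGLE = {sip: jingle for jingle, sip in JINGLE_TO_SIP.items()}


def translate(message: str, direction: str) -> str:
    """Translate one signalling message name; raise if there is no mapping."""
    table = JINGLE_TO_SIP if direction == "jingle->sip" else SIP_TO_JINGLE
    if message not in table:
        raise ValueError(f"no translation for {message!r}")
    return table[message]
```

Anything that is missing from the table has to be handled some other way, or dropped, which is where the trouble usually starts.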

Interworking Drawbacks

Have you ever used Google Translate? I use it frequently to follow a Taiwanese blog written by a good friend of mine. The problem is that the translation from Chinese to English is so lame that I have to get the gist of each post from reading one in every four words.

Sometimes, an interworking function is much like that. Because the protocols were designed differently, they behave differently, and the “translation” from one to the other isn’t smooth.

The things that an interworking function can suffer from include:

  • Latency – adding a gateway adds a “hop”, as we now have a “middleman”.
  • Security – did I say “middleman”? You can’t really do end-to-end encryption with someone sitting in the middle.
  • More latency – there are times when the translation itself is expensive or takes time. H.323-to-SIP translation sometimes has to wait for the H.323 capabilities exchange and channel opening – something that is done very differently in SIP (both in concept and in its timing within call setup).
  • Resource consumption – you might need to transcode audio and video while you’re at it. This is very expensive in terms of CPU, and it makes the gateway harder to scale.
  • Interoperability – instead of focusing on a single protocol for interoperability, you now need to deal with pairs. It makes things way more complex.
  • Generalization – if you interwork on the server, you need to take care of more use cases and generalize things. You can’t take shortcuts, because you don’t really know what is implemented on the terminals on either side and what isn’t.
  • Dial Plans – how the hell do you deal with dial plans? How are you going to provide numbering behind a vastly different protocol? How do you make the user’s interaction smooth enough when dialing or receiving calls?

The Google Hell

For Google, or any other company in the market, the need to support multiple signaling protocols is interworking hell: half of the network uses one protocol, the other half uses another.

Now, let’s say a GTalk user (=Jingle) wants to dial a Gizmo5 user (=SIP). How will he indicate the address to dial? And once he does, how will the network allocate the interworking/gateway resource for the call itself? If you had a single protocol, this would be a non-issue, as both would be sitting on the same network.
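As a rough illustration of the addressing question, here is a naive sketch that maps between an XMPP-style address (a JID, `user@domain/resource`) and a `sip:` URI. The function names are hypothetical, and the big assumption, flagged in the code, is that both networks share the same `user@domain` namespace. In a real deployment that assumption is exactly what tends to break.

```python
def jid_to_sip_uri(jid: str) -> str:
    """Naively map an XMPP JID (user@domain/resource) to a SIP URI.

    Assumption: the same user@domain identifies the user on both networks.
    """
    bare = jid.split("/", 1)[0]  # drop the optional resource part
    if "@" not in bare:
        raise ValueError(f"JID has no user part: {jid!r}")
    return f"sip:{bare}"


def sip_uri_to_jid(uri: str) -> str:
    """Naively map a sip: URI back to a bare JID, dropping URI parameters."""
    if not uri.startswith("sip:"):
        raise ValueError(f"not a sip: URI: {uri!r}")
    return uri[len("sip:"):].split(";", 1)[0]
```

Even this toy version loses information in both directions (the JID resource, the SIP URI parameters), and it says nothing about phone numbers or about which gateway should carry the call.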

How about new features? Let’s say there’s this new gizmo thingy, which makes a great new feature, so Google wants to implement it. Google will have to do it 3 times: with Jingle, with SIP, and with the interworking function. That’s 3 for the price of 1 – not a very good deal.
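The “3 for the price of 1” arithmetic gets worse as protocols are added. A small sketch, assuming one native implementation per protocol plus one dedicated interworking function per pair of protocols (a hub-style design that funnels everything through one core protocol would count differently):

```python
from math import comb


def implementations_needed(protocols: int) -> int:
    """Native implementations plus one interworking function per protocol pair."""
    return protocols + comb(protocols, 2)


print(implementations_needed(2))  # Jingle + SIP + one gateway function
print(implementations_needed(3))  # three protocols, three pairs
```

With two protocols this gives the 3 implementations mentioned above; with three protocols it is already 6.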

Options

So what are the possible solutions that Google and others facing multiple protocols can use to handle interworking?

  1. Build interworking functions – I think I’ve made my point against them here clearly, but at times they are a necessary evil (connecting 3G video calling to video conferencing room systems, for example).
  2. Migrate to a single protocol – this will make the problem simply go away, but it isn’t possible in a lot of cases. For example, I don’t think Skype migrating to SIP is possible to achieve in less than, well, 3-4 years.
  3. Multi-protocol clients – have multi-protocol support not inside the network but on the edges – make your clients talk in the different “languages” you support on your network. For example, video endpoints that talk both H.323 and SIP.
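Option 3 can be sketched as a client that picks a protocol stack from the address being dialed, so the call stays native end to end and no gateway sits in the middle. Everything here is hypothetical: `dial_sip` and `dial_h323` stand in for real SIP and H.323 stacks, and `STACKS` and `dial` are names invented for this example.

```python
from typing import Callable, Dict


# Hypothetical per-protocol dial functions; a real client would wrap
# actual SIP and H.323 stacks here.
def dial_sip(address: str) -> str:
    return f"SIP call to {address}"


def dial_h323(address: str) -> str:
    return f"H.323 call to {address}"


STACKS: Dict[str, Callable[[str], str]] = {"sip": dial_sip, "h323": dial_h323}


def dial(uri: str) -> str:
    """Pick the protocol stack from the URI scheme and place the call natively."""
    scheme, sep, address = uri.partition(":")
    if not sep or scheme.lower() not in STACKS:
        raise ValueError(f"unsupported address: {uri!r}")
    return STACKS[scheme.lower()](address)
```

The cost moves to the client, which now carries two stacks, but each call uses a single protocol from one end to the other.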

It’s a good thing we’ve dedicated our previous newsletter issue to exactly this concept.

Another Opinion

There are those who think otherwise. Thiago Rocha Camargo provides his take on Gizmo5/Google:

The point here is that I personally don’t think Google is willing to implement SIP on their widgets of web chat voice/video solutions like we have already on GMail. Making Jingle/SIP a necessity anyhow if they also want to allow their users place calls to PSTN also from a Website Flash Widget.

IMHO Google will probably go for a Jingle/SIP gateway

His view might well be what gets implemented in the short run, but my belief is that in the long run, companies will prefer clients that can handle multiple protocols over having the core of their infrastructure deal with interworking hell.

3 Comments and trackbacks

  • 1. Paul E. Jones  |  December 7th, 2009 at 5:35 pm

    Tsahi,

    Google needs a way to interwork with the SIP-enabled gateways and SBCs, but SIP really isn’t in line with its web strategy. XMPP, on the other hand, is since it utilizes XML and has interfaces like BOSH that allow one to get presence information and even do IM via the web browser.

    You’re well-aware of my opinion of SIP: it is a nice client-server protocol for voice and video, but it falls significantly short trying to do anything more than that. SIP was initially intended to be a light-weight protocol that breaks away from the traditional telephony model, but has in fact fallen into the trap of replicating the PSTN over IP, implementing much of what was in the PSTN world and behaving like a traditional telephony protocol. It is not the web-centric, simple, light-weight protocol it was supposed to be: it is quite the opposite.

    XMPP is web-centric in many ways and is very flexible. Heck, they are building things like Google Wave on top of that infrastructure! So, it makes a lot of sense trying to use XMPP in the core and pushing SIP to the edges for interworking with the rest of the world.

    That said, perhaps Google has come to the realization that few have implemented Jingle and their plans are to marry SIP + XMPP into a single client. I can imagine IM and presence functions being handled by the XMPP side and SIP used for voice/video.

    That makes sense for GoogleTalk, but what about Google’s Chrome OS? What kind of voice/video support will be available there? The only way to do that is via some plug-in to get voice/video capabilities from JavaScript, but perhaps that’s exactly what they’ll do.

    Long-term, I agree they would be best-served by having a single protocol, but SIP cannot be it: it lacks the web-centric capabilities that Google needs to enable richer forms of communication available via XMPP.

    Paul

  • 2. Johann Prieur  |  December 7th, 2009 at 8:19 pm

    I believe that the semantics used in SIP and Jingle are very similar and supposed to play far better together than SIP and H.323. Also you might not need to touch the RTP stream at all which means that the gateway concerns only signalling, which discards most of the drawbacks you enumerate.

    From http://xmpp.org/tech/jingle.shtml, “in the case of voice and video chat, a Jingle negotiation usually results in use of the Real-time Transport Protocol (RTP) as the media transport and thus is compatible with existing multimedia technologies such as the Session Initiation Protocol (SIP). Furthermore, the semantics of Jingle signalling was designed to be consistent with both SIP and the Session Description Protocol (SDP), thus making it straightforward to provide signalling gateways between XMPP networks and SIP networks”.

    Also, translations between SIP and Jingle are properly defined (http://xmpp.org/internet-drafts/draft-saintandre-sip-xmpp-media-01.html).

    Of course, when talking about interworking, all is not bright and easy but on the other hand I think implementing it on the server side makes more sense than ending up with multi-stack clients, and XMPP is not the worst technology to federate here.

  • 3. Tsahi Levent-Levi  |  December 8th, 2009 at 10:41 pm

    @Paul, I tend to agree with your analysis, except the Google Chrome OS thingy – as Google controls it and what goes into it, adding SIP and XMPP there is easy for them.

    @Johann, while I can see where you’re going with your comment, and I have looked at that interworking draft, as you already stated, not all is bright and easy – everything is simple on paper but hard once you try to implement it.

    My main point in this post was to note the fact that interworking sucks. It doesn’t matter which protocols you choose – as long as there is more than one – you’re bound to have massive headaches. If for some reason I haven’t made that clear in the post, it is my mistake alone and I hope this statement here will remedy that.
