Tsahi Levent-Levi

Do we need a Swiss army knife as a communication protocol?

July 21st, 2008

You can do everything with SIP: Voice over IP, video telephony, presence, instant messaging, SMS, MMS and much more. Sometimes it feels like SIP is a protocol invented by a salesman: “Oh, you are looking for a solution that starts the microwave when you get to your driveway after a long day? Sure, we have it - SIP!”

Last week I wrote about XMPP versus SIMPLE, where both are used for presence and SIMPLE utilizes SIP for its transport. A colleague of mine who read that post told me that XMPP can also be used for voice calls. This brought me to the question:

Do we need Swiss army knives or do we need a penknife?

Do we want a protocol that does it all - a Swiss army knife, or should we have specialized protocols for each task - penknives?

Penknives please

Protocols are low level components in an application and are usually not in the core business of the application developers. As such, they tend to be outsourced to 3rd parties - RADVISION, for example, licenses such solutions related to VoIP, video and IMS to companies who wish to develop their own applications and products.

The companies that specialize in developing protocol stacks and communication frameworks need to cater to a large customer base, which requires their products to be generic - they need to fit client products, server products, pure software solutions and embedded devices. This means that as a company, you design for flexibility and optimize for both speed and memory space requirements.

If the protocol implemented is capable of doing a myriad of things, this automatically reflects in the size and complexity of the solution. Look at SIP, for example - it does it all. In the world of IMS, it is even a part of the IPTV solution.

Now, assume you only want to develop an application that does voice calls, without any bells and whistles of presence, video, SMS and the rest of the stuff out there. You are going to use only a fraction of the protocol stack you employ. What a waste.

Give me my Swiss army knife

Let’s use SIP for VoIP and XMPP for presence, and take it to the unified communications realm.

I’ll start by introducing a definition commonly known in both protocols: federation.

Taken from an IETF draft proposal:

A Federation is a group of VoIP service providers which:

  • agree to accept calls from each other via SIP
  • agree on a set of administrative rules for these calls (settlement, abuse-handling, etc…)
  • agree on rules for the technical details of the interconnection

In layman’s terms, if you have two enterprises, each with its own servers that take care of presence and VoIP calling, then each enterprise has its own federation(s) and can decide to interconnect them with those of other enterprises.
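
To make the definition a bit more concrete, here is a minimal Python sketch of federation membership. It is only an illustration under my own assumptions: the `Federation` class, the `can_call` helper and the example domains are hypothetical names, not anything taken from the IETF draft or from a real SIP or XMPP stack.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Federation:
    """Hypothetical model: a named group of providers that agreed to peer."""
    name: str
    members: set[str] = field(default_factory=set)

    def admit(self, domain: str) -> None:
        # In this sketch, joining stands in for accepting the federation's
        # administrative and technical rules.
        self.members.add(domain)


def can_call(caller: str, callee: str, federations: list[Federation]) -> bool:
    """Return True if both domains are the same or share at least one federation."""
    if caller == callee:
        return True
    return any(caller in f.members and callee in f.members for f in federations)


# Made-up example domains, for illustration only.
fed = Federation("example-federation")
fed.admit("enterprise-a.example")
fed.admit("enterprise-b.example")

print(can_call("enterprise-a.example", "enterprise-b.example", [fed]))  # True
print(can_call("enterprise-a.example", "enterprise-c.example", [fed]))  # False
```

The point of the sketch is that membership is checked per federation: every additional protocol that brings its own federation adds another such list to keep in sync.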

In the IETF, the work on federation started from VoIP peering, and the relevant WG (Working Group) was called voipeer. Along the road, the WG name changed to SPEERMINT (Session PEERing for Multimedia INTerconnect) so that it would deal with SIP services in general and not only VoIP. The reason was the same - the need for one federation that can deal with all services, rather than a federation mechanism per service.

One of the current advantages of XMPP over SIMPLE is its support for federations - support which is definitely going to find its way to SIMPLE soon enough.

Now, if we use XMPP and SIP instead of SIMPLE over SIP, we need to manage two separate sets of federation terms. What a waste.

If we’re doing communications, using fewer protocols means less hassle and less management effort to keep them all synchronized.

If you’re developing VoIP clients, you are also doing presence or have it on your roadmap anyway, so what’s the point in requiring multiple stacks for multiple protocols? Better to have a Swiss army knife that does it all.

What about you?

I am torn between these two extremes when it comes to protocols. Do we need a Swiss army knife or do we need a penknife? I’d really like to know your views. Leave a comment or let me know.

2 Comments

  • 1. Eelco  |  July 21st, 2008 at 1:45 pm

    Currently a presence enabled SIP VoIP client uses these protocols: SDP over SIP for setting up multimedia sessions; SIMPLE over SIP for IM and presence; and XCAP or WebDAV over HTTP for network based resource lists. It could be made simpler by using just SDP over SIP for setting up the MM sessions and XMPP for the rest ;-)

  • 2. Paul E. Jones  |  July 22nd, 2008 at 1:58 am

    There is certainly a lot of appeal to using common building blocks when building different applications. One gets the advantage of re-using code that will prove to be constantly improving due to the fact that it is so well tested. But, we get this benefit with a commercially available protocol stack, whether the protocol is specialized or general.

    But, I would argue that SIP is not really a Swiss Army knife. While it was originally designed to set up a simple voice call - and still struggles somewhat at that - its purpose is to set up “sessions”, whatever they might be. Oh, and to do presence. (Presence is really the “odd” one here, since presence is not a “session”, but an event or an advertised user state. So, let’s ignore that for the moment.)

    Now, the fact that SIP can be used to set up all kinds of sessions might be the appeal. But, what would those sessions be? Voice? Voice and video? A separate whiteboard session? A session for file transfer? Yeah, all of these are candidates. But, there’s a problem: how does the file transfer application know the address of the other file transfer application? Do we put all of that complexity into a single, overly complex device? Before you know it, you have an extremely complex and largely unmanageable system. Further, do you want all of that on your phone? Sometimes you might, but perhaps you might prefer to receive files on your PC while talking on the phone. Perhaps you might prefer to use your PC for IM, too.

    I think a protocol should have a purpose, but I do not mind if it is broad. That said, there should be logic and consistency in the way it works and we should not expand the functionality by making the end user’s device overly complex. Who can manage that code? And, as you said, what if you want something fairly simple?

    These are some of the reasons why the ITU started investigating a new multimedia system called “Advanced Multimedia System” to be published as H.325. The idea with H.325 is to decouple the applications (like app sharing, whiteboard, file transfer, voice, video, etc.) from the user’s control device (e.g., mobile phone, mobile communication device, residential gateway, or other). So, users can have any number of applications “registered” with their “container” (as this control device is called). They might use app sharing on a PC, video on an LCD panel on a wall, and voice on their handset (or headset). New applications could be made available to the user without the need to change software in the “container”. So, the complexity of the container is fixed and can be made rigid. Applications coordinate with the “container” in order to present to the remote user the “feel” that all of the functionality is in a single device, when in fact it is not. Each individual application may be maintained and updated by different development teams without the need to coordinate code changes with the other application teams or the “container” development team.

    So, is that a Swiss Army knife? Perhaps: the applications and application types one might develop may be numerous. But, there will be a lot more rigidity in the way applications register, coordinate with remote applications, etc. A protocol for enabling real-time communication between two or more users should have some flexibility, but it should be in the kinds and types of applications that can be consistently added to the system, not in the fact that one can re-use a protocol for a purpose for which it was not designed and/or in such a way that it is isolated from other systems. A whiteboard system that can only call a similar whiteboard system and is not integrated with the voice and video conversation is not so useful. H.325 would allow that, of course, but it will encourage better integration and coordination. Most importantly, one can use a multiplicity of devices that run different applications - so the whole communication experience is much richer.
