Once in a while I take the time to answer questions on the web – my favorite hang-outs now are Quora and LinkedIn. Yesterday I bumped into a question on LinkedIn that I just had to answer. Sunny Jarial was asking:
“SIP is based on ASCII and H323 is on digital pattern and hence SIP requires more bandwidth than H323. As video is concerned, H323 supports lip synchronizing but not SIP then why we are moving towards SIP? only bcz of simple design and implementation?”
I answered it there, and then Sunny and I had a short back-and-forth over email about the question. Finally, he sent me this one:
Which protocol do you prefer – SIP or H323 – and why?
Hmm. I didn’t see this one coming. That’s like asking a child whom he loves more, mommy or daddy. I’ve always had a warm place in my heart for H.323: a few years back I was the project manager of RADVISION’s H.323 protocol stack, which means I might not be the most objective person on the subject. On the other hand, I’m a very big (and loud) advocate for SIP here on the blog and elsewhere.
So I thought I would try to answer this question, but publicly, here. I say “try” because until today I don’t think I really had an opinion in this debate, and now I am forced to take a side.
The Technical Angle
H.323 uses binary encoding (ASN.1 with the Packed Encoding Rules), which makes it more structured and robust than SIP, which is plain text. The flip side is that H.323 is harder to program than SIP, simply because binary formats are always harder for developers to read and debug than plain text.
SIP is easier to develop, at least up to a rudimentary implementation, but it is harder to tune for interoperability: its textual nature leaves too many issues open-ended.
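To make the “open-ended” point concrete, here is a minimal Python sketch of one small corner of it. RFC 3261 allows the same header to be written in several ways: names are case-insensitive, many headers have single-letter compact forms (such as “v” for Via), and a header may be folded across lines. Two implementations that disagree on any of these will fail to interoperate. The function name `normalize_headers` is hypothetical, and this is a toy that handles header lines only, not a full SIP parser.

```python
# Sketch: normalizing SIP header lines so that equivalent spellings compare equal.
# Assumption: toy code for illustration only; it ignores quoted strings,
# comments, and comma-separated multi-value headers.

# Compact header forms defined in RFC 3261.
COMPACT_FORMS = {
    "i": "Call-ID",
    "m": "Contact",
    "e": "Content-Encoding",
    "l": "Content-Length",
    "c": "Content-Type",
    "f": "From",
    "s": "Subject",
    "k": "Supported",
    "t": "To",
    "v": "Via",
}

# Canonical spelling of each long form, keyed by its lowercase name.
CANONICAL = {name.lower(): name for name in COMPACT_FORMS.values()}


def normalize_headers(raw: str) -> list[tuple[str, str]]:
    """Unfold continuation lines and map every header name to one spelling."""
    lines: list[str] = []
    for line in raw.split("\r\n"):
        if line[:1] in (" ", "\t") and lines:
            # Line folding: leading whitespace continues the previous header.
            lines[-1] += " " + line.strip()
        elif line:
            lines.append(line)

    headers = []
    for line in lines:
        name, _, value = line.partition(":")
        key = name.strip().lower()  # header names are case-insensitive
        if key in COMPACT_FORMS:
            canonical = COMPACT_FORMS[key]
        else:
            # Known long forms get their canonical case; unknown names pass through.
            canonical = CANONICAL.get(key, name.strip())
        headers.append((canonical, value.strip()))
    return headers
```

For example, `"v: SIP/2.0/UDP pc33.example.com"`, `"CALL-ID: a84b4c76e66710"` and a `Subject` header folded over two lines all come back with their canonical names and a single-line value. A binary ASN.1 encoding simply has no equivalent of these alternative spellings.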
I also think SIP is a bit harder to secure against denial-of-service attacks. Because it is textual, its implementations are more susceptible to nasty attacks that target string processing, such as oversized or malformed headers.
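One common defense is to enforce hard size limits before doing any real string work. Below is a minimal Python sketch of that idea for splitting a raw SIP message into its start line, header lines and body. The limit constants, the `SipParseError` class and the `split_sip_message` function are all hypothetical; the numbers are illustrative and do not come from RFC 3261 or from any particular SIP stack.

```python
# Sketch: defensive limits for a text-based SIP message splitter.
# Assumption: the limits below are hypothetical example values.

MAX_MESSAGE_BYTES = 8192  # hypothetical cap on the whole message
MAX_HEADER_COUNT = 64     # hypothetical cap on the number of header lines
MAX_LINE_BYTES = 1024     # hypothetical cap on any single line


class SipParseError(ValueError):
    """Raised when a message exceeds a limit or is malformed."""


def split_sip_message(data: bytes) -> tuple[str, list[str], bytes]:
    """Split raw bytes into start line, header lines and body,
    rejecting oversized input before decoding or parsing it."""
    if len(data) > MAX_MESSAGE_BYTES:
        raise SipParseError("message too large")

    # Headers end at the first empty line (CRLF CRLF).
    head, sep, body = data.partition(b"\r\n\r\n")
    if not sep:
        raise SipParseError("missing blank line after headers")

    lines = head.split(b"\r\n")
    if len(lines) - 1 > MAX_HEADER_COUNT:  # first line is the start line
        raise SipParseError("too many header lines")
    for line in lines:
        if len(line) > MAX_LINE_BYTES:
            raise SipParseError("line too long")

    try:
        text_lines = [line.decode("utf-8") for line in lines]
    except UnicodeDecodeError as exc:
        raise SipParseError("invalid UTF-8") from exc

    start_line, headers = text_lines[0], text_lines[1:]
    if not start_line:
        raise SipParseError("empty start line")
    return start_line, headers, body
```

The design choice here is ordering: every check is a cheap length comparison that runs before any decoding or header interpretation, so a flood of huge or pathological messages costs the receiver as little as possible.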
The funny thing is that basically anything you can do with H.323 you can also do with SIP, and vice versa. Lip sync, which came up in the original question, is a good example: in both protocols the media flows over RTP, and audio/video synchronization relies on RTCP, not on the signaling protocol. Those who say otherwise are either discussing a feature that is not part of the base protocols or one that simply hasn’t been implemented interoperably across the industry. The first point is irrelevant; the second is correctable.
IMHO, H.323 is the better technical solution, but again, that’s just me. I’d prefer H.323 any day of the week, if only because I know it better.
The Market Angle
From a pragmatic market view, SIP wins de facto. It is widely deployed and has a huge ecosystem, both commercial and open source.
While H.323 is today’s solution for enterprise video conferencing, that will probably change in the next 5-10 years as SIP is deployed as part of many Unified Communications solutions and gradually replaces H.323 endpoints and infrastructure. In the interim, both protocols will need to live side by side, either with devices that speak both protocols or with gateways that interwork between the two.
From a marketing perspective, I prefer SIP. It’s a protocol that integrates better into more of the solutions already being deployed.
If you are looking for a video calling solution, you will need both H.323 and SIP.
And don’t forget – we live in a multi-protocol world. There will never be just a single protocol.