Why text-based protocols hurt your design
If there is one thing I have learned, it’s that text-based encodings are a menace. They have their uses, no doubt; I’m not throwing away all the XML flavors and HTTP-like protocols. I’m just saying that they should be limited to cases where the data is already text, such as HTML or information repositories, and kept out of complex communication protocols.
Binary Decoders
Two things can be said about binary decoding: it’s rigid and it’s repetitive. Rigid means that if you want to interact with another implementation, you have to follow the definitions to the letter; otherwise your message may not be read at all. Repetitive means that the same few constructs recur throughout the encoding, so encoders and decoders can work efficiently with abstract data objects and recursive code. This makes decoding faster, easier to debug and, most importantly, oblivious to the actual message content.
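To make the "abstract data objects and recursive code" point concrete, here is a minimal sketch of a generic tag-length-value decoder. The wire layout (1-byte tag, 2-byte big-endian length), the CONSTRUCTED flag bit and the Node type are assumptions invented for this illustration; they are not taken from any particular protocol.

```python
import struct
from dataclasses import dataclass, field
from typing import List

CONSTRUCTED = 0x80  # hypothetical flag bit: the value holds nested records


@dataclass
class Node:
    """Generic data object: the decoder never needs to know what a tag means."""
    tag: int
    value: bytes = b""
    children: List["Node"] = field(default_factory=list)


def decode(buf: bytes) -> List[Node]:
    """Decode a buffer of tag-length-value records into generic nodes."""
    nodes = []
    pos = 0
    while pos < len(buf):
        if pos + 3 > len(buf):
            raise ValueError("truncated header")
        tag, length = struct.unpack_from(">BH", buf, pos)
        pos += 3
        if pos + length > len(buf):
            raise ValueError("truncated value")
        body = buf[pos:pos + length]
        pos += length
        if tag & CONSTRUCTED:
            # Same code handles every nesting level: recurse into the body.
            nodes.append(Node(tag, children=decode(body)))
        else:
            nodes.append(Node(tag, value=body))
    return nodes


def encode(nodes: List[Node]) -> bytes:
    """Inverse of decode(); equally unaware of message content."""
    out = bytearray()
    for node in nodes:
        body = encode(node.children) if node.tag & CONSTRUCTED else node.value
        out += struct.pack(">BH", node.tag, len(body)) + body
    return bytes(out)


# Example: one constructed record wrapping two primitive ones.
message = encode([Node(0x81, children=[Node(0x01, b"\x00\x2a"), Node(0x02, b"hi")])])
tree = decode(message)
```

Note that nothing in decode() depends on what tags 0x01 or 0x02 stand for; the stack logic can walk the resulting tree and interpret it separately.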
Text Parsers
Text-based encoding is neither rigid nor repetitive. Text can usually be upper or lower case, use spaces or tabs, newlines and/or carriage returns, with or without quote marks and hyphens, etc. Each field tends to have varied content, in the form of a string, list, attribute=value pair, range, or any form that seems to fit the current need. That means that the parser needs to understand the fields it is reading in order to know what to expect. In fact, it needs to expect the unexpected, because you never know if someone might decide to do things a little bit differently. This makes it very tempting to decode to specialized data structures: you have a data type for the message, one for each type of header, another for the action, another for the command.
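A small sketch of what this looks like in practice. The header names (codecs, ports, contact) and their formats are hypothetical, made up for the example rather than taken from SIP or MEGACO; the point is that the parser needs a per-field table before it can read anything.

```python
import re


def parse_header_line(line: str):
    """Split 'Name: value' leniently: any case, spaces or tabs around the colon."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a header line: {line!r}")
    return name.strip().lower(), value.strip()


def parse_list(value: str):
    """Comma-separated list with arbitrary whitespace around the items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_range(value: str):
    """'low-high' with optional whitespace around the hyphen."""
    m = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", value)
    if m is None:
        raise ValueError(f"not a range: {value!r}")
    return int(m.group(1)), int(m.group(2))


def parse_params(value: str):
    """'base;key=value;...' where values may or may not be quoted."""
    base, *parts = value.split(";")
    params = {}
    for part in parts:
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return base.strip(), params


# The parser has to know which shape each field uses before it can read it.
FIELD_PARSERS = {
    "codecs": parse_list,
    "ports": parse_range,
    "contact": parse_params,
}


def parse_message(text: str) -> dict:
    message = {}
    for line in text.splitlines():  # accepts \n, \r\n and \r alike
        if not line.strip():
            continue
        name, value = parse_header_line(line)
        parser = FIELD_PARSERS.get(name, str)  # unknown fields stay as text
        message[name] = parser(value)
    return message
```

Every new header means a new entry in FIELD_PARSERS and, usually, a new result type that the rest of the stack has to know about.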
Implications
The choice of decoder or parser is reflected in the entire design of the stack. When all you do is process text, as in HTTP or XML, this is not a problem: text will be filtered, tagged and ultimately displayed as text. When the goal is something other than text manipulation, as in the SIP or MEGACO protocols, the text should be confined to the parser level. But the parser can never fully hide the text, so the stack must spend time understanding text strings and converting them to meaningful data. Since every implementation is a little bit different, any implementation requiring interoperability with other vendors has to spend a lot of its time just understanding strings. The result looks like spaghetti with meatballs, the spaghetti being the parsing code and the meatballs being the stack logic scattered here and there. Each section of the “meat” code has to be specialized to handle the specific message text at that specific location, because it uses specialized data structures.
If a strict text encoding becomes standard, strict enough to be easily replaced by a binary representation, the text parsing can become unaware of the context, allowing the stack code to deal only with the stack logic. If such a thing does happen, I actually recommend taking another step and making the encoding truly binary.
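As a rough sketch of that idea, assume a hypothetical strict grammar: one name=value pair per line, lowercase names, no optional whitespace, and "\n" as the only line terminator. Neither the grammar nor the binary layout below comes from any standard. With rules that tight, the parser needs no knowledge of the fields, and converting to a length-prefixed binary form is mechanical.

```python
import re
import struct

# Hypothetical strict grammar: "name=value", lowercase name, no stray whitespace.
LINE = re.compile(r"([a-z][a-z0-9-]*)=([^\r\n]*)")


def parse_strict(text: str):
    """Return (name, value) pairs without knowing what any field means."""
    if not text.endswith("\n"):
        raise ValueError("message must end with a newline")
    pairs = []
    for line in text[:-1].split("\n"):
        m = LINE.fullmatch(line)
        if m is None:
            raise ValueError(f"malformed line: {line!r}")
        pairs.append((m.group(1), m.group(2)))
    return pairs


def to_binary(pairs) -> bytes:
    """Length-prefixed form: 1-byte name length, 2-byte value length, then data."""
    out = bytearray()
    for name, value in pairs:
        n, v = name.encode("ascii"), value.encode("utf-8")
        out += struct.pack(">BH", len(n), len(v)) + n + v
    return bytes(out)


def from_binary(buf: bytes):
    """Inverse of to_binary(); assumes a well-formed buffer."""
    pairs, pos = [], 0
    while pos < len(buf):
        n_len, v_len = struct.unpack_from(">BH", buf, pos)
        pos += 3
        name = buf[pos:pos + n_len].decode("ascii")
        pos += n_len
        value = buf[pos:pos + v_len].decode("utf-8")
        pos += v_len
        pairs.append((name, value))
    return pairs
```

Once the text and binary forms carry exactly the same information, the text form buys little beyond readability on the wire, which is the case for going fully binary.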
Tags: binary, decoding, Design, encoding, Implementation, Interoperability, Megaco, parsing, protocol stacks, Protocols, SIP, text, XML
1. Yan Simkin | February 12th, 2008 at 4:56 am
As someone who used to work a lot with equipment from different vendors (SIP and H.323) during the “RVSN” part of my life as an Interop Test Eng, I know exactly what you mean when you talk about the differences between implementations of text-based protocols. But what would you suggest? Eliminating text-based protocols?
Just for example, in the H.323 world, there was a moment when the first standard for Dual Video arrived (H.239). Despite it being a standard, every vendor implemented it in a different way, and this was quite a headache for a very long time. But, in the end, things got better
2. Yan Simkin | February 12th, 2008 at 4:58 am
Continuation:
… and things started moving. Maybe this is what’s going to happen with SIP and the others?
BTW, why are there only TBU guys posting here? I would expect the NBU to contribute a bit as well; they have their wise men.
Thanks,
WBR,
Yan
3. admin | February 12th, 2008 at 5:44 am
Hi Yan,
We’ve just begun posting here on these blogs. I am sure that in the near future you’ll see additional blogs pop up here on our site with varied content from different parts of RADVISION.
4. Ran Arad | February 12th, 2008 at 10:35 am
My suggestion is twofold:
One option is to force text encoding to be rigid. For instance, if IMS had enforced some rigid encoding standard the same way it enforced SigComp, there might have been a movement towards rigid code. Alternatively, a rigid encoding could replace SigComp by some conversion to binary form.
Since we do not live in a perfect world, my other suggestion is that programmers be aware of the effect the parsers tend to have on their design, and take measures to prevent it, like forcing the parser to be unaware of context, or at least resisting the temptation to add logic to the parser.