If there is one thing I have learned, it’s that text-based encodings are a menace. They have their uses, no doubt; I’m not throwing away all the XML flavors and HTTP-like protocols. I’m just saying that their use should be limited to cases where we already deal with text-based data, like HTML or information repositories, and kept out of complex communication protocols.
Two things can be said about binary decoding: it’s rigid and it’s repetitive. That means that if you want to interact with another implementation, you have to follow the definitions to the letter, otherwise your message may not be read at all. As for the repetitive part, it makes it efficient for encoders and decoders to use abstract data objects and recursive code. This makes decoding faster, easier to debug, and most importantly, oblivious to the actual message content.
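To make the "repetitive" part concrete, here is a minimal sketch of a generic tag-length-value decoder. The wire layout (1-byte tag, 2-byte big-endian length, high bit of the tag marking a container) and the names `decode_tlv`, `encode_tlv` and `CONSTRUCTED` are my own assumptions for illustration, not taken from any particular protocol. The point is only that one short recursive function handles every message without knowing what any field means.

```python
import struct

# Hypothetical convention for this sketch: the high bit of the tag
# marks a "constructed" value, i.e. one that contains nested TLVs.
CONSTRUCTED = 0x80


def encode_tlv(tag, value):
    """Encode one element: 1-byte tag, 2-byte big-endian length, value."""
    return struct.pack(">BH", tag, len(value)) + value


def decode_tlv(data):
    """Decode a byte string into a list of (tag, value) pairs.

    Constructed values are decoded recursively into nested lists.
    The decoder is strict: any truncation raises ValueError.
    """
    items = []
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise ValueError("truncated header")
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated value")
        offset += length
        if tag & CONSTRUCTED:
            # Same code path for every nesting level.
            items.append((tag, decode_tlv(value)))
        else:
            items.append((tag, value))
    return items


if __name__ == "__main__":
    inner = encode_tlv(0x01, b"hi") + encode_tlv(0x02, b"\x00\x2a")
    print(decode_tlv(encode_tlv(0x81, inner)))
```

Notice that the decoder is rigid (a short buffer is an error, not something to guess around) and that nothing in it depends on which tags the message happens to carry.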
Text-based encoding is neither rigid nor repetitive. Text can usually be upper or lower case, use spaces or tabs, newlines and/or carriage returns, with or without quote marks and hyphens, etc. Each field tends to have varied content, in the form of a string, a list, an attribute=value pair, a range, or any form that seems to fit the current need. That means that the parser needs to understand the fields it is reading in order to know what to expect. In fact, it needs to expect the unexpected, because you never know if someone might decide to do things a little bit differently. This makes it very tempting to decode to specialized data structures. You have a data type for the message, one for each type of header, another for the action, another for the command.
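For contrast, here is a rough sketch of what a tolerant text parser ends up doing. The header names and the helpers `parse_header` and `decode_field` are hypothetical, loosely modeled on HTTP-style "Name: value" lines rather than on any real grammar. Even this toy version has to normalize case, whitespace, line endings and quoting, and then branch on the field name to know what kind of value it is looking at.

```python
def parse_header(line):
    """Split a 'Name: value' line, forgiving the usual variations."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("not a header line: %r" % line)
    name = name.strip().lower()   # case and padding vary by sender
    value = value.strip()         # spaces, tabs, CR and LF all go
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]       # quotes are optional, so drop them
    return name, value


def decode_field(line):
    """Turn a header line into a typed value.

    The parser must know each field to decode it: this is where the
    specialized data structures start to creep in.
    """
    name, value = parse_header(line)
    if name == "content-length":
        return name, int(value)
    if name == "allow":
        return name, [v.strip().upper() for v in value.split(",")]
    # Unknown field: all we can do is hand the raw string upward.
    return name, value


if __name__ == "__main__":
    for raw in ("Content-Length : 42\r\n",
                "ALLOW: invite, Ack ,BYE\r\n",
                'X-Vendor:\t"something else"\n'):
        print(decode_field(raw))
```

Every new field adds another branch, and every unknown field leaks a raw string to the layers above.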
The design of the decoder/parser shapes the design of the entire stack. When all you do is parse text, such as in HTTP or XML, this is not a problem. Text will be filtered, tagged and ultimately displayed as text. When the goal is something other than text manipulation, as with the SIP or MEGACO protocols, the text should be confined to the parser level, but the parser can never fully hide it, meaning that the stack must spend time understanding text strings and converting them to meaningful data. Since every implementation is a little bit different, any implementation requiring interoperability with other vendors has to spend a lot of its time just understanding strings. The result looks like spaghetti with meatballs: the spaghetti is the parsing code, and the meatballs are the stack logic scattered here and there. Each section of the “meat” code has to be specialized to handle the specific message text at that specific location, because it uses specialized data structures.
If a strict text encoding becomes standard, strict enough to be easily replaced by a binary representation, the text parsing can become unaware of the context, allowing the stack code to deal only with the stack logic. If such a thing does happen, I actually recommend taking another step and making the encoding truly binary.