As I have stated many times, I think binary encoding is superior to text-based encoding. However, when you are a software engineer implementing a protocol, the protocol choice is usually not yours. In this post, I will consider the pitfalls and gotchas of text-based protocols and how to design your way around them. I mostly consider protocols defined by ABNF rules (such as SIP). If the protocol uses an XML schema, there are readily available documents on parsing XML into DOM trees, as well as parsers in standard libraries. In fact, the advice in this post was influenced by XML parser principles, regular expression architectures and my own experience with text parsing and binary decoding.
1. Message Size
Text-based protocols usually are not kind enough to provide the entire length of the message in advance. The best way to design around this is to write a parser for just one line at a time. The parser should preserve its state between lines and report where the line it processed ended. If no line ending was found, it should remain in its previous state. The connection layer should then append the next buffer it reads after the unprocessed section and resume parsing lines.
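A minimal sketch of that connection-layer loop, under stated assumptions: the names LineFeeder, feed, remember_len and LINE_BUF_MAX are hypothetical, lines end in "\n" with an optional "\r" before it, and the callback stands in for the real one-line parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_BUF_MAX 1024

/* Hypothetical connection-layer state: bytes received but not yet parsed. */
typedef struct {
    char   data[LINE_BUF_MAX];
    size_t used;   /* number of pending bytes in data */
    int    lines;  /* complete lines handed to the parser so far */
} LineFeeder;

/* Callback type standing in for the real one-line parser. */
typedef void (*LineHandler)(const char *line, size_t len, void *ctx);

/* Example handler: records the length of the last line seen. */
void remember_len(const char *line, size_t len, void *ctx)
{
    (void)line;
    *(size_t *)ctx = len;
}

/* Append a freshly read chunk after the unprocessed bytes, hand every
 * complete line to the handler, and keep the trailing partial line for
 * the next call. Returns 0 on success, -1 if the data would not fit. */
int feed(LineFeeder *f, const char *chunk, size_t n,
         LineHandler on_line, void *ctx)
{
    size_t start = 0;

    assert(f != NULL && chunk != NULL && on_line != NULL);
    if (n > LINE_BUF_MAX - f->used)
        return -1;
    memcpy(f->data + f->used, chunk, n);
    f->used += n;

    for (;;) {
        char *nl = memchr(f->data + start, '\n', f->used - start);
        size_t len;
        if (nl == NULL)
            break;                     /* no line ending: keep previous state */
        len = (size_t)(nl - (f->data + start));
        if (len > 0 && f->data[start + len - 1] == '\r')
            len--;                     /* strip the CR of a CRLF ending */
        on_line(f->data + start, len, ctx);
        f->lines++;
        start = (size_t)(nl - f->data) + 1;
    }
    /* Move the unprocessed tail to the front for the next read. */
    memmove(f->data, f->data + start, f->used - start);
    f->used -= start;
    return 0;
}
```

Calling feed once per network read is enough: a line split across two reads is delivered only when its ending arrives.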
2. String Tokens
As soon as you write strcmp(), the parser efficiency is gone. Opt for generalized tokenizing: build an enumeration for every string token you expect and create a huge function to turn every known string to the matching enumerated token. The function should work like a state machine, reporting an accepting state (with the token code), a non-accepting state, or “end of line reached”. The following is a state machine for the words “ABC”, “ACB” and “CAB”:
This may appear as many states, but a small script can take the token words and write a function to tokenize them efficiently.
Notice the tokenize-state-machine diagram does not differentiate between upper and lower case and treats any “white-space” as word end. This grants it some degree of resilience to most non-standard implementations that differ in spaces versus tabs and upper versus lower case. Punctuation marks are still a problem, as can be seen below.
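The machine for “ABC”, “ACB” and “CAB” can be sketched in C roughly as follows. This is a hand-written illustration of what a generator script might emit, not the output of any particular tool; the names Token, ScanResult, step and next_token are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_NONE = 0, TOK_ABC, TOK_ACB, TOK_CAB } Token;

typedef enum {
    SCAN_TOKEN,       /* accepting state: *tok holds the token code */
    SCAN_UNKNOWN,     /* non-accepting state: the word is not a known token */
    SCAN_END_OF_LINE  /* end of line reached before any word */
} ScanResult;

/* States of the machine; each name spells the prefix consumed so far. */
enum { S_START, S_A, S_AB, S_AC, S_ABC, S_ACB, S_C, S_CA, S_CAB, S_DEAD };

/* One transition. Folding to upper case makes the machine case-insensitive. */
static int step(int state, int ch)
{
    ch = toupper(ch);
    switch (state) {
    case S_START: return ch == 'A' ? S_A  : ch == 'C' ? S_C  : S_DEAD;
    case S_A:     return ch == 'B' ? S_AB : ch == 'C' ? S_AC : S_DEAD;
    case S_AB:    return ch == 'C' ? S_ABC : S_DEAD;
    case S_AC:    return ch == 'B' ? S_ACB : S_DEAD;
    case S_C:     return ch == 'A' ? S_CA  : S_DEAD;
    case S_CA:    return ch == 'B' ? S_CAB : S_DEAD;
    default:      return S_DEAD;  /* any character after a full word kills it */
    }
}

/* Read one white-space-delimited word starting at *cursor and advance
 * *cursor past it. Any white-space counts as a word end. */
ScanResult next_token(const char **cursor, Token *tok)
{
    const char *p;
    int state = S_START;

    assert(cursor != NULL && *cursor != NULL && tok != NULL);
    p = *cursor;
    *tok = TOK_NONE;

    while (*p == ' ' || *p == '\t')
        p++;                                   /* skip leading white-space */
    if (*p == '\0' || *p == '\r' || *p == '\n') {
        *cursor = p;
        return SCAN_END_OF_LINE;
    }
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        state = step(state, (unsigned char)*p);
        p++;
    }
    *cursor = p;

    switch (state) {
    case S_ABC: *tok = TOK_ABC; return SCAN_TOKEN;
    case S_ACB: *tok = TOK_ACB; return SCAN_TOKEN;
    case S_CAB: *tok = TOK_CAB; return SCAN_TOKEN;
    default:    return SCAN_UNKNOWN;
    }
}
```

Each input character is examined exactly once, whereas a chain of strcmp() calls would rescan the word for every candidate token.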
3. Message Syntax
Many times, the text protocol will be defined by ABNF rules. These are substitution rules that are usually nested. For instance, one rule defines a time format and another rule uses that definition in another command:
$time = DIGIT DIGIT “:” DIGIT DIGIT
$timeRange = $time “-” $time
If an ABNF rule were made completely out of primitives (such as DIGIT in our example), we could write a regular expression to parse it and return field-value pairs. The complete parser has to expand on the regular expression engine by adding a nesting ability. Where the syntax names a primitive, the engine matches it; where it names another rule, the engine invokes that rule recursively until everything is parsed into primitives. Primitives, in our case, are tokens (which we already know how to tokenize), numerical values and string values. This nested regular expression engine must be able to preserve its state between calls, as some ABNF rules can span more than one line. The engine can also be taught to treat different punctuation symbols as equivalent, to deal with non-standard messages.
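The nesting can be illustrated with the two rules above. The sketch below hard-codes each rule as a C function rather than driving a generic engine from a rule table, and it does not preserve state across lines; the names Time, TimeRange and the parse_* functions are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct { int hours, minutes; } Time;
typedef struct { Time from, to; } TimeRange;

/* Primitive: DIGIT DIGIT. Returns the new position, or NULL on mismatch. */
const char *parse_two_digits(const char *p, int *value)
{
    if (!isdigit((unsigned char)p[0]) || !isdigit((unsigned char)p[1]))
        return NULL;
    *value = (p[0] - '0') * 10 + (p[1] - '0');
    return p + 2;
}

/* Primitive: a single literal character. */
const char *parse_literal(const char *p, char expected)
{
    return (*p == expected) ? p + 1 : NULL;
}

/* $time = DIGIT DIGIT ":" DIGIT DIGIT */
const char *parse_time(const char *p, Time *out)
{
    assert(p != NULL && out != NULL);
    if ((p = parse_two_digits(p, &out->hours)) == NULL)
        return NULL;
    if ((p = parse_literal(p, ':')) == NULL)
        return NULL;
    return parse_two_digits(p, &out->minutes);
}

/* $timeRange = $time "-" $time; the nested $time rule is invoked twice. */
const char *parse_time_range(const char *p, TimeRange *out)
{
    assert(p != NULL && out != NULL);
    if ((p = parse_time(p, &out->from)) == NULL)
        return NULL;
    if ((p = parse_literal(p, '-')) == NULL)
        return NULL;
    return parse_time(p, &out->to);
}
```

Treating different punctuation symbols as equivalent would be a change to parse_literal alone, for example accepting either "-" or "/" as the range separator.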
4. Data Structures
If you are familiar with XML or HTML, you may have used the Document Object Model (DOM). The DOM is simply a tree data structure that holds an XML or HTML document by hierarchy. Each element represents an XML tag, an attribute or a value (text). Any text parser can use a similar data structure to hold parsed messages. In fact, a text parser will have to use one tree data structure to hold the message syntax and smaller message trees to hold just the token values in the messages. The syntax tree is also important for encoding text messages: the layers above the parser may have built the message tree in any order, and the encoder will have to encode the values in the order defined by the syntax tree.
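The encoding point can be shown with a deliberately simplified sketch. Here the syntax tree is flattened to a single level (an ordered array of field identifiers) and the message tree to a linked list of values; the field names and the functions find_field and encode_line are hypothetical, not taken from any real stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical field identifiers; a real stack would derive them from the ABNF. */
typedef enum { F_METHOD, F_TARGET, F_VERSION } FieldId;

/* Message tree node: one value supplied by the parser or the application. */
typedef struct MsgNode {
    FieldId         field;
    const char     *value;
    struct MsgNode *next;   /* siblings, in whatever order they were added */
} MsgNode;

/* Syntax "tree", flattened to one level: the order the protocol mandates. */
static const FieldId syntax_order[] = { F_METHOD, F_TARGET, F_VERSION };

const MsgNode *find_field(const MsgNode *msg, FieldId field)
{
    for (; msg != NULL; msg = msg->next)
        if (msg->field == field)
            return msg;
    return NULL;
}

/* Encode the values in syntax order, separated by single spaces.
 * Returns the number of characters written, or -1 if a field is missing
 * or the output buffer is too small. */
int encode_line(const MsgNode *msg, char *out, size_t cap)
{
    size_t used = 0;
    size_t i;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof syntax_order / sizeof syntax_order[0]; i++) {
        const MsgNode *n = find_field(msg, syntax_order[i]);
        int w;
        if (n == NULL)
            return -1;
        w = snprintf(out + used, cap - used, "%s%s", i ? " " : "", n->value);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

The application may link the nodes in any order; the output order comes from syntax_order alone.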
Is developing protocol stacks and communication software any different from developing any other software? I have to develop APIs for my protocol stacks, a challenge in itself and one related more to user psychology than to programming. I have to check control flows all the way down to the network and back to the application. I also need to process large amounts of data and develop for many platforms. I hope these tools will be as useful to you as they are to me. Most tools are free; some are well-known while others are hard to find. I tried to concentrate on tools that are either essential for protocol programming or are not well-known. Tools such as Beyond Compare and Visual Assist, while recommended for any programmer, are not detailed here.
WireShark is an essential program for the development of any communications software. Firstly, it is a de-facto sanity check: can WireShark read and understand the message I encoded? Is all the information there? It is the final dispute resolver between the sender side and the receiver side. Anything on my side of WireShark is my responsibility, anything on your side of WireShark is yours, and in the rare case where the problem is between WireSharks - it’s the network’s fault. It is almost bug-free, regularly updated, available for all major platforms, free and open-source to boot. Kudos to the developers.
As I mentioned, we develop software to run on many platforms. We used to maintain two GUI applications, one for Windows and one for UNIX platforms. About eight years ago, we discovered Tcl/Tk, a scripting language used for GUI creation that looks and works the same on Windows and UNIX. Over this GUI, we implemented a macro recorder and script execution for load tests, another recorder for automated testing (an adaptation to run the Tk GUI on a remote computer for embedded systems), further Tcl scripts to analyze logs and a packager to help the release process. The language is both simple and powerful. While it may not be as powerful and extensible as Perl, Python or Ruby, if you also want a presentable GUI integrated with your script, it is hard to beat. It is distributed freely by ActiveState, and has widgets and scripts developed by the community, from a tree widget to an oscilloscope.
When you develop standard-based protocols, they tend to add up. A quick look in Packetizer revealed that MEGACO has 62 extensions. H.323 uses H.225, H.245, Q.931, 9 H.235 documents, 12 H.450 documents, 22 H.460 documents and much more. The number of SIP-related RFCs is hard to keep track of, and with IMS in the picture it is surely in the hundreds. When you receive an innocent customer’s question like “Does your protocol stack support this service?” you have to do some cross-standard searches for mentions of the service and its dependencies. You may also have to search the CS databases (kept in Outlook) for previous questions on the subject. Google Desktop Search (GDS) is so far the best solution I have found, if only they could fix the antivirus aggravation problem and the attachment indexer bug. It is available free from Google, that is, if the possible cost of your privacy is “free”.
Text editors are a dime a dozen. Even free and sometimes open source text editors are available. So why recommend TextPad? I have a few simple reasons:
Low resource consumption (useful to have)
Microsoft-like keyboard shortcuts
Macro support
Customized syntax highlighting
Search-again, search-backwards keyboard shortcuts. These should work exactly like Visual Studio’s F3, Ctrl-F3, Shift-F3, Ctrl-Shift-F3. These should, in fact, be F3, Ctrl-F3, Shift-F3, Ctrl-Shift-F3. Since this is a very strict demand, I agree to do some work on my part, like macro recording and keyboard shortcut assignment, so long as I get it in the end
Ability to open and handle huge log files
Spelling checker
Binary file editing
Use of regular expressions in search and replace
So far, TextPad was the only text editor to satisfy all these demands and is available from $33.00 to $9.20.
I don’t know if this is related to users at large, to protocol programming, just to programming or just to me, but I usually have about six file explorer windows open at any time, which clutters up my task bar. Since I have tabbed browser windows and, thanks to WndTabs (free), tabs inside my Visual Studio environment, I wanted a tabbed interface for my file explorer as well. I tried xplorer² (free trial, 30$), CubicExplorer (free), XYPlorer (free trial, 15$), Frigate (free trial, 25$) and Ultra Explorer (free). They were all lacking, but I could not put my finger on where exactly they lacked. They all had many interesting features, from marking recently used files, through dynamic filtering, to a command line interface, but only after using QTTabBar did I realize that what I wanted most was simplicity: quiet integration with the regular file explorer, automatic trapping of created explorer windows, that’s it. Grab it while it’s free.
I’m always happy to get recommendations for useful tools, so please tell me what you use.
Jeff Atwood of Coding Horror already mentioned the better looks of Guitar Hero 3, and concluded that presentation matters. Reading it, I concluded that he has not played the game for more than half an hour, since after half an hour you hardly see the presentation, except for when it annoys you. My favorite game at the time was the first Guitar Hero, since it was easier (I actually felt abused by GH3). Then came Rock Band. This time, a drum set and a microphone were added to the mix, and I quickly pre-ordered a copy. The game is fun and not as hard as GH3, plus the graphics are cleaner, but there was a dark cloud in the sky: the Guitar Hero guitars did not work with the Rock Band game, and vice versa (especially GH3, especially PS3). Harmonix, the creators of Rock Band, claimed:
“Harmonix has an open platform philosophy and their games will be compatible with third-party controllers that conform to the various platform controller standards.”
RedOctane, who created the guitar for Guitar Hero 3 retorted:
“There is no such thing as an open standard on PS3 for guitar controllers. That’s just a crock. Open standard is something like USB or 802.11. They publish the spec and if you want to build a USB anything, you follow these specs. I defy anybody to show me, before our games were released, to show a published spec of how to build a guitar controller on PS3.”
Touché. I tried. I found nothing. On top of that, just as Harmonix were about to release a patch to enable interoperability, Activision, the creators of GH3, blocked them. Why? Because.
To the left, RB drums, to the right, GH:WT drums. Notice, please, the subtle difference. No, I am not referring to the stylish black vs. the glowing gray, nor to the spirally cord to the bass drum pedal. You are getting warmer with the cymbals - the new drums have five, count ‘em, five drum-pads instead of four, to ensure complete incompatibility, hurray! Finally, into the arms race leaps Konami, with a six-pad-horror:
What we have here, ladies and gents, is a protocol war. Each company wants to lock people into their own product, gaining money from selling additional songs, and making sure the cost of switching games, that is, buying a complete set of guitars and drums, is too high. Now that Activision and Konami have made their moves, it is left for Harmonix to counter. What would I suggest?
Aggressive marketing: Push your product, drop prices. It’s time to flood the market so that anyone who considers playing drums would have already bought your system by the time GH:WT is released.
Upgrades: Release silent-pads to your drums, the noise is really annoying.
Covert Action: Release an “unofficial” patch that would allow compatibility with GH3 guitars. When the time comes, release another “unofficial” patch to make GH:WT drums work with Rock Band. Don’t make a fuss over it.
Better songs from bigger names: Not the deciding factor, but they count. I myself am prepared to switch sides if the competitors offer Pearl Jam’s album “Ten“.
Third Party, lots of third party: You have an “open platform philosophy,” right? You’re doing stuff with it, and that’s good. Now you need to do a lot more. Open the game up for plug-ins, get more hardware companies interested, flood the market with Rock Band tools. You’ll have your cheap, entry-level instruments (yes, these are the ones you’re already selling) and different sets for all price ranges.
There is a huge difference between compiled languages and scripting languages: the former are first entirely compiled and then executed, the latter are interpreted as they progress. I will use this terminology to discuss two types of protocols: the more common is made up of readymade messages or methods; each message type indicates a predetermined course of action. The less common type, and one I think deserves more attention, sends messages made up of many simple commands that do very simple things.
I will illustrate the script protocols using MEGACO (recently revived as it was added to IMS). MEGACO is a protocol between a Media Gateway (MG) and a Media Gateway Controller (MGC). The MGC handles call establishment and routing, the MG handles media exchange, something like a central switchboard and a residential gateway. The MG is a “dumb” entity, which receives basic instructions from its MGC, and acts on them to the best of its ability. The instructions come as nested instructions: for these contexts, for these terminations (or for terminations with these properties), perform these actions. The actions may be “Add to context”, “Remove from context”, “Move between contexts”, “Service change”, “Modify attribute”, “Query attribute”. What we have is a set of commands with their syntax, the ability to create for-loops, conditional expressions, in short, everything you need for a programming language. The MGC programs the MG in real-time to enable calls, conferences, special services and report generation. Through “packages”, the MEGACO scripts have expanded to include timed actions, interactive menus and even pre-programmed responses to simple events.
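To make the idea of a script protocol concrete, here is a toy interpreter for three of the actions listed above. It is an illustration only and does not follow MEGACO's actual message syntax or semantics: the types Command and Gateway, the functions execute and run_script, and the rule that a message stops at its first failing command are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TERMINATIONS 8
#define NO_CONTEXT 0   /* marks a termination that belongs to no context */

typedef enum { CMD_ADD, CMD_REMOVE, CMD_MOVE } CommandType;

typedef struct {
    CommandType type;
    int termination;   /* index of the termination to act on */
    int context;       /* target context; ignored by CMD_REMOVE */
} Command;

/* The gateway's entire state: which context each termination is in. */
typedef struct {
    int context_of[MAX_TERMINATIONS];
} Gateway;

/* Execute one command. Returns 0 on success, -1 if the command is not
 * valid in the current state. The gateway makes no decisions of its own. */
int execute(Gateway *mg, const Command *cmd)
{
    int *ctx;

    assert(mg != NULL && cmd != NULL);
    if (cmd->termination < 0 || cmd->termination >= MAX_TERMINATIONS)
        return -1;
    ctx = &mg->context_of[cmd->termination];

    switch (cmd->type) {
    case CMD_ADD:      /* only a free termination can be added */
        if (*ctx != NO_CONTEXT || cmd->context == NO_CONTEXT)
            return -1;
        *ctx = cmd->context;
        return 0;
    case CMD_REMOVE:   /* only a termination in a context can be removed */
        if (*ctx == NO_CONTEXT)
            return -1;
        *ctx = NO_CONTEXT;
        return 0;
    case CMD_MOVE:     /* move between two real contexts */
        if (*ctx == NO_CONTEXT || cmd->context == NO_CONTEXT)
            return -1;
        *ctx = cmd->context;
        return 0;
    }
    return -1;
}

/* Run a whole message: commands execute in order and stop at the first
 * failure. Returns the number of commands that succeeded. */
size_t run_script(Gateway *mg, const Command *script, size_t count)
{
    size_t done = 0;
    while (done < count && execute(mg, &script[done]) == 0)
        done++;
    return done;
}
```

All the call logic lives in whoever composes the Command array, which is the controller's role; the gateway side stays a small, predictable loop.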
In the MEGACO world, the MGC is a bilingual entity; it speaks some other VoIP protocol, H.323 or SIP, for instance, but does not process the protocol’s media streams. Instead, it programs the MG with the media information and leaves any needed transcoding and mixing to the MG. The MG is dedicated to media streaming and processing, and so, it is designed to have simple state machines and no logic - it just does what it’s told. This architecture achieves three goals:
Compatibility
Flexibility
Power
Compatibility
Since the implementation relies on execution of simple commands, there is not a lot of room for interpretation. How can one implementation of “Add fax to context 1” be much different from another? One MG will respond much like another MG, with only small variations, making the protocol reliable. The protocol allows the MGC to get the list of supported packages from the MG, and the MG does nothing without being instructed to by the MGC, so the two will only use the extensions common to both implementations.
Flexibility
Adding a package is relatively simple, allowing the protocol to become very versatile, adapting to handle any device type (e.g. fax, ip-phone, set-top box, etc.). The use of very basic commands allows very fine-tuned control of the way a message is executed. Subtle, incremental changes create new approaches, all without changes in the protocol. This protocol, in fact, evolves and grows through usage.
Power
Combining the simple commands creates powerful scripts, made more powerful still by the versatility and abundance of packages. The MG needs only to do media - it is built and optimized to the extreme point of specialization in media, and provides any kind of service to the MGC through the protocol, which is in fact an “API” in front of the MG that is simple on one hand and powerful on the other. It’s similar to providing a platform on the web and letting people develop their applications (MGCs) on top of it.
However, MEGACO is not without its flaws. Some packages require changes in the interpreter. Some powerful features were only added in version 3 of the standard, meaning that you can’t rely on MGs to support them, and some are still missing. Sometimes packages are added to compensate for missing basic commands (like a nested set of commands to execute on an event), making the implementation awkward. The basic set of commands could have been reworked: why have “add” and “delete” where “move” could substitute for both?
The real question is, can script protocols be used in different scenarios? MEGACO is used between a defined “master” and a “slave”, could a script protocol be used between peers? Would such an implementation be powerful or just complicated? I believe it can be done, but it would require a very carefully designed set of commands.
In the standardized human behavior series, I discuss human behavior, compare it to a protocol and see what we can alter or learn from it. Previously I considered whether people want to be standardized at all and concluded that they do not. Nobody wants to think of himself as a puppet manipulated by external forces or to act just as is expected of him. It’s possible to take the “Soup Nazi” episode from the Seinfeld sitcom as an example. The insistence on protocol in the episode is so outrageous it’s hilarious; that’s why we enjoy Elaine’s backlash so much. We don’t want to follow the “ordering procedure” and we don’t want to move immediately to our right as we walk into a room.
Case Study: Soup Nazi
What the soup-selling-person-who-is-extremely-procedure-oriented had there is a protocol. It was a lenient protocol to some extent, but it had a penalty for stepping out of boundaries: a denial of service for a period. Since following the protocol carried such a reward - a soup so good it makes your knees buckle - most people agreed to follow the protocol. The rate of following the protocol is in direct proportion to the rewards and penalties associated with the protocol.
Lesson Learnt?
Humans refuse to follow such protocols without a set of rewards and penalties associated with them. Maybe what protocol programmers need are just better incentives (in the form of soup?) but perhaps I’m just over-analyzing it.
Checking the searches that lead people to this blog revealed that they search for “Radvision INOUT”. Thus, on public demand, I will explain the super-secret-non-more-secret parameter guidelines. From RADVISION’s common type definitions:
/* Some "empty" definitions that we can use for readability of the code */
#define IN
#define OUT
#define INOUT
What this simply means is these words are replaced by nothing; they are just indications for the reader, not the compiler. I think mini-comments may be the best description - they are there to help whoever is going over the list of parameters stay alert as to which parameters will be manipulated during the execution of the command. In contrast to comments and documentation, these additions are usually considered as part of the prototype by assistance programs, which means they are displayed as a tip while you write.
To sum up: when writing an SDK, or even an internal layer or module, sometimes you want to indicate the usage of the parameters passed. One way to do it is to declare input as “const” and the rest will be considered output, but I will show that the hybrid INOUT has important uses as well.
An example, from the H.323 protocol stack:
RVAPI RvStatus RVCALLCONV cmProtConnNew(
IN HAPP hApp,
IN cmProtocol protocol,
IN RvBool bMultiServer,
IN hostEventHandler eventHandler,
IN void * context,
IN cmTransportAddress * remoteAddr,
INOUT cmTransportAddress * localAddr,
OUT HPROTCONN * hConn);
The first few parameters are used as input to the API. The last parameter is where the function places its output (the function returns a status code, and if successful, the output parameter will be set). The second to last parameter is both an input to the function and an output: it is set by the API caller to some value, and this value is used by the API and modified if the API is successful. In the example above, the user sets the local address, and may set the IP or port to zero if he wants the OS to decide; on exit, the local address will be set to the real address used. Other uses may be:
Passing a buffer length and getting the used or needed length
Passing partial information (aka hints) and receiving full information
Translating the contents to another format
Replacing the content as needed (e.g. NAT, DNS)
Status, state or location updates (e.g. when navigating a data structure)
Destroying an object and invalidating the pointer to it in a single step
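The first use in the list, passing a buffer length and getting back the needed length, can be sketched as follows. The function getDisplayName is hypothetical and is not part of any RADVISION API; only the three empty markers are taken from the definitions quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Empty markers, as in the definitions quoted above. */
#define IN
#define OUT
#define INOUT

/* Hypothetical API: copy a display name into the caller's buffer.
 * On entry *len is the buffer capacity; on exit it is the length needed
 * (including the terminating NUL), whether or not the copy succeeded.
 * Returns 0 on success, -1 if the buffer was too small. */
int getDisplayName(
    IN    const char *name,
    OUT   char       *buffer,
    INOUT size_t     *len)
{
    size_t needed;
    size_t capacity;

    assert(name != NULL && buffer != NULL && len != NULL);
    needed = strlen(name) + 1;
    capacity = *len;               /* the input half of INOUT */

    *len = needed;                 /* the output half of INOUT */
    if (needed > capacity)
        return -1;
    memcpy(buffer, name, needed);
    return 0;
}
```

A caller that gets -1 back can read *len, allocate that many bytes and call again, without a separate "query size" function.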
Another search term which led people to the blog was “I’m naughty I need spanking”. However, there is a limit to crowd-pleasing.
In my previous post, I mentioned the “wicked son,” the vendors who want to give their customers a sense of security, but do not actually want to implement any cumbersome security algorithms. I had a customer using H.323 who sent me specifications for a security implementation for H.323 where the password wasn’t known in advance, and asked us to support it. When I mentioned to them that they were showing the password in the open, where anyone who wants can simply catch the “Setup” message and read it, they answered that no, they were not, they were sending the hashed password to the other side. That was even worse, as they were saving eavesdroppers the bother of finding out which hash was being used, and they could immediately use the sent buffer to decrypt the media.
Users are usually of the “simple son” type: they want a sense of privacy. They want to feel that their web mail, their online files and their VoIP conversations are protected from casual observation. Usually, they do not expect someone to plant bugs in their home or deliberately crack their email password, so a closed door with a “do not disturb” sign should be enough. They want to be asked for a user name and password, and once they are, they feel secure. The fact that the password was passed free for all to see, and that the media may or may not have been encrypted is of little concern to them - this is all technical stuff, let the technical people deal with it. Stranger still, most of the time this works. If something is locked, or appears to be locked, people will hesitate before tampering with it. Not using security at all is like talking in public - anyone around can listen in, but it’s not all that polite. Using some sort of basic security is like talking quietly somewhere away from the crowd - anyone who really wants to can listen in, but it’s embarrassing for him to intrude. Using real security is like speaking in a room behind closed doors with concrete walls with acoustic isolation; safe, but hardly anyone will take the trouble. You may ask, “What’s the harm in that?” RFC 2617 explains:
The Basic authentication scheme is not a secure method of user authentication, nor does it in any way protect the entity, which is transmitted in cleartext across the physical network used as the carrier. […] It SHOULD NOT be used (without enhancements) to protect sensitive or valuable information. […] The danger arises because naive users frequently reuse a single password to avoid the task of maintaining multiple passwords. […] In the server’s password database, many of the passwords may also be users’ passwords for other sites. […] Basic Authentication is also vulnerable to spoofing by counterfeit servers. If a user can be led to believe that he is connecting to a host containing information protected by Basic authentication when, in fact, he is connecting to a hostile server or gateway, then the attacker can request a password, store it for later use, and feign an error.
So not only do the basic forms of authentication lull the users into a false sense of security, they are dangerous to their real security.
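To see how thin the protection is: in the Basic scheme the credentials are only base64-encoded, so anyone who captures the header can recover them with a few lines of code. The decoder below is a minimal sketch (the names b64_value and b64_decode are hypothetical, and padding is handled only by stopping at the first "="); the sample string is the “Aladdin” example from RFC 2617.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if not in the alphabet. */
static int b64_value(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *pos = (c != '\0') ? strchr(alphabet, c) : NULL;
    return pos ? (int)(pos - alphabet) : -1;
}

/* Decode base64 text into out (NUL-terminated). Returns the decoded
 * length, or -1 on malformed input or insufficient space. */
int b64_decode(const char *in, char *out, size_t cap)
{
    unsigned int acc = 0;   /* bit accumulator; only the low bits matter */
    int bits = 0;           /* number of valid bits not yet emitted */
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0)
            return -1;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n + 1 >= cap)          /* leave room for the NUL */
                return -1;
            out[n++] = (char)((acc >> bits) & 0xFF);
        }
    }
    if (n >= cap)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

Feeding it the value from an "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==" header yields the user name and password separated by a colon. No key and no secret are involved.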
Recently I was asked to add basic authentication to our RTSP stack, which supports digest authentication. After I refused, the customers added basic authentication by themselves. I couldn’t help but be reminded of the book “What to Eat”. In the book, Marion Nestle resents vitamin-enriched cereals and candy - the claims as to the health benefits of the additives, she writes, are there just to make us forget the (high) caloric value of the product. When a video terminal asks us for a user name and password, we are led to assume that there is some form of security involved, just as the announcement of vitamins and iron added to breakfast cereals makes us believe that the manufacturer has some consideration for our health.
Just as we should be more aware of what we eat, we should also be more aware of how we protect ourselves.
On the Jewish holiday of Passover, we read the tale of the four sons: one wise, one wicked, one simple and one who does not know how to ask a question. In this special Passover post, I will consider their approach to security.
What does the wise son say? For the wise son, no amount of security is enough. He will use authentication, integrity and privacy algorithms to protect his online data and communications, although he knows that all measures of security are theoretically breakable. He will check exactly how strong the encryption is on any product he buys, and as a vendor, he will ship products so secure that only a super-computer can actually run them.
What does the wicked son say? The wicked son uses security, and expects products he buys to be secure, but does not really care about security in the products he sells. He knows others are interested in security, so he will implement some security-like mechanisms, usually something that actually sends the user name and password as clear text over the net.
What does the simple son say? He knows security is important, and he knows a thing or two about it, but he realizes that if anyone ever really wanted to break through his security, he could. So he sets up his security to thwart the casual nose poker, and in the products he sells he implements a simple security scheme to authenticate the user and give him some degree of privacy.
And the one who does not know how to ask? He doesn’t know anything about security, and does not think it’s really needed. People should mind their own business, as he does, and they shouldn’t go where they are not invited.
Security is a controversial issue in protocols. You can find the whole range of opinions. The four sons above represent the four main approaches to the issue. While the one who does not ask, the simple son and the wise son form a spectrum of approaches, the wicked son is outside the spectrum. The wicked son mimics the approach of the simple son (or sometimes even the wise son) but in fact causes more harm than good. I will explain why in my next post.
At times, I like to keep score between Development and Customer Support (CS). If a problem is on the customer’s side, a point is awarded to Development, and if the problem is with our code, a point is awarded to Customer Support. Then there are the many special cases, for example problems with documentation (points for CS), problems with API design (more points for CS), problems fixed for another customer already (points for Development), problems which are already fixed in later versions (points for Development) and problems with API misuse (more points for Development). Extra points are received for quick responses and this is where good logs and handy scripts are put to good use.
Good logs are essential. You must be able to debug your code and you must be able to do so remotely. People who are not used to writing programs that are used inside other programs may sometimes depend on error codes or user descriptions to discover what went wrong. However, when you are developing a library, you must have good logs. For example, things could have started going wrong long before the actual error or exception. Most of the time, you have to go to the beginning of the session to track an object from its creation, look through its state changes, and then find out what event in its past caused the error. At times, you need to track the stack conditions from initialization to find out what went wrong.
Such cases are due to the occasional resource leak. This is where you need logs to tell you when an object was allocated and when it was deleted. If the object has access control, the logs also need to report when it is locked and unlocked. This adds up to a lot of information which is too much to cover manually. Here is where you need to master a scripting language. It’s not an absolute necessity; it’s just the quickest way to analyze logs. When you need to do text processing and regular expressions, scripts are the way to go. Here at RADVISION we use TCL scripts, although Perl and Python are options that are more common.
The final step is to use C Macros to print the file and line of the calling function, like so:
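A minimal sketch of such a macro, assuming a hypothetical pair of resource functions (resourceAllocAt, resourceFreeAt) and using fprintf to stderr as a stand-in for the stack's real log call:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical library function: the caller's location arrives as parameters. */
void *resourceAllocAt(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    /* Stand-in for the real log call: record what was allocated and where. */
    fprintf(stderr, "ALLOC %p size=%lu at %s:%d\n",
            p, (unsigned long)size, file, line);
    return p;
}

void resourceFreeAt(void *p, const char *file, int line)
{
    /* Log before freeing, so the pointer value is still valid to print. */
    fprintf(stderr, "FREE  %p at %s:%d\n", p, file, line);
    free(p);
}

/* The macros expand at the call site, so __FILE__ and __LINE__ name the
 * caller's source file and line, not the library's. */
#define resourceAlloc(size) resourceAllocAt((size), __FILE__, __LINE__)
#define resourceFree(p)     resourceFreeAt((p), __FILE__, __LINE__)
```

A log-analysis script can then pair every ALLOC line with its FREE line and print the file and line of each allocation that was never released.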
This means that the function will be able to print the exact location where it was called to the log. Your well-written scripts will be able to direct you to the exact location of the problem, that is, if the problem is indeed in your code. Sometimes the scripts direct you to a filename that is in the client’s application.
Here is where you quickly press the “reply” button and send the customer to D:\coolProject\videoApp\myFile.c, line 845, to check why he/she is not releasing the resource allocated there. Result: Ownage.
Gabe Wachob provides tips for API developers. In contrast to my low-level approach, Wachob looks at API design from a holistic perspective. Although he speaks on API design for web services, most of his tips are relevant to every aspect of API design. For example:
API can be seen as a separate product you are delivering
One can document the service and not just the API
Provide a reference client application to demonstrate usage of the API
The one that I am most fond of involves making development against your API “fun and personal”. Wachob links to another post explaining that last point, in which he writes:
“If you are a developer, you know what the thrill of the hack is - when your building something, and you sit down and implement a new feature and all of a sudden, your stuff plugs into a bunch of other people’s stuff and what was once a cool standalone thing is now part of an ecosystem of interoperating cool stuff. The whole becomes greater than the sum of the parts. And you, the developer, are part of it.”
Here Wachob references web services, developer communities and open source development. I need to think about how this translates to the design of product APIs and protocol stacks and I’m open for new ideas.