People often use the term “real-time video”, but it can mean different things to different people. Some regard real-time video as the kind you stream, others as the kind you use for visual communication. The difference may seem insignificant, but it’s all that is needed to botch a real-time video project.
The Kind You Stream
The streaming kind is uni-directional – the video in this case flows from a single source to one or more destinations. Because the viewer never responds to the sender, the delay between sending and receiving matters little: real-time in this case can mean a gap of seconds or even minutes between send and receive.
In some cases, you’d like something better than a matter of seconds, but mostly seconds would do just fine.
The Kind You Use For Visual Communication
Visual communication is bi-directional – the video in this case flows in both directions and can be considered “synchronized”: the video stream from one side responds to and interacts with the video stream from the other side. In this case, real-time means a delay of a few hundred milliseconds at most between send and receive.
It’s the Latency That Counts
Latency is the term used for the time it takes for live video to be captured, encoded, sent across the network, decoded on the other side and displayed. This variance in “real-timeness” between streaming and visual communication is the cause of a lot of “disconnects” between codec vendors and their customers in the visual communication industry.
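To make the definition concrete, here is a minimal sketch of a one-way latency budget built from the stages listed above. Every number in it is an illustrative assumption, not a measurement, and the names (`STAGES_MS`, `fits_budget`, the 300 ms ceiling, the 5-second player buffer) are made up for this example.

```python
# Illustrative one-way latency budget. All figures are assumptions
# chosen for the example, not measurements of any real system.
STAGES_MS = {
    "capture": 33,   # roughly one frame period at 30 frames per second
    "encode": 20,
    "network": 80,
    "decode": 15,
    "display": 17,
}

# Assumed ceiling for a conversation ("hundreds of milliseconds at most").
CONVERSATIONAL_BUDGET_MS = 300


def total_latency_ms(stages):
    """Sum the per-stage delays, in milliseconds."""
    return sum(stages.values())


def fits_budget(stages, budget_ms):
    """Return True if the end-to-end delay stays within the budget."""
    return total_latency_ms(stages) <= budget_ms


# A streaming-style pipeline typically adds a large player-side buffer.
# The 5-second figure is, again, only an assumption for illustration.
streaming_stages = dict(STAGES_MS, player_buffer=5000)

print(total_latency_ms(STAGES_MS))                              # 165
print(fits_budget(STAGES_MS, CONVERSATIONAL_BUDGET_MS))         # True
print(fits_budget(streaming_stages, CONVERSATIONAL_BUDGET_MS))  # False
```

The same pipeline passes or fails depending only on which budget you hold it to, which is exactly the gap between the two meanings of “real-time”.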
I’ve been involved in several different projects that required various codecs for video telephony. Our customers who develop the end products are usually the ones choosing the codec vendor, and usually they do that prior to selecting a vendor for the signaling, call control and application parts (that’s us). While this is all fair play, time and again we bumped into codec vendors with a great brand name for their products, excellent video quality in off-line testing, and a great deal of knowledge in optimization. Yet their codecs were not suitable for use in a video telephony system – the vendors understood real-time video as the kind you stream, not the kind you use for visual communication.
This is not a big surprise. Up until now, video streaming and playback were the majority of the industry. Codec vendors focused on those markets and neglected the visual communication requirements and needs. The end result? Delays in product deliveries, bad vibes between parties, replacement of codec vendors in the middle of projects, etc.
What to do?
If you are working on a video telephony project and need to deal with codec vendors, make sure they fit your needs. That means giving the codec vendor a set of requirements to follow. At a minimum, the list should include the following items:
- The encoder should be able to generate slices (parts of a video frame) of any arbitrary size. This allows better management of packet sending over the network, improves packet loss resiliency and reduces latency.
- The encoder should work well in an IPPP structure, where the key frame (I) is sent only upon request and not at a fixed frame interval (as in video streaming).
- The decoder should be able to receive packets of any given size (not just full frames), to allow faster decoding (and lower latency).
- The decoder should be resilient to network errors, such as packet loss, and should not crash on any input whatsoever.
- As the encoder and decoder work in parallel in video telephony, it should be possible to control the encoder at frame level, for instance to request a key frame (a procedure known as “Video Fast Update request” or “VFU request”).
- Make sure the rate control mechanism built into the encoder you plan to use works in a constant bit rate (CBR) mode. Anything else just isn’t suitable for video telephony.
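Two of the requirements above, key frames on request only and slices sized to fit a packet, can be sketched in a few lines. This is a toy model under stated assumptions, not any vendor's API: `ToyEncoderControl`, `group_into_slices` and the 1200-byte limit are hypothetical names and values, and a real encoder decides slice boundaries internally while it encodes.

```python
from dataclasses import dataclass


@dataclass
class ToyEncoderControl:
    """Hypothetical frame-level control for an IPPP structure.

    A key frame (I) is produced only for the first frame or after an
    explicit request; every other frame is a P frame.
    """
    key_frame_requested: bool = True  # the first frame must be a key frame

    def request_key_frame(self):
        """Call when the remote side sends a VFU request."""
        self.key_frame_requested = True

    def next_frame_type(self):
        """Return "I" if a key frame is pending, otherwise "P"."""
        if self.key_frame_requested:
            self.key_frame_requested = False
            return "I"
        return "P"


def group_into_slices(macroblock_sizes, max_slice_bytes):
    """Group consecutive macroblocks so no slice exceeds max_slice_bytes.

    macroblock_sizes holds the encoded size in bytes of each macroblock
    (an assumed input). Returns a list of slices, each a list of
    macroblock indices.
    """
    slices, current, current_bytes = [], [], 0
    for index, size in enumerate(macroblock_sizes):
        if size > max_slice_bytes:
            raise ValueError("macroblock larger than the slice limit")
        if current and current_bytes + size > max_slice_bytes:
            slices.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        slices.append(current)
    return slices


encoder = ToyEncoderControl()
frame_types = [encoder.next_frame_type() for _ in range(3)]
encoder.request_key_frame()               # simulated VFU request
frame_types.append(encoder.next_frame_type())
frame_types.append(encoder.next_frame_type())
print(frame_types)                        # ['I', 'P', 'P', 'I', 'P']

# 1200 bytes is an assumed payload limit chosen to stay under a typical MTU.
print(group_into_slices([400, 500, 300, 700, 200], 1200))  # [[0, 1, 2], [3, 4]]
```

Keeping each slice under the packet payload limit means a lost packet costs one slice rather than a whole frame, and the request-driven key frame avoids the periodic bit rate spikes of a fixed key frame interval.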
Up until recently, video telephony was a niche market, and a small one at that. This is why most codec vendors still don’t really grok telephony. The good news is that video is going mainstream. With Skype and Google both having solid video telephony solutions, it will be everywhere soon enough, which means codec vendors will have to do their homework.

Comments and trackbacks
1. Fedrick - video frames | December 19th, 2008 at 9:57 am
It's really good to know about real-time video as the kind you stream, and the other as the kind you use for visual communication. The encoder generates slices of any arbitrary size. This allows better management of packet sending over the network, and also improves packet loss resiliency and reduces latency.
2. Tsahi Levent-Levi | December 21st, 2008 at 1:38 pm
Frederic, thanks for the comment.
The video communication kind is harder to work with because it needs better resiliency and lower latency. This is also why there are these differences between streaming and real-time video telephony.