[We've just announced our Samsung-RADVISION VC240 HD videoconferencing LCD monitor and our BEEHD videophone engine. What we've noticed is that customers tend to ask a lot about the video codec in these products. Amit Klir, our "resident media specialist" for BEEHD, explains what makes the video codec in our client product so great.]
As with giving a gift, customization makes it that much more effective: the same holds true for video codecs. When looking for one, you need to think about what it is going to be used for, and make sure the codec you choose is optimized for the task at hand. The best video codecs are application-dependent, allowing you to get the most bang for your buck.
Amit Lavi, the product manager of BEEHD, asked me to write a bit about what makes our video codec perfect for visual communications. Here's the list I came up with (though there are more benefits than those enumerated here):
Visual communication happens in real time. To send an encoded video stream that is coming at you at 30 frames per second to a remote party, the encoder has to meet real-time constraints: each frame must be encoded in 1/30 of a second. If it doesn't, you're going to either lose frames or overflow. Doing all of this in high definition is hard, really hard: 108,000 macroblocks have to be processed each second for real-time HD visual communication to happen. That's a lot.
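To put those numbers in perspective, here's a quick back-of-the-envelope calculation (assuming 720p HD, the resolution behind the 3,600-macroblocks-per-frame figure mentioned later):

```python
# Back-of-the-envelope real-time budget for 720p HD encoding at 30 fps.
# Macroblocks are the 16x16-pixel units that H.264 encodes one at a time.

MB_SIZE = 16                 # macroblock edge length in pixels
WIDTH, HEIGHT = 1280, 720    # 720p HD resolution
FPS = 30

mbs_per_frame = (WIDTH // MB_SIZE) * (HEIGHT // MB_SIZE)   # 80 * 45 = 3600
mbs_per_second = mbs_per_frame * FPS                        # 108,000
budget_per_frame_ms = 1000 / FPS                            # ~33.3 ms per frame
budget_per_mb_us = 1_000_000 / mbs_per_second               # microseconds per macroblock

print(f"{mbs_per_frame} macroblocks/frame, {mbs_per_second} macroblocks/s")
print(f"{budget_per_frame_ms:.1f} ms per frame, {budget_per_mb_us:.2f} us per macroblock")
```

In other words, the encoder gets less than ten microseconds per macroblock, which is why hardware acceleration matters so much.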
For our embedded BEEHD product, we use the TI DM6467 processor, which includes multiple hardware accelerators for H.264 coding. While the DM6467 is a great multimedia processor, it is quite complex, packing together a co-processor, a DSP, a DMA engine and memory cache. It was important for us to use the DM6467 efficiently, so we took the time to design the codec in a way that takes advantage of all these hardware components.
Real-time constraints are important, but there's also the issue of latency. Have you ever seen a live broadcast of people communicating from different countries? The anchor asks a question, and then waits a few seconds until the reporter "on the ground" hears the question on his side. That annoying pause is what latency is all about.
When talking about visual communications, you want the latency to be as low as possible. The rule of thumb is that latencies of over 200 milliseconds are annoying.
In visual communication systems, the path the video stream has to travel is quite long, and each of the tasks along the way may add latency. The main causes of latency are the codec's processing time and the network.
An H.264 HD video encoder yields frames that are too big to send over the network in a single packet. This is why an H.264 frame used in visual communication is split into slices. Codecs usually work at the frame level, requiring the media system to split the encoded frames into slices before sending them, and then reassemble the received network packets back into frames before decoding them.
To reduce latency as much as possible, our codec encodes and decodes the video at the slice level instead of the frame level. This gives us several advantages: it removes the need for the media system to deal with frame fragmentation, and it lets us send packets before a full frame has been encoded, speeding up the process and reducing latency.
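The latency win from slice-level processing can be seen in a toy pipeline model. The numbers below are illustrative, not measured BEEHD figures, and this is a simplified sketch rather than our actual implementation:

```python
# Toy 3-stage pipeline model (encode -> send -> decode) showing why
# slice-level processing cuts latency: downstream stages start working
# on early slices while later slices are still being encoded.

def end_to_end_latency_ms(units, stage_costs_ms):
    """Latency until the whole frame is decoded, when the frame flows
    through the pipeline in `units` equal pieces. Classic pipeline
    recurrence: a piece enters a stage once it has left the previous
    stage AND the stage is free."""
    finish = [0.0] * len(stage_costs_ms)  # per-stage finish times
    for _ in range(units):
        prev = 0.0
        for s, total in enumerate(stage_costs_ms):
            finish[s] = max(prev, finish[s]) + total / units
            prev = finish[s]
    return finish[-1]

COSTS = [33.3, 10.0, 16.0]  # made-up encode/send/decode times in ms
frame_level = end_to_end_latency_ms(1, COSTS)  # whole frame at once
slice_level = end_to_end_latency_ms(8, COSTS)  # 8 slices, pipelined
print(f"frame-level: {frame_level:.1f} ms, slice-level: {slice_level:.1f} ms")
```

Even in this crude model, pipelining slices through the stages shaves a large chunk off the end-to-end latency of each frame.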
~280 milliseconds end-to-end latency in BEEHD
(over 140 milliseconds of which is attributed to an external off-the-shelf camcorder)
CBR vs VBR
There are two common rate control techniques in video coding: VBR (variable bit rate) and CBR (constant bit rate). In real-time visual communications, only CBR is acceptable, as VBR by its nature causes bursts in bandwidth utilization, leading to delay and packet loss.
CBR means that the "budget" of bits for a given period of time should not vary. For visual communications, that period is usually fixed to a short interval, about a second. Within it, it is important that the granularity at which the rate control mechanism operates be as fine as possible, so the encoder can maintain video quality.
While some encoders use frame-level granularity (3,600 macroblocks for a 720p HD frame), our codec's basic rate control unit is only a few macroblocks in size, which is why we are able to distribute our "budgeted" bits efficiently without losing quality.
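The idea behind fine-grained CBR can be sketched as a simple feedback loop. This is a hypothetical illustration, not BEEHD's actual rate control algorithm, and the bit model is a made-up stand-in:

```python
# Hypothetical CBR feedback loop working on small groups of macroblocks.
# After each group, the quantization parameter (QP) is nudged up or down
# depending on whether we are over or under the bit budget so far.

def simulate_cbr(bits_for_group, target_bits, groups, start_qp=30):
    """bits_for_group(qp) -> bits one group consumes at that QP.
    Returns (total_bits_spent, final_qp)."""
    budget_per_group = target_bits / groups
    qp, spent = start_qp, 0
    for g in range(1, groups + 1):
        spent += bits_for_group(qp)
        # Proportional feedback: a higher QP means coarser quantization
        # and fewer bits. H.264's QP range is 0..51.
        if spent > g * budget_per_group:
            qp = min(qp + 1, 51)
        elif spent < g * budget_per_group:
            qp = max(qp - 1, 0)
    return spent, qp

# Rough H.264 rule of thumb: bitrate roughly halves for every +6 in QP.
bits_model = lambda qp: int(9000 * 2 ** ((30 - qp) / 6))
spent, final_qp = simulate_cbr(bits_model, target_bits=720_000, groups=90)
print(f"spent {spent} of 720000 budgeted bits, final QP {final_qp}")
```

Because the loop corrects itself every few macroblocks instead of once per frame, it stays close to the budget without large quality swings inside a frame.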
Rate Distortion Optimization
This one is not specific to visual communication; it applies to any video application. When encoding video, you make compromises: you lose data for the sake of being able to compress it and send it over the network. Under a given rate constraint, you apply some distortion to the video you capture and encode. The big question is how to encode the captured video so that the distortion is as low as possible.
The H.264 standard provides many (some say too many) modes for encoding a macroblock. Picking the best mode means getting the best quality for a given rate. The only problem is that selecting the right one requires a lot of computation, and it has to happen in real time.
Putting it simply, our codec is rate-distortion optimized.
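The core of rate-distortion optimized mode selection is the Lagrangian cost J = D + λ·R: for each candidate mode, weigh its distortion against its bit cost and pick the cheapest. The modes and numbers below are made up for illustration:

```python
# Minimal sketch of Lagrangian rate-distortion optimization: for each
# candidate macroblock mode, compute cost J = D + lambda * R and pick
# the mode with the lowest J. All values here are illustrative.

def pick_mode(candidates, lam):
    """candidates: list of (name, distortion, rate_bits) tuples."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])

modes = [
    ("intra16x16", 400, 120),   # moderate rate, moderate distortion
    ("intra4x4",   150, 300),   # better prediction, but more bits
    ("skip",       900,   2),   # nearly free, but poor quality
]

# A small lambda favors quality; a large lambda favors rate savings.
print(pick_mode(modes, lam=0.5))
print(pick_mode(modes, lam=5.0))
```

The real-time challenge is that this cost has to be evaluated, in some form, for every candidate mode of every macroblock within the frame's time budget.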
While the issues above make our codec an excellent choice for visual communications, there's an added bonus: it also supports SVC (Scalable Video Coding), making it robust against network losses, and that's what really makes all the difference in the world.