“What are your thoughts on this tool and how it might assist moving video conferencing forward?”
I promised Bryan I would dedicate a post to his question, and here I am keeping that promise in another edition of “Ask The EXPERT”.
Vidyo’s technology is interesting. It definitely shows the potential that still lies in H.264 tools, including Scalable Video Coding (SVC), for video applications such as video conferencing. I believe that the media attention that SVC is getting will improve the video conferencing market, as it will help vendors introduce more tools that will improve the overall quality of experience.
The Long Answer
SVC, the latest extension to the popular H.264 video coding standard, has been getting a lot of media attention. But scalable coding is not a new concept. The idea of sending a single bitstream that fits different receiver capabilities (in terms of frame size, frame rate, bandwidth and computational complexity) has been charming video coding researchers and video application manufacturers as wired and wireless networks evolved over the past 15 years.
As video is sent to several receivers, each with its own processing capabilities, several problems arise for the aggregator of the content (the MCU, in video conferencing applications, or the streaming server, in a streaming application). It can either settle for the lowest common denominator and ruin the experience for all, or it can try to tailor the stream to each participant separately, spending a huge amount of resources and bandwidth.
SVC attempts to solve this issue by offering a single stream that is built like an onion: each participant can peel away layers of the stream until it is comfortable with the result (meaning, it can process the layers that are left). In this way, the “base layer” of an SVC stream, the core of the onion, can carry the lowest resolution (CIF, for instance), and the outer layers carry the information needed to scale the resolution up gradually to, for instance, 4CIF (1st layer), 720p (2nd layer) and 1080p (3rd layer).
Example: SVC encoder with multiple receivers.
Any network component can then choose to process any set of layers, yielding the resolution(s) it chooses. In a similar way, layers can increase the frame rate (number of frames per second in the stream), the bit rate, or the quality (the base layer has low quality, and higher layers improve the quality gradually).
All major video coding standards since 1994 have included tools for scalable coding (MPEG-2, H.263 V2, MPEG-4). In fact, I can still remember the big hype around MPEG-4's scalable video coding tools, which offered a brand new disruptive alternative to the traditional coding schemes of the time (see part 8 of this interesting pdf MPEG-4 overview).
But not all of these tools were eventually accepted by the video applications market, even though the standards themselves were, mainly due to the tremendous additional cost in terms of bit rate and computation. Although it was appealing to send just one stream, instead of multiple streams, from a streaming server to different clients, the overall bit rate of that one stream was close to the aggregate total of the separate streams, and/or the computational cost was comparable. Therefore, the solution didn't stick.
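To make the onion picture concrete, here is a minimal Python sketch of how a network component might pick layers for a receiver. The layer ladder follows the CIF / 4CIF / 720p / 1080p example above; the `Layer` type, the `select_layers` helper and all bit rates are hypothetical values chosen for illustration, not part of the SVC standard or of any product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    height: int   # picture height in pixels
    kbps: int     # assumed extra bit rate contributed by this layer

# Ladder from the example in the text; bit rates are illustrative assumptions.
LADDER = [
    Layer("CIF (base)", 288, 256),
    Layer("4CIF", 576, 512),
    Layer("720p", 720, 1024),
    Layer("1080p", 1080, 2048),
]

def select_layers(ladder, max_height, max_kbps):
    """Return the innermost layers a receiver can handle.

    Each enhancement layer depends on the ones below it, so we walk from
    the base outward and stop at the first layer that does not fit.
    An empty list means even the base layer is too much for the receiver.
    """
    chosen, total = [], 0
    for layer in ladder:
        if layer.height > max_height or total + layer.kbps > max_kbps:
            break
        chosen.append(layer)
        total += layer.kbps
    return chosen

# A desktop client limited to 720p and roughly 2 Mbps:
desktop = select_layers(LADDER, max_height=720, max_kbps=2000)
print([layer.name for layer in desktop])
```

The point of the sketch is that the sender encodes once, and the decision about what each receiver gets is a cheap filtering step rather than a re-encode.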
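The bit rate argument can be put into rough numbers. The Python sketch below uses a deliberately simple model that is my own assumption: a scalable stream costs as much as the best single-layer stream plus a fractional overhead, while the alternative is to send every stream separately. The rates and overhead values are made up for illustration and are not measurements of any codec.

```python
def simulcast_kbps(stream_rates):
    """Total bit rate when each quality level is sent as its own stream."""
    return sum(stream_rates)

def scalable_kbps(stream_rates, overhead):
    """Assumed model: one layered stream costs the top rate plus an overhead."""
    return max(stream_rates) * (1 + overhead)

def saving_ratio(stream_rates, overhead):
    """Fraction of bit rate saved by the layered stream (0.0 means no gain)."""
    total = simulcast_kbps(stream_rates)
    return (total - scalable_kbps(stream_rates, overhead)) / total

rates = [256, 768, 2048]            # hypothetical low / medium / high streams
print(saving_ratio(rates, 0.10))    # modest overhead: a worthwhile saving
print(saving_ratio(rates, 0.50))    # heavy overhead: the saving disappears
```

With these assumed rates, a 50% overhead brings the layered stream to exactly the same total as the three separate streams, which is the situation that kept the earlier scalable tools out of products.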
Scalable Video Coding (SVC), the extension of the H.264 standard, has been developed since October 2003 by the Moving Picture Experts Group (MPEG) at ISO/IEC. In January 2005 MPEG and the Video Coding Experts Group (VCEG) at ITU-T agreed to standardize SVC as an amendment to the H.264 standard. In July 2007, this amendment got its final approval.
The scalability extension of H.264 offers Spatial Scalability (frame size adaption), Temporal Scalability (frame rate adaption) and Fidelity Scalability (quality adaption). It also provides a great boost in error resiliency and concealment, which helps the decoder withstand errors in the bitstream and recover from them gracefully.
Many applications, such as video streaming, surveillance, broadcast and storage, may decide to adopt SVC. For video conferencing, SVC offers two potential benefits:
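Temporal scalability is the easiest of the three to sketch. Assuming a dyadic hierarchy with three temporal layers (a common arrangement, though the layer count here and the `temporal_id` helper are my own choices for illustration), every fourth frame belongs to the base layer, and dropping the top layer halves the frame rate without touching the remaining frames:

```python
def temporal_id(frame_index, num_layers=3):
    """Temporal layer of a frame in a dyadic hierarchy.

    With 3 layers the pattern over four frames is 0, 2, 1, 2:
    layer 0 alone gives 1/4 of the frame rate, layers 0-1 give 1/2.
    """
    period = 1 << (num_layers - 1)
    pos = frame_index % period
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1
    return (num_layers - 1) - trailing_zeros

def keep_frames(frame_indices, max_temporal_id):
    """Frames that survive when layers above max_temporal_id are dropped."""
    return [i for i in frame_indices if temporal_id(i) <= max_temporal_id]

print(keep_frames(range(8), 2))   # full frame rate
print(keep_frames(range(8), 1))   # half frame rate
print(keep_frames(range(8), 0))   # quarter frame rate
```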
- Compatibility among different endpoints, from desktop to conference room, as the variety of their capabilities is a great challenge to current MCUs.
- Greater error resiliency, as even modern networks still introduce many artifacts to the video, which hurt the quality of experience tremendously.
The main question here is therefore whether or not these benefits will drive SVC into the video conferencing market.
Adaption to Different Endpoint Capabilities
As a general idea, adaption to different endpoint capabilities does sound great. The main problem is interoperability. If your network is, as RADVISION believes, to be the Babel Fish of all video conferencing endpoints, encompassing a full range of products from the mobile handset, through the desktop and up to HD video conferencing and Telepresence, then the range of video resolutions, frame rates and bitrates that you have to support makes the use of SVC very complicated, especially if you also need to support non-SVC (legacy) endpoints.
That is because in SVC the world is divided into layers. If you want your stream to include both CIF and 720p, you need two layers. If you want other resolutions as well, you need even more layers. Note that the number of combinations of resolutions, bitrates and frame rates quickly becomes quite high, and the resulting overhead in bit rate and complexity makes SVC less appealing.
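A quick count shows how fast the combinations grow. The lists in this Python sketch are hypothetical and fairly short; a deployment spanning mobile handsets to Telepresence would likely have more entries in each. The product is an upper bound, since not every combination is sensible (nobody wants 1080p at 384 kbps), but it gives a feel for the problem.

```python
from itertools import product

# Hypothetical capability lists, for illustration only.
RESOLUTIONS = ["CIF", "4CIF", "720p", "1080p"]
FRAME_RATES = [15, 30, 60]          # frames per second
BITRATES_KBPS = [384, 1024, 4096]

def operating_points(resolutions, frame_rates, bitrates):
    """Every (resolution, frame rate, bit rate) combination, as an upper bound."""
    return list(product(resolutions, frame_rates, bitrates))

points = operating_points(RESOLUTIONS, FRAME_RATES, BITRATES_KBPS)
print(len(points))   # 4 * 3 * 3 = 36
```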
One may argue that we will see hybrids between SVC and AVC (the more common H.264 coding standard), which will offer gateways that will transcode between the SVC world and the AVC world. But then again, you build a world of isolated islands, which – just like with Telepresence – is something we’ll have to deal with, sooner or later.
SVC for Error Resiliency
The use of SVC or SVC-like techniques to improve error resiliency and error concealment is, IMHO, the biggest short term benefit the video conferencing world will see from all the hype concerning this technology.
Error resiliency schemes and tools already exist in H.264 AVC (pdf), but have mostly gone unused in the video conferencing domain. I believe that, as Bryan wisely put it, SVC will help video conferencing move forward by giving these tools the spotlight.
Error concealment in H.264. Source: Signal Processing group, University of Bristol.
With SVC being introduced, and companies like Vidyo pushing it hard and marketing their proprietary error concealment capabilities, I believe that the focus of video conferencing vendors will change. More error resiliency tools, but not necessarily the SVC tools, will be used and that will improve the overall experience for all of us.
I strongly believe in the value of Scalable Video technologies for improving error resiliency. However, despite all my tech guy affection for revolutions, instead of a dramatic “phase transition” into SVC, I think that the video conferencing market should evolve gradually into scalable technologies.
SVC is intriguing. I might dedicate a post or two to this technology in the near future. But I believe the video conferencing market will not move entirely to SVC. Instead, it will adopt ideas and technologies from it into its fine foundations, making the video conferencing experience better.