I’ve been getting a lot of questions from various places about how video conferencing looks and feels, most of them asking about the difference between CP and VAS.
So, what do CP and VAS mean, and what do they have to do with video conferencing?
VAS
VAS is short for Voice Activated Switching. This basically means that the active speaker (the meeting participant who is currently talking) is seen by all meeting participants for as long as she (or he) is talking.
For example, during a call with, let’s say, 4 participants (David, James, Susan and Heather), if David is speaking, everyone will see David on the screen:

If James then starts talking, he replaces David on the screen:

In more advanced systems, James might see David (the “last active speaker”) while everyone else sees James. This is known as the “no self-see” option: the active speaker does not see herself, since some consider seeing yourself on screen a psychological downside.
A voice activity detection algorithm (VAD for short) is usually used to determine which of the meeting participants is currently speaking, and sometimes another component (the mixer) compares several “talking” participants and decides which one is the “active” speaker.
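The switching rule above, including the no-self-see option, boils down to a small routing decision: who should each participant be shown? The following is a minimal sketch of that decision, not any vendor’s implementation; `vas_views` and its parameters are hypothetical names, and a participant mapped to `None` simply has no one to switch to yet.

```python
from typing import Dict, List, Optional


def vas_views(participants: List[str], active: str,
              last_active: Optional[str]) -> Dict[str, Optional[str]]:
    """Return, for each participant, whose video they see under VAS with no self-see.

    Hypothetical helper for illustration only.
    """
    views: Dict[str, Optional[str]] = {}
    for p in participants:
        if p == active:
            # No self-see: the active speaker sees the previous speaker instead of herself.
            views[p] = last_active
        else:
            # Everyone else sees whoever is talking now.
            views[p] = active
    return views


# James takes over from David: James sees David, everyone else sees James.
print(vas_views(["David", "James", "Susan", "Heather"], active="James", last_active="David"))
```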
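One simple way to picture this split between VAD and mixer is an energy-threshold VAD that marks which participants are talking, feeding a selector that picks the loudest talker and only switches after the new candidate has stayed loudest for a short hold period, so a cough does not flip the screen. This is a toy sketch under assumed parameters (the -40 dBFS threshold and 3-frame hold are arbitrary); real VADs and mixers are considerably more sophisticated, and `frame_energy_db` and `ActiveSpeakerSelector` are hypothetical names.

```python
import math
from typing import Dict, List, Optional


def frame_energy_db(samples: List[float]) -> float:
    """RMS level of one audio frame in dBFS (samples assumed in [-1.0, 1.0])."""
    if not samples:
        return -120.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else -120.0


class ActiveSpeakerSelector:
    """Toy mixer logic: among talking participants, pick the loudest, but only
    switch once a new candidate has been loudest for `hold_frames` frames."""

    def __init__(self, hold_frames: int = 3, threshold_db: float = -40.0):
        self.hold_frames = hold_frames
        self.threshold_db = threshold_db
        self.active: Optional[str] = None
        self._candidate: Optional[str] = None
        self._count = 0

    def update(self, frames: Dict[str, List[float]]) -> Optional[str]:
        levels = {p: frame_energy_db(f) for p, f in frames.items()}
        # VAD step: a participant is "talking" if the frame level exceeds the threshold.
        talking = {p: lvl for p, lvl in levels.items() if lvl > self.threshold_db}
        if not talking:
            self._candidate, self._count = None, 0
            return self.active  # silence: keep showing the last speaker
        # Mixer step: compare the talking participants and take the loudest.
        loudest = max(talking, key=lambda p: talking[p])
        if loudest == self.active:
            self._candidate, self._count = None, 0
        elif loudest == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = loudest, 1
        # Switch immediately if nobody is active yet, otherwise only after the hold period.
        if self.active is None or self._count >= self.hold_frames:
            self.active = loudest
            self._candidate, self._count = None, 0
        return self.active
```

The hold period is the design choice that matters most here: without it, two people talking over each other would make the picture flicker between them on every audio frame.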
CP
CP is short for Continuous Presence. This basically means that all participants, or a subset of them that includes the active speaker, are seen on the screen at the same time.
In our example above, this means that all 4 participants may be seen on screen at once:

Left: a 2×2 CP layout. Right: a 1+3 CP layout.
CP also involves identifying the active speaker, because in some cases (like the 1+3 layout above) the active speaker is moved to a designated sub-frame, shown in a larger sub-frame, or highlighted in some other graphical way to emphasize that they are the one talking.
CP involves a lot of graphical manipulation of the streams received from the different endpoints, including scaling the images and mixing them into a single picture. Today, with SVC technology, CP can be achieved without such graphical manipulation, but only in closed H.264-based deployments, which are still not common.
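To make the difference between the two layouts concrete, here is a minimal sketch that assigns each participant a sub-frame of a 1920×1080 canvas. The 2×2 layout uses four equal quadrants in a fixed order, while the 1+3 layout always puts the active speaker in the large frame. The exact 1+3 geometry (a large frame over the top two-thirds, three equal frames across the bottom third) is an assumption for illustration, and `cp_layout` and `Rect` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def cp_layout(layout: str, participants: List[str], active: str,
              width: int = 1920, height: int = 1080) -> Dict[str, Rect]:
    """Assign each participant a sub-frame of the output canvas (hypothetical helper)."""
    if layout == "2x2":
        # Four equal quadrants; order is fixed, the active speaker is not moved.
        cw, ch = width // 2, height // 2
        slots = [Rect(c * cw, r * ch, cw, ch) for r in range(2) for c in range(2)]
        ordered = list(participants)
    elif layout == "1+3":
        # Assumed geometry: one large frame over the top two-thirds,
        # three equal small frames across the bottom third.
        big_h = height * 2 // 3
        small_w, small_h = width // 3, height - big_h
        slots = [Rect(0, 0, width, big_h)] + [
            Rect(i * small_w, big_h, small_w, small_h) for i in range(3)]
        # The active speaker always gets the large (first) slot.
        ordered = [active] + [p for p in participants if p != active]
    else:
        raise ValueError(f"unknown layout: {layout}")
    if len(ordered) > len(slots):
        raise ValueError("more participants than sub-frames")
    return dict(zip(ordered, slots))


print(cp_layout("1+3", ["David", "James", "Susan", "Heather"], active="James"))
```

When a different participant becomes the active speaker, calling `cp_layout` again swaps them into the large frame, which is the "move to a designated sub-frame" behavior described above.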
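The two operations mentioned here, scaling each incoming image and mixing the results into one picture, can be illustrated with a toy compositor. This sketch assumes frames are already decoded into small grayscale pixel grids and uses nearest-neighbour scaling; a real MCU works on YUV video, uses better filters, and must decode every incoming stream and re-encode the composed picture, which is where the CPU cost of CP comes from. `scale_nearest` and `compose` are hypothetical names for illustration only.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of grayscale pixel values (stand-in for decoded video)


def scale_nearest(src: Frame, out_w: int, out_h: int) -> Frame:
    """Nearest-neighbour resize of one decoded frame to out_w x out_h."""
    in_h, in_w = len(src), len(src[0])
    return [[src[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def compose(canvas_w: int, canvas_h: int,
            tiles: List[Tuple[Frame, int, int, int, int]]) -> Frame:
    """Scale each (frame, x, y, w, h) tile and paste it onto one output canvas."""
    canvas = [[0] * canvas_w for _ in range(canvas_h)]
    for frame, x, y, w, h in tiles:
        scaled = scale_nearest(frame, w, h)
        for row in range(h):
            # Overwrite exactly w pixels of this canvas row with the scaled tile row.
            canvas[y + row][x:x + w] = scaled[row]
    return canvas


# Two 4x4 "participants" shrunk to 2x2 each and placed side by side.
a = [[1] * 4 for _ in range(4)]
b = [[2] * 4 for _ in range(4)]
print(compose(4, 2, [(a, 0, 0, 2, 2), (b, 2, 0, 2, 2)]))
```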
VAS, CP and Video Conferencing
CP is basically the evolution of VAS when it comes to video conferencing. A while back, when there was not enough CPU power to perform the graphical manipulations needed for CP, and when screens were too small for a CP layout, VAS was the best solution. In fact, most UC clients still use VAS as their means of displaying multiple participants in a conference.
With the introduction of larger screens and more powerful conferencing platforms, CP became not only relevant but very popular. On today’s HD screens, the latest RADVISION Elite MCU can show up to 28 participants in one high-definition layout, and the result is quite amazing, as you can see below:

On the other hand, as video conferencing makes its way to desktops, mobile handsets and other handheld devices, and as everything moves to the cloud, CPU power is once again a relevant factor, and I believe VAS can, and should, provide a viable and cost-effective mode of operation.
So what should you use for your video conferencing service, VAS or CP? It all depends, I guess, on the type of devices and infrastructure you want to roll out, and on the exact user interface you want to provide. As explained above, both make for a great user experience, and there are excellent market-proven examples of each.




A vendor recently told me that there is research supporting the claim that people cannot process more than 8 participants on screen at the same time. Do you think they are saying this because their product is limited, or do you think this is actually the case?
Hi Jamie,
While I am not aware of such research, I can truly sympathize with the findings.
As someone whose product is not limited (as the image above shows), I think that in many cases a big CP layout is overkill, and each case should be analyzed, with CP or VAS selected based on the target use and cost.
- Sagee.
Another consideration: if you have large enough screens, you can use VAS to maintain life-size images. This is great if you are sitting directly opposite the screen, as it emulates a “same room” experience. Ultimately, in a CP environment there is a trade-off between screen space and life-size views once the number of participants outgrows the screen real estate. This might not sound like a big issue, but when you experience a life-size view you really can start to forget you are talking to someone over video.
Richard,
I totally agree with you. It’s a road some vendors have taken (like Cisco with its TelePresence products).
As you mentioned, there’s a trade-off, and I guess the decision should be made, as I wrote, per product and per purpose, according to priorities.
- Sagee.