[I'm discussing video conferencing here on a regular basis, but it seems I have neglected a very important building block of the solution: data collaboration. To make amends, I asked Sasha Ruditsky, TBU's CTO and a video conferencing veteran, to write an introduction to data collaboration in video conferencing.]
At its core, video conferencing addresses three basic needs:
- People want to be able to hear each other.
- People want to be able to see each other.
- People want to share documents and work on them collaboratively.
Quite a lot has already been said and written about audio and video communication, so I’d like to focus on the data collaboration aspect of conferencing.
In its simplest form, data collaboration is the ability of a conference participant to present content to the rest of the conference participants. A more advanced form involves the ability of all the participants to perform different actions on the shared content (annotation, editing, etc.).
For people to be able to work together, the devices they are using for communication need to speak the “same language”. When multiple vendors provide conferencing equipment, standardization is required to achieve interoperability. As far back as the early 1990s, ITU-T started work on an “umbrella” of data collaboration standards, known today as T.120. Different parts of T.120 were published by ITU-T between 1993 and 1995.
One of the T.120 design goals was the creation of a self-sufficient set of data collaboration tools, which could be used either standalone or in the context of a multipoint videoconference (then over ISDN).
The requirement for a standalone solution led to a full-fledged network protocol abstraction (T.123), a means of delivering information to multiple points (T.122 and T.125), and a set of conference control tools, such as conference establishment and termination, conference and application rosters, an application registry, conference conductorship, etc. (T.124).
This stack of protocols allowed independent vendors and standardization organizations to develop multipoint data collaboration applications on top of it. ITU-T defined several such applications, such as the Multipoint still image and annotation protocol (T.126), Multipoint application sharing (T.128) and others.
T.120 was successfully implemented in products such as Cisco WebEx’s MeetingCenter, Microsoft NetMeeting, Nortel CS 2100 and Lotus Sametime. While it provides rather advanced features, it also comes with great complexity, and as a result it found only relatively limited use.
Actually, in most cases, all that is required from a collaboration function is:
- The ability to exchange text messages between the conference participants (chat)
- The ability of a particular conference participant to present a document to the rest of the participants (data presentation)
Dual video on a RADVISION SCOPIA MCU system: on the left – data; on the right – the conference.
Let’s focus here on the second item: it is possible to stream a video capture of a document displayed on one’s desktop to the conference participants using the tools designed for “regular” video communication. As the quality of video communication gradually increased, the idea of using a second video stream as a means for data collaboration became more and more viable.
Several reasons exist for delivering multiple video streams between video conference participants. Some systems use separate cameras for individual people physically located at the same site (telepresence, for instance). In other cases, the image of the same person is delivered from different view angles, presenting a different angle to each remote participant and thereby maintaining eye contact.
However, what is discussed here is the so-called “dual video” scheme, in which one video stream (also called the “main” or “live” stream) delivers the video of the conference participant, while a second stream (also called the “presentation” or “document camera” stream) delivers the capture of the document being presented.
In addition to allowing two independent video streams in the same conference, several more problems need to be solved to enable “dual video” functionality:
- Association: There should be a mechanism to associate the two video streams with the type of the content they are delivering, i.e. to determine which one is delivering live video and which is delivering the presentation.
- Chair Control: There should be a mechanism for the participants to decide which one of them has the current right to present and how this right is transferred between the participants.
Dual Video in H.323
H.323 took the road of defining a completely separate Recommendation, H.239, designed solely for the purpose of implementing the “dual video” functionality. From version one, H.323 provided the ability to establish multiple channels of the same media type and to specify the supported combinations of such channels. H.239 added to H.323 the ability to signal the “role” of each video channel, i.e. whether it carries “live” or “presentation” video. H.239 also defined the way to create and manage the presentation token: the conference participant currently possessing the token is the one who is allowed to present.
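The token logic described above can be sketched as a small arbiter. This is an illustrative model only: real H.239 carries token requests, responses and releases as H.245 generic messages, and only the decision logic is modeled here.

```python
class PresentationTokenArbiter:
    """Simplified model of H.239-style presentation-token arbitration.

    Real H.239 signals these operations via H.245 generic messages;
    this sketch keeps only the "one presenter at a time" decision logic.
    """

    def __init__(self):
        self.owner = None  # participant currently allowed to present

    def request(self, participant):
        # Grant the token if it is free or already held by the requester.
        if self.owner is None or self.owner == participant:
            self.owner = participant
            return "accept"
        return "reject"  # someone else currently holds the token

    def release(self, participant):
        # Only the current owner can release the token.
        if self.owner == participant:
            self.owner = None
            return True
        return False
```

A second requester is rejected until the first presenter releases the token, which matches the “single presenter at a time” semantics of the H.239 token.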
Dual Video in SIP
SIP, on the other hand, or to be more precise the IETF, defined all the necessary building blocks which, when used together, were supposed to provide the “dual video” functionality. In SIP it is possible to use several SDP video media lines to signal support for multiple video streams. The “content” attribute defined in RFC 4796 is semantically similar to the role parameter in H.239, and applying different content values to different video media lines provides a simple way to distinguish between them.
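The media section of such an offer could be generated along these lines; the ports, payload types and codec are placeholders, not mandated values, and “main” and “slides” are content values defined in RFC 4796:

```python
def build_dual_video_sdp(live_port, slides_port):
    """Sketch of the media section of an SDP offer with two video
    streams, distinguished by the RFC 4796 'content' attribute.
    Ports and codec payload types are illustrative placeholders."""
    lines = [
        "m=video %d RTP/AVP 96" % live_port,
        "a=rtpmap:96 H264/90000",
        "a=content:main",    # live video of the participant
        "m=video %d RTP/AVP 96" % slides_port,
        "a=rtpmap:96 H264/90000",
        "a=content:slides",  # the presentation stream
    ]
    return "\r\n".join(lines) + "\r\n"
```

A receiver can then render the “main” stream in the people window and the “slides” stream in the content window, without guessing which m-line is which.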
The IETF also defined, in RFC 4582, the Binary Floor Control Protocol (BFCP), which allows control of a conference floor. A floor is a shared conference resource, available to only a single participant at a time. If the ability to present is considered such a resource, then BFCP can be used to control the presentation token. In contrast to H.239, BFCP and video streams in SIP are completely unrelated, so a separate mechanism is needed to associate them.
RFC 4574 defines a label attribute, which allows “stamping” SDP media lines with labels. It is then possible to refer to these media lines from other places in the SDP by using the labels. The BFCP media line uses this mechanism to define which video stream is controlled by the BFCP floor control mechanism.
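To make the association concrete, here is a hedged sketch that follows a “floorid”/“mstrm” reference (the SDP syntax for BFCP streams defined in RFC 4583) to the labeled video m-line it controls. The parsing is deliberately simplified and assumes well-formed input with a single floor:

```python
def floor_controlled_content(sdp):
    """Return the 'content' value of the video stream that the BFCP
    floor controls, by following the RFC 4583 'floorid' attribute to
    the RFC 4574 'label' it names. Simplified: one floor, well-formed
    input assumed."""
    # Step 1: find which media label the floor controls,
    # e.g. "a=floorid:1 mstrm:12" -> label "12".
    target_label = None
    for line in sdp.splitlines():
        if line.startswith("a=floorid:"):
            for token in line.split():
                if token.startswith("mstrm:"):
                    target_label = token[len("mstrm:"):]
    # Step 2: walk the media sections, tracking each one's label
    # and content attribute, and report the matching section.
    label = content = None
    for line in sdp.splitlines():
        if line.startswith("m="):
            if label == target_label:
                return content
            label = content = None
        elif line.startswith("a=label:"):
            label = line[len("a=label:"):]
        elif line.startswith("a=content:"):
            content = line[len("a=content:"):]
    return content if label == target_label else None
```

Given an SDP body with a BFCP application m-line carrying `a=floorid:1 mstrm:12` and a video m-line labeled `a=label:12` with `a=content:slides`, the function reports that the floor controls the presentation stream.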
So SIP got all the building blocks necessary to implement the “dual video” functionality. The problem, however, is that no document defined how all these separate pieces were supposed to work together. As a result, videoconferencing equipment manufacturers were free to create proprietary interpretations of the protocol semantics. Several variants of standards-based, yet non-interoperable, implementations of “dual video” over SIP exist today.
An additional problem is SIP’s requirement of backward compatibility: single-video-stream SIP endpoints need to work correctly against “dual video” endpoints. Unfortunately, the specified behavior of a single-video endpoint receiving an SDP with multiple video streams proved to be rather ambiguous, and any straightforward implementation of a dual video endpoint inevitably causes inconsistent behavior in older equipment.
These and similar issues in SIP prompted the creation of the IMTC SIP Parity group. The goal of this group is to create specifications detailing the best current practices for the behavior of SIP entities, which would allow both interoperability and backwards compatibility. One of the subgroups of the SIP Parity group is dedicated to multiple video stream applications. Several versions of the best-practices document have already been produced and discussed, and chances are the document will be finalized soon. With the completion of this work, we should expect more interoperable dual video SIP-based conferencing systems to appear on the market.
Teliris’ TouchTable at InfoComm09 – the future of data collaboration?! (HSL)
Web Conferencing in the Future
Both T.120 and “dual video” are based on the idea of distributing an image (of the application or presentation) created on the endpoint of one of the participants to the rest of the conference participants. Such a mechanism eliminates the requirement of running the shared application on all the endpoints. For a conference involving heterogeneous systems with diverse operating systems and hardware, this is quite a significant advantage. On the other hand, exchanging images involves transferring large amounts of information between the conference participants and losing the metadata of the actual content along the way. It would definitely be more efficient to “tell” all the participants to switch to slide 17 than to send them a capture of the content of slide 17.
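A rough back-of-the-envelope comparison illustrates the point. The screen resolution, uncompressed encoding and JSON command format are assumptions made purely for the sake of the example:

```python
import json

# A "switch to slide 17" control message: a few tens of bytes.
command = json.dumps({"action": "goto_slide", "slide": 17}).encode()
command_bytes = len(command)

# An uncompressed capture of the same slide at an assumed
# 1024x768 resolution, 3 bytes per pixel (24-bit RGB).
width, height, bytes_per_pixel = 1024, 768, 3
capture_bytes = width * height * bytes_per_pixel  # ~2.3 MB

# The command is smaller by several orders of magnitude.
ratio = capture_bytes // command_bytes
```

Video compression narrows the gap considerably in practice, but even a well-compressed capture remains orders of magnitude larger than a semantic command, and the capture has already lost the content’s metadata.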
The alternative to sharing an application running on one of the participants’ computers is to run the application on a web server. While this approach is much less generic, it is also significantly more efficient. Recent advances in the HTML specifications (HTML5), support for the latest HTML specification in all major browsers, and the availability of such browsers on a wide variety of devices make this alternative quite appealing. Just have a look at Google Wave. It is very likely that the future of data collaboration lies there.