[The world is moving to video, so if you're dealing with VoIP communications and video is on your radar, there are some terms you probably need to know. I have asked Amit Klir, one of our video experts, to compile a short list of the most essential terms in video coding.]
As video over IP becomes commonplace, more and more people are getting involved in video application development, integration, deployment and administration. Video coding is complicated, but it is not unmanageable. If this is your first step into video compression technology, you might find the terms I've collected below useful.
Video Encoder: A software or hardware component that performs video compression. Compression is generally used to reduce the size of the visual content, either for storage or for streaming over a network channel (to reduce the bit rate). An encoder's performance and quality depend largely on its complexity: in general, an encoder that can afford more processing achieves better quality at a given bit rate.
Video Decoder: A software or hardware component that performs video decompression. In general, a video decoder reconstructs the video content from the compressed data into a format that can be displayed. In real-time network streaming applications, the decoder converts the video packets received over the network into video frames that can be shown on screen.
Bit rate: The number of bits transmitted over a specific channel per unit of time. In video coding applications, the video bit rate is expressed as the number of bits used per second. For example, 1 Mbps = 1 megabit (1 million bits) per second.
Frame Rate (fps): The number of frames per second in a video stream.
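Bit rate and frame rate together determine how many bits the encoder can spend, on average, on each frame. The short sketch below does that arithmetic, plus the amount of data a stream produces over a given time; the function names are illustrative only.

```python
def average_bits_per_frame(bit_rate_bps: float, frame_rate_fps: float) -> float:
    """Average number of bits available for each frame at a given bit rate."""
    if frame_rate_fps <= 0:
        raise ValueError("frame rate must be positive")
    return bit_rate_bps / frame_rate_fps


def bytes_for_duration(bit_rate_bps: float, seconds: float) -> float:
    """Amount of data (in bytes, 8 bits each) produced by a stream of the given bit rate."""
    return bit_rate_bps * seconds / 8


one_mbps = 1_000_000  # 1 Mbps = 1 million bits per second
print(average_bits_per_frame(one_mbps, 25))  # 40000.0 bits per frame at 25 fps
print(bytes_for_duration(one_mbps, 60))      # 7500000.0 bytes in one minute
```

In practice an encoder does not split the budget evenly: I frames typically take more than the average and P or B frames less.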
Frame resolution: The size of the basic element of video content, the frame, expressed as the number of pixels along its horizontal and vertical axes. Several popular frame resolutions have well-known names: CIF (352×288), 4CIF (704×576), D1 (720×480 for NTSC or 720×576 for PAL) and 720p (1280×720).
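To see why compression is unavoidable, it helps to compute the uncompressed bit rate of these resolutions. This sketch assumes 8-bit YUV 4:2:0 sampling, which averages 12 bits per pixel; that sampling format is an assumption not stated above, and other formats (for example 24-bit RGB) would give larger numbers.

```python
# A few of the named resolutions listed above: (width, height) in pixels.
RESOLUTIONS = {
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "720p": (1280, 720),
}


def raw_bit_rate(width: int, height: int, fps: float, bits_per_pixel: int = 12) -> float:
    """Uncompressed bit rate in bits per second.

    bits_per_pixel=12 assumes 8-bit YUV 4:2:0 (one full-size luma plane plus
    two quarter-size chroma planes).
    """
    return width * height * bits_per_pixel * fps


for name, (w, h) in RESOLUTIONS.items():
    mbps = raw_bit_rate(w, h, 30) / 1_000_000
    print(f"{name:>5}: {w}x{h} at 30 fps -> {mbps:.1f} Mbps uncompressed")
```

Even CIF at 30 fps comes to roughly 36.5 Mbps raw, far more than a typical video conferencing link carries.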
PAL: A term used to describe video intended for playback on a PAL TV. In general, PAL refers to standard definition (SD) video with a vertical resolution of up to 576 pixels and a horizontal resolution of up to 720 pixels, at a frame rate of 25 fps. PAL broadcasting is used in Western European countries, Australia, various South American countries and several Asian countries.
NTSC: A term used to describe video intended for playback on an NTSC TV. NTSC generally refers to standard definition (SD) video with a vertical resolution of up to 480 pixels and a horizontal resolution of up to 720 pixels, at a frame rate of 29.97 fps. NTSC is used in the United States, Canada, Japan and various other Asian countries.
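The NTSC rate of "29.97 fps" is a rounded figure; the exact rate is 30000/1001 frames per second. This small sketch uses exact fractions to compute the time between frames for both systems, which is useful when converting timestamps.

```python
from fractions import Fraction

# Exact frame rates: PAL is 25 fps; NTSC "29.97" is exactly 30000/1001 fps.
PAL_FPS = Fraction(25)
NTSC_FPS = Fraction(30000, 1001)


def frame_duration_ms(fps: Fraction) -> float:
    """Time between consecutive frames, in milliseconds."""
    return float(1000 / fps)


print(f"NTSC rate: {float(NTSC_FPS):.5f} fps")                       # 29.97003 fps
print(f"PAL frame interval: {frame_duration_ms(PAL_FPS):.2f} ms")    # 40.00 ms
print(f"NTSC frame interval: {frame_duration_ms(NTSC_FPS):.2f} ms")  # 33.37 ms
```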
High Definition: Usually refers to frame resolutions of 720p and up.
Frame Types: Video coding uses several common frame types. An I (intra) frame is coded independently of any other frame, using only spatial redundancy within the frame for prediction and coding. An I frame uses relatively more bits than the other frame types, but its coding complexity is lower.
A P (predicted) frame is an inter frame: it is coded using temporal redundancy, that is, predictions from a previous I or P frame. A P frame uses fewer bits than an I frame, but its coding complexity is higher.
A B (bi-directionally predicted) frame requires information from both previous and following I, P or B frames. B frames use the fewest bits of all frame types, and their coding complexity is the highest. Because a B frame cannot be coded until the following reference frame has been captured and coded, B frames introduce system delay. Hence, they are not popular in real-time, low-delay applications.
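The delay caused by B frames comes from reordering: frames must be sent in a different order from the one in which they are displayed. The sketch below converts a display-order sequence into coding order under a simplified model (an assumption) in which each B frame references only the nearest I or P frame before and after it.

```python
def decode_order(display_order: list[str]) -> list[str]:
    """Reorder frames from display order into coding/transmission order.

    Simplified model: a B frame needs the next I or P frame, so that
    reference frame is sent first and the waiting B frames follow it.
    """
    coded: list[str] = []
    pending_b: list[str] = []
    for frame in display_order:
        if frame.startswith("B"):
            # Can't code yet: the following reference frame isn't available.
            pending_b.append(frame)
        else:
            # I or P frame: send it, then the B frames that were waiting for it.
            coded.append(frame)
            coded.extend(pending_b)
            pending_b.clear()
    # Trailing B frames would really wait for the next GOP's reference; flush them here.
    coded.extend(pending_b)
    return coded


gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(decode_order(gop))  # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

Frame B1 cannot be coded until P3 has been captured, two frame intervals later; at 25 fps that alone adds 80 ms of delay, which is why low-delay systems usually use only I and P frames.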
Picture Aspect Ratio: The ratio of an image's width to its height, usually written as X:Y, where X represents the width and Y the height (for example, 16:9). Because several video standards are in use, the aspect ratio must be preserved when converting from one display standard to another; otherwise the resulting frame may look distorted, squeezed or stretched.
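A minimal sketch of both ideas: reducing a resolution to its X:Y ratio, and scaling a frame to fit a different target size without changing that ratio (the leftover area would be filled with black bars). It assumes square pixels; SD formats such as D1 actually use non-square pixels, so a real converter must also account for the pixel aspect ratio.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms, e.g. 1280x720 -> (16, 9)."""
    g = gcd(width, height)
    return width // g, height // g


def fit_preserving_aspect(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int]:
    """Largest size that fits inside dst_w x dst_h with the source's aspect ratio.

    Using the same scale factor on both axes is what keeps the picture
    from being squeezed or stretched.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    return round(src_w * scale), round(src_h * scale)


print(aspect_ratio(1280, 720))                     # (16, 9)
print(fit_preserving_aspect(1280, 720, 720, 576))  # (720, 405)
```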
Packet Loss: Packets are units of information sent across a packet-switched network from a source address to a destination. Packet loss occurs when one or more packets fail to reach their destination. With protocols such as UDP, which provide no recovery mechanism for lost packets, the application must handle the loss itself and should be able to conceal the missing data. In video conferencing applications, packet loss is the most frequently encountered error type, and it degrades both video quality and quality of service.
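Before an application can conceal lost data, it has to notice that packets are missing. One common approach, assumed here, is to number packets sequentially and look for gaps; RTP, for example, uses 16-bit sequence numbers that wrap around. The function below is a hypothetical sketch that also assumes packets arrive in order without duplicates.

```python
def count_lost_packets(received_seq: list[int], modulus: int = 1 << 16) -> int:
    """Count gaps in a stream of received packet sequence numbers.

    Assumes each packet carries an incrementing sequence number that wraps
    around at `modulus`, and that packets arrive in order without duplicates.
    """
    lost = 0
    for prev, cur in zip(received_seq, received_seq[1:]):
        step = (cur - prev) % modulus  # handles wrap-around, e.g. 65535 -> 0
        lost += step - 1               # a step of 1 means nothing is missing
    return lost


seqs = [65533, 65534, 0, 1, 4, 5]  # 65535, 2 and 3 never arrived
lost = count_lost_packets(seqs)
expected = len(seqs) + lost
print(f"lost {lost} of {expected} packets ({100 * lost / expected:.1f}%)")
```

Real receivers must also cope with reordered and duplicated packets, which this sketch deliberately ignores.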
H.26x, MPEG-x, WMV x, RealVideo, VPx: Commonly used video standards and codecs. Some of these widely used codecs are specified in international standards, while others are proprietary.
- The H.26x term refers to ITU standards while the MPEG-x term refers to ISO/IEC standards.
- WMV (Windows Media Video) is Microsoft's family of high-efficiency video codecs; its latest version is standardized by SMPTE as VC-1. RealVideo is a popular video codec developed by RealNetworks, mainly used in PC and mobile applications.
- VPx is a family of proprietary video codecs developed by On2 Technologies, commonly used by Adobe Flash Player and internet video platforms.
- Common MPEG codecs are MPEG-2 and MPEG-4. MPEG-2 is widely used for storage and broadcasting. MPEG-4 and its derivatives are common in mobile device applications and storage formats, and are supported by many DVD players.
- Common H.26x codecs are H.263 and H.264. H.263 is widely used by video conferencing applications. H.264 (also known as MPEG-4 Part 10, or AVC) was developed jointly by ITU and ISO/IEC and is currently the latest video standard available in the industry. Its goal was to provide good video quality at substantially lower bit rates than previous standards, without increasing design complexity so much that it would be impractical to implement.
Lossy compression: A compression method in which the compressed data cannot be reconstructed exactly into its original form. It is mainly used in visual and audio applications, where a partial loss of data is acceptable to the human visual and auditory systems. Compared with lossless compression, in which the data can be reconstructed exactly, lossy methods require significantly fewer bits for the compressed form.
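Quantization is the step where most video codecs actually throw information away. The sketch below applies plain scalar quantization to a few sample values: nearby values collapse onto the same index, so they cannot be told apart after decoding. Real codecs quantize transform coefficients rather than raw pixel values; this simplification is an assumption made for illustration.

```python
def quantize(samples: list[float], step: float) -> list[int]:
    """Map each sample to the nearest integer index; information is discarded here."""
    return [round(s / step) for s in samples]


def dequantize(indices: list[int], step: float) -> list[float]:
    """Reconstruct approximate sample values from the indices."""
    return [i * step for i in indices]


original = [12.0, 13.0, 47.0, 50.0, 200.0]
step = 8.0  # larger step -> fewer distinct values -> fewer bits, more error
restored = dequantize(quantize(original, step), step)
errors = [abs(a - b) for a, b in zip(original, restored)]
print(restored)     # [16.0, 16.0, 48.0, 48.0, 200.0]
print(max(errors))  # 4.0, never more than step / 2
```

Both 12 and 13 come back as 16: that lost distinction is the "lossy" part, and the error it leaves behind is the quantization noise discussed under Video Artifacts.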
Jitter: The variation in packet delay. In packet-switched applications, where data is carried in network packets, packet arrival times vary. To absorb this variation and use the received packets smoothly, a delay (jitter) buffer is added to the system. In most cases, the buffer size is determined by the maximum delay variation encountered.
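Sizing the buffer by the maximum variation can be sketched as follows: compute each packet's transit time (arrival minus send time) and take the spread between the slowest and fastest packet. A constant offset between the sender's and receiver's clocks cancels out in that difference, so the clocks need not be synchronized (clock drift is ignored here). Production systems usually adapt the buffer continuously instead of using one fixed measurement.

```python
def jitter_buffer_ms(send_times_ms: list[float], arrival_times_ms: list[float]) -> float:
    """Extra delay needed so every packet is ready at its scheduled playout time.

    Playing each packet at send_time + max_transit gives smooth playback;
    relative to the fastest packet that costs (max transit - min transit).
    """
    transit = [a - s for s, a in zip(send_times_ms, arrival_times_ms)]
    return max(transit) - min(transit)


# Packets sent every 20 ms but arriving with varying network delay.
sent = [0, 20, 40, 60, 80]
arrived = [50, 75, 88, 130, 135]
print(jitter_buffer_ms(sent, arrived))  # 22 (transit times range from 48 to 70 ms)
```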
Lip Sync: The relative timing of the audio and video portions during playback; in general, the matching of lip movements with speech. Most people notice audio and video that are out of sync by about 100 msec, and some people are sensitive to much smaller offsets.
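A player can check lip sync by comparing the presentation timestamps of the audio and the video being rendered at the same moment. This hypothetical sketch assumes both timestamps are in milliseconds on a shared media clock, and uses the ~100 msec figure from the text as a default tolerance that can be lowered for more sensitive viewers.

```python
def av_skew_ms(audio_pts_ms: float, video_pts_ms: float) -> float:
    """Offset between the audio and the video presented at the same instant.

    Positive: audio leads the video; negative: audio lags the video.
    """
    return audio_pts_ms - video_pts_ms


def out_of_sync(audio_pts_ms: float, video_pts_ms: float, tolerance_ms: float = 100.0) -> bool:
    """True if the skew exceeds the tolerance (~100 msec by default)."""
    return abs(av_skew_ms(audio_pts_ms, video_pts_ms)) > tolerance_ms


print(out_of_sync(1200, 1080))  # True: audio is 120 ms ahead of the picture
print(out_of_sync(1200, 1150))  # False: 50 ms is within the default tolerance
print(out_of_sync(1200, 1150, tolerance_ms=40))  # True for a stricter viewer
```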
Video Artifacts: The big challenge in most video applications is to provide the highest video quality at the lowest bit rate. Lossy compression, non-optimal network conditions and other application restrictions all affect video quality and may reduce quality of service. Artifacts can also result from non-optimal settings and environment characteristics, producing an unpleasant picture. The most common video artifact is quantization noise, generated as a result of bit rate reduction. Frequent network packet loss increases video artifacts dramatically. Other artifacts, such as ringing noise, blocking effects, blurred images and loss of sharpness, are a result of codec processing and in some cases can be compensated for by post-processing after the decoder.
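Quantization noise is often measured objectively with peak signal-to-noise ratio (PSNR), a metric not mentioned above and added here as an assumption: it compares decoded pixels with the originals, and higher values mean less distortion. PSNR does not always match perceived quality, so treat it as a rough guide. The sketch works on flat lists of 8-bit pixel values.

```python
import math


def psnr(original: list[int], decoded: list[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized 8-bit pixel lists."""
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no coding noise at all
    return 10 * math.log10(max_value ** 2 / mse)


ref = [52, 55, 61, 66, 70, 61, 64, 73]  # original pixel values
dec = [50, 56, 60, 66, 72, 60, 64, 70]  # after lossy coding
print(f"{psnr(ref, dec):.2f} dB")
```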
I guess some readers will protest: "I didn't know that, but I wouldn't consider myself a dummy." Others will claim that many other terms were left out. Both are right, and these terms are just the beginning.
To be continued.