HD Voice is something I’ve been writing about for more than two years now. More accurately, I’ve been asking a very important question: which HD voice codec should you use?
I’d like to revisit this question in light of some changes in the market and explain how I see things moving forward.
Out of the many voice codecs available to us today, here are the ones that are the most interesting:
- G.722, a voice codec standardized by the ITU-T, samples audio at 16 kHz and provides an audio bandwidth of up to 7 kHz. It is a commonly used wideband codec for VoIP protocols such as H.323 and SIP.
- AAC-LD is a low-delay audio codec defined in MPEG-4. It has the widest audio bandwidth of all the codecs mentioned here. It is used by some video conferencing companies, but not all of them: some have opted not to use AAC at all, while others have gone with AAC-LC (Low Complexity) instead. This makes choosing an AAC-xx codec an issue if interoperability is what you have in mind.
- SILK is the codec used by Skype. Skype unveiled it a few years ago with a promise to make it royalty-free and widely available, most probably to make gatewaying from Skype to standards-based VoIP solutions easier. Fast forward to today, and SILK is still used mostly by Skype.
- iSAC is GIPS’s own proprietary voice codec. Since Google acquired GIPS and open-sourced it along with other GIPS technologies, it has been freely available for others to use. While it hasn’t made significant inroads into VoIP systems, it is used by some of the messaging applications out there. Only time will tell whether it will become prevalent in VoIP systems as well.
- AMR-WB, also known as G.722.2, is the voice codec of choice for wideband over cellular networks, at least if you are developing something that has been specified and standardized for service providers, such as VoLTE or 3G-324M. What makes it interesting is that most mobile chipsets probably have its implementation etched into their hardware rather than running it as software on top. The tricky part is gaining access to it from third-party applications.
To keep it simple, here’s a table of these codecs:

| Codec | Sampling rate | Notes |
|---|---|---|
| G.722 | 16 kHz (7 kHz audio bandwidth) | Best suited for H.323 and SIP if you are looking for interoperability |
| AAC-LD | 8-96 kHz | Used in most high-end H.323 video conferencing systems |
| SILK | 8-24 kHz | Skype. Most of the rest of the pack skipped this one |
| iSAC | 16 or 32 kHz | Originally developed by GIPS; recently open-sourced by Google |
| AMR-WB | 16 kHz | The wideband voice codec of choice for VoLTE |

Which one should you use? That depends on your setting and scenario. My approach would be to start with G.722 for interoperability and move from there to the other options.
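The "start with G.722, then move to the other options" approach can be pictured as a simple preference list: walk your own list in order and pick the first codec the other endpoint also supports. The sketch below is a simplified illustration, not how any particular SIP or H.323 stack works. The names `PREFERRED_HD_CODECS` and `choose_codec` are hypothetical, and the ordering after G.722 is an assumption made purely for the example. Real negotiation also deals with payload types, clock rates and codec parameters, which are ignored here.

```python
from typing import Optional, Sequence

# Hypothetical preference order: G.722 first for interoperability, as suggested
# in the text. The order of the remaining codecs is an assumption.
PREFERRED_HD_CODECS = ["G.722", "AMR-WB", "AAC-LD", "iSAC", "SILK"]


def choose_codec(local_prefs: Sequence[str],
                 remote_supported: Sequence[str]) -> Optional[str]:
    """Return the first codec in our preference order that the remote side supports.

    Codec names are compared case-insensitively. Payload types, clock rates and
    format parameters, which real signaling also negotiates, are ignored.
    """
    remote = {name.upper() for name in remote_supported}
    for codec in local_prefs:
        if codec.upper() in remote:
            return codec
    # No common HD codec: a real system would fall back to narrowband here.
    return None


if __name__ == "__main__":
    # A hypothetical endpoint that supports SILK and G.722.
    print(choose_codec(PREFERRED_HD_CODECS, ["SILK", "G.722"]))  # G.722
    # A hypothetical endpoint that supports only AMR-WB.
    print(choose_codec(PREFERRED_HD_CODECS, ["AMR-WB"]))  # AMR-WB
```

Keeping the decision in a single ordered list makes the policy explicit. If your deployment is mostly VoLTE, for example, you would move AMR-WB to the front without touching the selection logic.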