Anatoli already did a great job of covering the HD Communication Summit – a very successful gathering of people from our industry, trying to (re-)push towards the use of wideband voice codecs for VoIP services, so that we will once and for all be able to hear each other properly.
While most experts will explain that HD voice is more than just a codec (for all the good reasons), the main obstacle that I see here is actually THE codec. Or rather the lack of THE codec.
Take a look at the following list of wideband codecs I’ve gathered here off the top of my head:
- G.722
- G.722.1
- G.722.1 Annex C
- G.722.2
- Siren 14
- Siren 22
- AAC-LD
- AAC-LC
- Skype’s SILK
And I’m sure there are more – I am no voice codec expert myself.
These codecs vary in a lot of different ways, such as sampling rate, bit rate, computational needs, etc. And still – they are all wideband codecs, suitable for “HD Voice”.
Take a look at HD Video – there is only one de-facto codec: H.264. You might use the AVC flavor of it, or the new SVC flavor; but H.264 is THE codec.
Until we agree on THE codec for HD voice, we will continue to just talk about making it commonplace.

Comments and trackbacks
1. Alexander Chemeris | June 5th, 2009 at 4:37 pm
You’re missing Speex WB/UWB and AMR-WB in the list – the most popular in the open-source world and in the mobile world, respectively.
And I should point out that AAC is not actually a voice codec at all. So while AAC-LD is suitable for VoIP, AAC-LC has too much algorithmic latency.
Also, I should note that H.264 actually has nothing to do with HD Video when you talk about non-streaming HD Video. There are enough examples of MPEG-2 and MPEG-4/ASP being used for HD Video.
H.264 has become the leader “just” because it provides much higher quality at the same bitrate than other codecs do, and because it was created by MPEG – the standards group for video. This is not the case with voice: you can find a couple of codecs that have comparable bitrate and quality, there is no widely acknowledged standardization body for VoIP codecs, and bandwidth is not as critical for voice coding as it is for video.
There are more points here, but these are the most important, imho.
2. Tsahi Levent-Levi | June 5th, 2009 at 6:40 pm
Alexander,
Thanks for adding some more wideband codecs and especially for explaining why there are so many of them…
You are of course correct about video, but I am specifically talking about video conferencing – in that regard, H.264 is by far the most prevalent HD video codec.
Tsahi
3. Michael Leuker | June 7th, 2009 at 5:01 pm
You are right, Tsahi: HD Voice definitely is about the codec before anything else, and this point is one of the major factors that kept the PSTN from ever moving on, combined with the general unwillingness of the telcos to provide their customers with a better telephony experience. Just take a look at how many of the (state-owned) companies milked their networks, and one appreciates the discussion that Jeff started all the more. But I digress…
It’s great to see SPEEX mentioned, because if there is one codec that actually has a chance of becoming a standard, it is Xiph’s brainchild. It offers all the technologies required of a modern codec (packet loss concealment, VAD, DTX) and much more. It is one of the most scalable codecs in terms of sampling rate, bitrate and processing power, and the only codec that does VBR. And even though the last point can be debated, as VBR has certain inherent security risks (different sounds compress in a very specific manner), there is one point that there will be no argument about:
Licensing fees.
The SPEEX source code is free to anyone who wishes to use it, under a very generous license that allows it to be used freely even in commercial projects. All other codecs mentioned are either patent encumbered or clearly represent particular interests where one company can decide whether or not a competitor is going to have their codec… even if it is “free” otherwise.
Simple G.722 is an exception because its patents have expired. Unfortunately, in spite of all the laurels it deserves for being one of the few more serious contenders for PSTN HDA, the codec is not efficient at all compared to the others, and I doubt that it will see widespread adoption in any IP network.
It is beyond me why SPEEX hasn’t seen wider adoption, why any Asterisk server has to be enabled to use it with a patch, why supporting hardware is non-existent (or?) and why the codec more or less got ignored at the HD Communication summit. Not to see a conspiracy where there is none, but it’s as if everybody is trying hard to not see the obvious solution right before their eyes. But even if in the end it is not SPEEX, the whole matter of codec choice deserves much more time, and I am really glad that we have finally started discussing it.
4. Alexander Chemeris | June 7th, 2009 at 8:01 pm
> It is beyond me why SPEEX hasn’t seen wider adoption
> why supporting hardware is non-existent (or?) and why
> the codec more or less got ignored at the HD Communication summit.
A couple of reasons, IMHO:
1) It’s based on CELP, not ACELP (which is patented by VoiceAge), so it performs slightly worse than commercially available codecs. E.g. refer to this e-mail from Jean-Marc Valin: http://lists.xiph.org/pipermail/speex-dev/2005-August/003603.html
2) It’s not included in most well-known commercially available media engines. For big companies it’s easier to pay for a whole media engine, for its commercial support, etc.
Every big media engine provider (like GIPS, Spirit, VoiceAge, etc.) has its own home-brew WB codec which they try to push.
We (sipX developers) don’t push any specific codec, so we offer Speex as well as others.
3) No one has made it a standard, e.g. like AMR (NB/WB) for mobile phones, and big companies don’t tend to use it. See (2).
4) No one is yet sure that it *really* does not infringe any patents. No patent search has been done yet; at least, none has been publicly announced.
The big shift here is Adobe with its Flash 10, which includes Speex as the preferred WB codec. I think it will move Speex closer to being a de-facto standard.
> why any Asterisk server has to be enabled to use it with a patch,
I can’t tell for sure, but IIRC Asterisk still uses an 8K clock rate under the hood, so adding a WB (well, HD) codec to it is a bit of a hack. Though things may have changed since I last looked at it?
–
Regards,
Alexander Chemeris.
SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000
5. Michael Leuker | July 8th, 2009 at 2:42 am
Sorry for getting back so late to your very informative answer, but I didn’t want to neglect to thank you, especially for the link to Jean-Marc’s message. I have been looking for a statement about SPEEX quality relative to G.729 for quite a while and could never find anything before.
6. Neha | November 26th, 2009 at 11:51 am
I am looking for the codec Siren1424/8000.
Can anyone give me some info on it, and is its source code freely available?
Thanks
7. Tsahi Levent-Levi | November 26th, 2009 at 3:24 pm
Neha,
Siren works at audio bandwidths of 7, 14 and 22 kHz.
These modes require sampling rates of at least 14, 28 and 44 kHz respectively (or subsampling of a higher sampling rate).
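The relation between audio bandwidth and sampling rate mentioned here is the Nyquist criterion: the sampling rate has to be at least twice the highest audio frequency. A minimal sketch of that arithmetic, where the function name `min_sampling_rate_khz` is made up for illustration and is not part of any Siren API:

```python
# Nyquist criterion: sample at no less than twice the audio bandwidth.
# The function name is hypothetical, chosen only for this illustration.
def min_sampling_rate_khz(bandwidth_khz: float) -> float:
    """Return the minimum sampling rate (kHz) for a given audio bandwidth (kHz)."""
    return 2 * bandwidth_khz


# The three Siren bandwidths listed in the comment above.
for bandwidth in (7, 14, 22):
    rate = min_sampling_rate_khz(bandwidth)
    print(f"{bandwidth} kHz bandwidth needs at least {rate} kHz sampling")
```

Real devices usually sample at a standard rate at or above this minimum and filter or subsample as needed.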
Siren7 was standardized as G.722.1.
Siren14 became G.722.1 Annex C.
Siren1424 probably refers to a 14 kHz bandwidth audio signal that is compressed to 24 kbps using the Siren codec.
I don’t know what the 8000 refers to. It doesn’t make sense for it to be the packet size, since that would be a packet equivalent to more than 333 msec of audio.
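The packet-size estimate can be checked with a quick calculation. This is only a sketch, assuming the 8000 is read as a payload size in either bits or bytes at the 24 kbps rate guessed above; the helper name `packet_duration_ms` is made up for this example:

```python
# How much audio would one packet carry if "8000" were the payload size?
# Assumptions: 24 kbps Siren bitrate; 8000 read as either bits or bytes.
def packet_duration_ms(payload_bits: int, bitrate_bps: int) -> float:
    """Return the audio duration (ms) carried by a payload at a given bitrate."""
    return 1000 * payload_bits / bitrate_bps


BITRATE_BPS = 24_000

as_bits = packet_duration_ms(8000, BITRATE_BPS)       # 8000 taken as bits
as_bytes = packet_duration_ms(8000 * 8, BITRATE_BPS)  # 8000 taken as bytes

print(f"8000 bits  -> {as_bits:.0f} ms per packet")
print(f"8000 bytes -> {as_bytes:.0f} ms per packet")
```

Either reading gives a packet far longer than the few tens of milliseconds typical for VoIP, which is why the packet-size interpretation looks unlikely.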
Polycom licensees can receive reference source code in C.
A lot of information can be found at:
http://www.polycom.com/company/about_us/technology/siren14_g7221c/index.html
8. O. | December 10th, 2009 at 4:45 pm
Hi,
Could you suggest a free softphone that allows speaking with the AAC codec?
BR
9. Tsahi Levent-Levi | December 11th, 2009 at 7:02 pm
O,
Sorry, but I don’t use free softphones enough to be able to suggest one with a good AAC codec.
Maybe other readers here can assist you.
Tsahi