Slashdot: News for Nerds

Codec2 — an Open Source, Low-Bandwidth Voice Codec

Soulskill posted more than 3 years ago | from the sound-of-efficiency dept.

Communications 179

Bruce Perens writes "Codec2 is an Open Source digital voice codec for low-bandwidth applications, in its first Alpha release. Currently it can encode 3.75 seconds of clear speech in 1050 bytes, and there are opportunities to code in additional compression that will further reduce its bandwidth. The main developer is David Rowe, who also worked on Speex. Originally designed for Amateur Radio, both via sound-card software modems on HF radio and as an alternative to the proprietary voice codec presently used in D-STAR, the codec is probably also useful for telephony at a fraction of current bandwidths. The algorithm is based on papers from the 1980s, and is intended to be unencumbered by valid unexpired patent claims. The license is LGPL2. The project is seeking developers for testing in applications, algorithmic improvement, conversion to fixed-point, and coding to be more suitable for embedded systems."

179 comments

Presentation this week. (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33645996)

I'll be presenting on Codec2 at the ARRL/TAPR Digital Communications Conference this weekend in Vancouver, Washington, near Portland. I'll try to get the video online.

Re:Presentation this week. (1)

shriphani (1174497) | more than 3 years ago | (#33646010)

Please do. This looks very nice.

Re:Presentation this week. (2, Funny)

Anonymous Coward | more than 3 years ago | (#33646060)

But will you be presenting IN Codec2?
That would be very impressive.

Re:Presentation this week. (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646096)

I am bringing the materials for a demo table with two laptops and real-time encode-decode, so people can try it themselves.

Original Rationale (5, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646022)

The original rationale for Codec2 is at Codec2.org [codec2.org] . I've been promoting this issue for about four years, as I was bothered by the proprietary nature of the AMBE codec in D-STAR. But I didn't have the math, etc., to do the work myself. It was really fortunate that David became motivated to do the work without charge. He has a Ph.D. in voice coding. By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion. He'd be my nomination for the MacArthur grant.

Re:Original Rationale (4, Informative)

Yaur (1069446) | more than 3 years ago | (#33646224)

In a nutshell, it looks like the rationale for not just using Speex is:
  • better resilience to bit errors
  • better performance at ultra low bitrates

Re:Original Rationale (4, Informative)

Bananatree3 (872975) | more than 3 years ago | (#33646240)

that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. Radio, by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

Re:Original Rationale (2, Informative)

adolf (21054) | more than 3 years ago | (#33646442)

(Stating the obvious for those with sufficiently low UIDs and/or those who remember VAXen, or similar, or at least those with a proper beard...)

that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. Radio, by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

that is basically it. Speex is built (as I understand it) for lossless transmission methods with little/no error correction needed. UDP [wikipedia.org] , by its very nature is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.

(There. Extrapolated that for you. Doubly-so [wikipedia.org] , perhaps.)

Re:Original Rationale (0)

Anonymous Coward | more than 3 years ago | (#33646596)

There is a difference between a high bit error rate and occasionally lost packets. Does Codec2 handle both cases well, or did your extrapolation introduce incorrect information?

Re:Original Rationale (0)

Anonymous Coward | more than 3 years ago | (#33646598)

There's a fairly significant difference between 10% bit errors and 10% packet loss. Unless Codec2 is designed to split each frame of audio across multiple packets it's not a useful extrapolation. (I've written a basic VoIP app using Speex over UDP and, while not Skype, it was usable with 10% packet loss).

Re:Original Rationale (4, Informative)

Yaur (1069446) | more than 3 years ago | (#33646610)

With UDP the typical loss scenario is dropped packets but with radio single bit errors are more likely. This difference means that FEC strategies for one scenario are not directly applicable to the other.

For UDP, in-packet FEC data is useless: to be useful, your error correction scheme needs to be prepared to deal with losing a whole packet's worth of data. For voice this is going to introduce too much latency, so instead a typical codec might just try to interpolate the lost data. With radio, on the other hand, there is value in in-packet error correction bits within the stream, and in the event of an error you are going to have more data with which to guess what the audio should be like, especially if you know which bits are errored (or possibly errored).
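
To illustrate the distinction drawn above, here is a minimal sketch of in-packet FEC using a Hamming(7,4) code. The choice of code is an assumption made purely for illustration; nothing in this thread says that Codec2, D-STAR, or any particular radio uses it. It corrects any single flipped bit inside a 7-bit codeword, which is the radio case. A dropped UDP packet leaves no codeword to correct at all.

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword (p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
noisy = hamming74_encode(data)
noisy[4] ^= 1  # a radio-style single bit error
print(hamming74_decode(noisy) == data)  # True
```

The cost is three parity bits for every four data bits, which is why the number of bits the codec itself needs matters so much on a narrow channel.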

Re:Original Rationale (1)

tepples (727027) | more than 3 years ago | (#33647968)

With radio, on the other hand, there is value in in-packet error correction bits within the stream, and in the event of an error you are going to have more data with which to guess what the audio should be like, especially if you know which bits are errored (or possibly errored)

But wouldn't the underlying link just automatically FEC the packets at a lower layer, even if only to get the packet drop rate down?

Re:Original Rationale (5, Informative)

jmv (93421) | more than 3 years ago | (#33647094)

The fundamental difference is not so much lossless vs. lossy transmission as the actual bit-rate. I designed Speex with a "sweet spot" around 16 kb/s, whereas David designed Codec2 for a sweet spot around 2.4 kb/s. Speex does have a 2.4 kb/s mode, but the quality isn't even close to what David was able to achieve with Codec2.

Re:Original Rationale (3, Informative)

wowbagger (69688) | more than 3 years ago | (#33647476)

If you've ever heard AMBE in the presence of bit errors, it doesn't do so well either. It isn't the vocoder's job to deal with bit errors; it is the protocol's job. Over half the bits in an APCO-25 voice frame are forward error correction for the voice payload: Golay encoding, Reed-Solomon, bit order scrambling (interleaving), you name it.

Putting resistance to bit errors in the codec is the wrong place to do it.

Now, making the codec use fewer bits, so the protocol layer has more bits for FEC, makes sense.
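
As a rough illustration of the interleaving mentioned above, here is a sketch of a simple block interleaver. The 4 x 6 layout and the function names are assumptions for the example, not the scheme APCO-25 actually uses. Bits are written row by row (one codeword per row) and sent column by column, so a burst of consecutive channel errors is spread across several codewords, each of which then sees only one error.

```python
def interleave(bits, rows, cols):
    """Write row by row, read column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave()."""
    assert len(bits) == rows * cols
    out = [0] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out


rows, cols = 4, 6  # 4 codewords of 6 bits each (illustrative sizes)
sent = list(range(rows * cols))  # indices stand in for bits so they can be tracked
on_air = interleave(sent, rows, cols)

burst = set(on_air[8:12])  # four consecutive symbols wiped out on the channel
errors_per_codeword = [sum(1 for i in burst if i // cols == r) for r in range(rows)]
print(errors_per_codeword)  # [1, 1, 1, 1]
```

Without interleaving, the same four-bit burst would land inside a single codeword and could overwhelm a code that corrects only one or two errors.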

Re:Original Rationale (0)

Anonymous Coward | more than 3 years ago | (#33646230)

How does it compare to CELT? Or does CELT have similar problems to Speex in the over the air use case Codec2 is designed for?

Re:Original Rationale (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646482)

How does it compare to CELT?

So far, we've really only compared it to G.729, and it does OK against that. CELT starts at 32 kilobits per second and we're at 2 kilobits, so it's not really for the same application. But I noticed that the Alpha, all-floating-point implementation with some known low-performance code encoded the 3.75 seconds in 0.06 seconds, and decoded them in 0.04, on my 2.4 GHz processor. I would think that a polished implementation could achieve low delay on a DSP chip or some flavors of embedded CPU.

Re:Original Rationale (2, Interesting)

slimjim8094 (941042) | more than 3 years ago | (#33646346)

Looks really cool. I haven't messed around with D-STAR since I don't like the idea of being tied into a specific system (seems to contravene the point of amateur radio). I'll definitely be keeping an eye on this to see where it heads.

I had a really awesome idea just now for transmitting this at 1200bps using AFSK Bell 202 (like APRS) and hacking up live voice using entirely existing equipment (TNCs, etc). But the given example of 1050 bytes/3.75s works out by my math to 2240bps. I guess you could run it over 9600bps packet, with room to spare (text chat?)

73,
KC2YWE
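
The bit-rate arithmetic in this comment can be checked in a few lines. This is only a sketch: the function name is made up here, and the byte count, duration, and channel rates are the ones quoted in the story and in the comment.

```python
def bitrate_bps(num_bytes, seconds):
    """Average bit rate of an encoded clip, in bits per second."""
    return num_bytes * 8 / seconds


rate = bitrate_bps(1050, 3.75)
print(rate)  # 2240.0

# Compare against the two packet-radio channel rates mentioned in the comment.
for channel_bps in (1200, 9600):
    verdict = "fits" if rate <= channel_bps else "does not fit"
    print(channel_bps, verdict)
```

At 9600 bps this leaves 7360 bps of headroom before any framing or FEC overhead is counted.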

Re:Original Rationale (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646370)

Sound-card modem implementations over SSB would be practical. See FDMDV [n1su.com] . We're still a little wide for that, but we'll get there.

Re:Original Rationale (3, Informative)

Gordonjcp (186804) | more than 3 years ago | (#33646524)

You could just about squeeze it into 2400bps. It would probably be possible to get that out of existing AFSK modems without needing to go down the route of discriminator taps and such. Using a hardware GMSK modem like the FX589 chip would give you 9600 baud with the option of interoperating with existing D-Star modems, and interfacing an FX589 is going to be easier to implement than a G3RUH modem.

Re:Original Rationale (2, Interesting)

the way (22503) | more than 3 years ago | (#33646600)

By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion.

I've got one of his little ip01 telephony boxes, and it is quite fantastic - a tiny, cheap, fanless, (embedded) Linux computer with plenty of memory and CPU grunt, and of course telephony hardware on board. It also has a package manager, with quite a few pieces of software available, and regular firmware updates. It's much more powerful than the various Linux-based consumer routers that are available - it's a great option if you're looking for a small Linux server to run Asterisk, a little web site, DNS server, SSH, etc...

(I'm not affiliated with David or Rowetel in any way - just a happy customer who is in awe of the amazing things this guy has achieved in such a wide variety of areas.)

Re:Original Rationale (1)

Bruce Perens (3872) | more than 3 years ago | (#33646634)

I have an ip04 on my business phone number, which is a SIP DID.

Re:Original Rationale (0, Flamebait)

Anonymous Coward | more than 3 years ago | (#33647030)

How about you start licensing it with non nazi-like licences. I would use this if it wasn't LGPL licensed (I use a non-standard language, so I would need to modify it to add bindings to my language). A 2-clause BSD license would be good.

But then, I suppose some people LIKE nazi-like licences. oh well.

Re:Original Rationale (1)

TheRaven64 (641858) | more than 3 years ago | (#33647298)

Meh. I'm at least as critical of the GPL as the next guy, but it's hard to hate the LGPL2 (presumably he means v2.1). LGPLv3 has some significant issues, the most amusing one being that it is incompatible with GPLv2. The LGPL is non-viral, so you can keep the rest of your code under a more permissive license if you want to, but it has enough toothless legalese to keep the GPL crowd mostly happy.

Re:Original Rationale (1)

fuzzyfuzzyfungus (1223518) | more than 3 years ago | (#33647862)

In addition to the LGPL 2 being substantially less GPL-like than its name suggests (as well as being a fairly logical choice if you want to extend maximal freedom to the user, while making the creation of incompatible forks that have to be reverse-engineered less likely), arguing over the license of the reference implementation of a specifically-designed-to-be-patent-unencumbered codec spec seems especially pointless.

Being able to use the reference implementation certainly is convenient and timesaving; but it is the codec itself that is the really important bit. Anybody is free to write a conformant implementation under their license of choice, or attempt to buy the right to use the code under some other license from its creator. Without patents, nobody can stop you from doing that, and there is nothing in the LGPL2 that prevents using the LGPL2 code as a reference when writing a new implementation, so long as you aren't just copying it.

Re:Original Rationale (2, Insightful)

fuzzyfuzzyfungus (1223518) | more than 3 years ago | (#33647784)

Wow. Ordinarily I'm of the opinion that crying "Godwin's Law!" is a bit overused; but having someone describe the LGPL as "Nazi-like" is making me reconsider.

Somebody goes to the trouble of designing a novel, patent-unencumbered (i.e. if you don't like the software licence, you are perfectly free to write your own implementation) codec that fits an otherwise rather underserved niche. They have the temerity to release it under a license requiring you to release your modifications to their code if you distribute in binary form, and this is somehow analogous to a particularly virulent flavor of genocidal fascism?

You are really messing up the BSD crowd's reputation for being ideologically mellow compared to team GPL...

Re:Original Rationale (1)

koiransuklaa (1502579) | more than 3 years ago | (#33647976)

How about you start licensing it with non nazi-like licences. ... But then, I suppose some people LIKE nazi-like licences.

Software licenses can often be negotiated: authors may be willing to relicense (or add another license) if given a well presented and compelling argument for the change. This has happened before -- picking the right license can be difficult, and it's possible the original authors did not think of all scenarios.

Speaking of "a well presented and compelling argument": you, sir, did not make one.

Re:Original Rationale (1)

spickus (513249) | more than 3 years ago | (#33648124)

This is terrific. I hope to see 'FreeStar' repeaters soon.
 

Interactive communication? (1)

ard (115977) | more than 3 years ago | (#33646034)

What is the compression ratio for more interactive communication, e.g. 20 ms sampling time instead of 3-4 seconds?

Re:Interactive communication? (3, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646070)

It is a real-time codec on my workstation and is intended to be a real-time codec on embedded DSP. It's currently all floating point and does things it should not, like malloc of multiple buffers per sample.

Download the code and build it. It's "just type make" on Linux. The raw (uncompressed) sample format we've used for testing is 16-bit samples at 8 kHz and there are some tools to play those, and some pre-recorded samples. Not too much trouble to figure out.
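
For a sense of scale, the raw format described above can be compared with the encoded size quoted in the story. This is a back-of-the-envelope sketch that assumes a single (mono) channel, which the comment does not state explicitly; the constant and function names are invented for the example.

```python
SAMPLE_RATE_HZ = 8000   # 8 kHz, as stated above
BYTES_PER_SAMPLE = 2    # 16-bit samples
CHANNELS = 1            # assumption: mono speech


def raw_size_bytes(seconds):
    """Size of an uncompressed clip in the raw test format."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


raw = raw_size_bytes(3.75)  # 60000 bytes
encoded = 1050              # bytes, from the story summary
ratio = raw / encoded
print(raw, round(ratio, 1))  # 60000 57.1
```

Under that assumption the raw stream runs at 128 kbit/s, so the alpha codec is compressing speech by a factor of roughly 57.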

Err Speex (2, Informative)

Knee Socks (1600375) | more than 3 years ago | (#33646050)

Speex: Speex is based on CELP and is designed to compress voice at bitrates ranging from 2 to 44 kbps. Some of Speex's features include: Narrowband (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz) compression in the same bitstream

Speex developers are involved (3, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646134)

Jean-Marc Valin is on the project mailing list and David is another Speex developer and the person Jean-Marc recommended to me. We are trying for an improvement over Speex at low rates.

Re:Err Speex (3, Informative)

Gordonjcp (186804) | more than 3 years ago | (#33646162)

Speex isn't great in this application, because at low bitrates there is a significant delay through the codec and the output stream requires far too much bandwidth to be useful. Consider that digital speech systems like MOTOTRBO, TETRA, P25 and Iridium typically have less than 6kbps throughput once you've taken FEC into account.

what about LATENCY? (4, Interesting)

Kristopeit,MichaelDa (1905518) | more than 3 years ago | (#33646064)

why is seemingly the most important aspect of communication technology so often overlooked?

i assume it's acceptable... but it angers me that someone thought it was relevant to give the exact number of bytes for a seemingly arbitrary 3.75 seconds of audio, but failed to say how long it takes to encode that 3.75 seconds of audio, or what average latency can be expected after buffer conditions are met.

Re:what about LATENCY? (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646090)

Right. Sorry. Real time on the x86 workstation I'm using. Not converted to fixed-point for weaker CPUs yet. Not tested on ARM, Blackfin, AVR, etc. Waiting for you to do that :-) Downloadable code. Reasonably portable. Type make and let fly.

Re:what about LATENCY? (2, Informative)

KliX (164895) | more than 3 years ago | (#33646372)

I think he probably means it in a 'how many samples does the codec need before it can send a packet' type of latency.

Re:what about LATENCY? (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646420)

There are currently 51 bits in a frame. That is the minimum that you can send, and you'd send 40 of those per second as the codec is presently implemented. A real data radio would add bandwidth for its data encapsulation, but would have to meet the time and bandwidth requirements of the codec payload.

Re:what about LATENCY? (1)

sahonen (680948) | more than 3 years ago | (#33646708)

What we're trying to ask is if you pipe a real time stream of samples from a microphone into one end, encapsulate the data in UDP packets, bounce the stream off 127.0.0.1, unencapsulate them, pipe it into a decoder and from there into a sound card and speaker... How much time is there between me saying "hi" into the mic and hearing "hi" out of the speaker? This is by far the most important consideration for modern voice protocols. Low bandwidth is nice. Low CPU is nice. Error tolerance is nice. Latency is crucial. If you don't think it's crucial, get 30 people in a Ventrilo channel and listen to them step all over each other.

The developers of Mumble have gotten very good at reducing latency, and would be worthwhile to bounce ideas with.

Re:what about LATENCY? (1)

Dan Dankleton (1898312) | more than 3 years ago | (#33646914)

Latency is crucial for the application you are talking about. Low bandwidth and error tolerance are more important for a codec which will primarily be used for simplex radio applications, which is what Codec2 is designed for (ignoring for the moment the D-Star reflectors). Different applications, different requirements. This is why there are lots of codecs ;) Dan MD1CLV

Re:what about LATENCY? (1)

sahonen (680948) | more than 3 years ago | (#33646944)

If your transmission medium is half duplex and shared by several users, you run into exactly the phenomenon I described. A codec with 250ms of latency creates a 250ms window in which two people can start talking without realizing they're stepping on each other. People on aviation frequencies step on each other all the time and the only latency there is speed of light radio propagation.

Re:what about LATENCY? (1)

Dan Dankleton (1898312) | more than 3 years ago | (#33647000)

It happens in analogue ham radio too. In FM, thanks to the capture effect, it means that one of the two gets heard and there is a 'protocol' between speakers to deal with that. I don't know what the effect is of multiple signals on the modulation used in D-Star though.

Re:what about LATENCY? (1)

Bruce Perens (3872) | more than 3 years ago | (#33646988)

There is no reason that you can't send a packet for each frame. There isn't any important state, so far, that persists between frames. That's 7 bytes (really 51 bits) 40 times per second. CPU speed doesn't seem to be a problem for latency from what we have seen so far.
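
The frame figures above can be tied back to the 1050-byte number in the story. The sketch below assumes each 51-bit frame is padded to a whole number of bytes and uses a made-up MSB-first packing layout; the real Codec2 bit layout is not described in this thread.

```python
import math

BITS_PER_FRAME = 51
FRAMES_PER_SECOND = 40

bytes_per_frame = math.ceil(BITS_PER_FRAME / 8)        # 7
frame_ms = 1000 / FRAMES_PER_SECOND                    # 25.0
payload_bps = BITS_PER_FRAME * FRAMES_PER_SECOND       # 2040
padded_bps = bytes_per_frame * 8 * FRAMES_PER_SECOND   # 2240
clip_bytes = int(3.75 * FRAMES_PER_SECOND) * bytes_per_frame  # 1050


def pack_frame(bits):
    """Pack 51 bits (MSB first) into 7 bytes; the last 5 bits are zero padding.

    Hypothetical layout, for illustration only.
    """
    assert len(bits) == BITS_PER_FRAME
    value = 0
    for b in bits:
        value = (value << 1) | b
    value <<= bytes_per_frame * 8 - BITS_PER_FRAME
    return value.to_bytes(bytes_per_frame, "big")


print(payload_bps, padded_bps, clip_bytes)  # 2040 2240 1050
```

So the 2240 bps computed from the file size includes five padding bits per frame; the payload itself is 2040 bps, and one frame per packet gives the 25 ms framing delay.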

Re:what about LATENCY? (1)

sahonen (680948) | more than 3 years ago | (#33647156)

So basically, 25ms of encoding latency, plus the latency of your audio hardware input and output buffers, plus network/medium propagation (5-10ms for satellites?), plus any network jitter buffering. That's pretty good. CELT claims 3-9ms but I'd like to hear a comparison of audio quality at 24 kbps, especially considering the differences between their designs.
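
The budget described above is just a sum, sketched here. Only the 25 ms framing figure and the 10 ms propagation guess come from the thread; the audio-buffer and jitter-buffer values are placeholders chosen for the example, not measurements.

```python
def one_way_latency_ms(frame_ms, audio_buffers_ms, propagation_ms, jitter_buffer_ms):
    """Mouth-to-ear delay as the sum of the stages listed in the comment."""
    return frame_ms + audio_buffers_ms + propagation_ms + jitter_buffer_ms


estimate = one_way_latency_ms(
    frame_ms=25.0,          # one codec frame, from the thread
    audio_buffers_ms=20.0,  # placeholder: sound card input plus output buffers
    propagation_ms=10.0,    # upper end of the 5-10 ms guess above
    jitter_buffer_ms=40.0,  # placeholder: depends entirely on the network
)
print(estimate)  # 95.0
```

With numbers in this range the codec's framing delay is a minority of the total, which is why the buffer sizes around it matter at least as much.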

Re:what about LATENCY? (1)

Kristopeit,MichaelDa (1905518) | more than 3 years ago | (#33647570)

he later said it took .06 sec to encode and .04 sec to decode the 3.75 sec sample.

do those numbers mesh?

Re:what about LATENCY? (1)

Kristopeit,MichaelDa (1905518) | more than 3 years ago | (#33647558)

i don't mean to be rude, but how about you just make an audio recording of you live streaming from one machine to another? a video maybe? do you have a digital camera that can take videos?

1 picture... 1000 words, and such. i could have made a video of this comment and uploaded it to youtube faster than i could type and post it.

i understand latency might not be an issue for the intended application, but developers choosing which codec is best for their own applications will certainly require initial response delay and continued latency numbers to make informed decisions.

Re:what about LATENCY? (0)

Anonymous Coward | more than 3 years ago | (#33646704)

Any reason not to try to get it included in the FFmpeg project?

Re:what about LATENCY? (2, Insightful)

Garridan (597129) | more than 3 years ago | (#33646106)

Well, the source is right there on the webpage. Why don't you download & compile it, and see for yourself? It's an alpha release so I'll guess that it's slower than it could be.

Re:what about LATENCY? (2, Interesting)

Kristopeit,MichaelDa (1905518) | more than 3 years ago | (#33646206)

it could take 16MB/s and still function in real time over the internet for me... my problem isn't that the latency wasn't shown, it was that the bitrate WAS shown BUT the latency wasn't shown.

also, considering the advantages of using lower bitrate voice codecs, the ability to implement the encoder and decoder algorithms directly in very low transistor count custom hardware would appeal to the same crowd... so not just latency in terms of x86 instructions per second, but the ability to implement those instructions in hardware.

i am concerned about bruce's use of the term "real time"... either he is implying there is no noticeable latency to him, (which is irrelevant to me as numerous others claim skype video chat is "real time", and also impossible given the implicit time consuming process of encoding), or he's cleverly stating that the time it takes to encode is the real time it takes to encode. it's not the fake time. it's real time.

again, i assume, and it seems i'm correct to do so, that the codec is "very usable"... i won't be trying it as i have no need for it.

Re:what about LATENCY? (1)

Bananatree3 (872975) | more than 3 years ago | (#33646270)

The final destination for Codec2 *isn't* X86 processors, but DSP chips. If, for some reason latency is an issue when it's first shoehorned into a DSP chip, Codec2 will be refined until it works well on a DSP chip, in real real time.

Re:what about LATENCY? (2, Interesting)

Kristopeit,MichaelDa (1905518) | more than 3 years ago | (#33646302)

yes, of course... but "refining" a codec for hardware implementation is doing the exact opposite to the quality of the signal.

why not refine the DSP chip architecture until it works well with the original codec? i know masks are expensive... but why not do it all the way?

Really early latency figures (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646352)

It encoded those 3.75 seconds in 0.06 seconds and decoded in 0.04 seconds on my AMD Phenom 9750 2.4 GHz, one core only, compiled with GCC and the -O3 switch. That's all of the overhead of the program starting and exiting, too. It's using floating, not fixed point.

This, it seems, bodes well for low latency of the final implementation on a DSP chip.
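
These timings can be turned into a real-time factor and a per-frame processing cost. This sketch borrows the 40 frames per second figure from another comment in the thread, and it treats the measured times as pure codec work even though they include program start-up and exit, so it overstates the true per-frame cost.

```python
CLIP_SECONDS = 3.75
ENCODE_SECONDS = 0.06
DECODE_SECONDS = 0.04
FRAMES_PER_SECOND = 40  # from elsewhere in the thread

frames = round(CLIP_SECONDS * FRAMES_PER_SECOND)  # 150 frames in the clip

encode_speedup = CLIP_SECONDS / ENCODE_SECONDS    # about 62x faster than real time
decode_speedup = CLIP_SECONDS / DECODE_SECONDS    # about 94x faster than real time
encode_ms_per_frame = ENCODE_SECONDS / frames * 1000
decode_ms_per_frame = DECODE_SECONDS / frames * 1000

print(round(encode_ms_per_frame, 2), round(decode_ms_per_frame, 2))
```

Both per-frame costs come out well under one millisecond, against a 25 ms frame, so on this CPU the processing time is a small part of the delay; the frame length itself dominates.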

Re:Really early latency figures (0)

Anonymous Coward | more than 3 years ago | (#33646636)

That isn't latency! If it requires 3.75 seconds of audio before a compressed stream is emitted it is largely useless for comms. Fortunately, from a super-post of this one it appears that latency is pretty low - 1 frame is the minimum that can be sent (40 frames per second, 51 bits per frame).

Re:Really early latency figures (1)

Kristopeit,MichaelDa (1905518) | more than 3 years ago | (#33646740)

yeah, this is what i'm trying to figure out... sally says "hi"... how long until bob hears her.

.1 seconds does bode well for an eventual lower level implementation. 3.85 seconds and you might as well trash it, but i'm almost certain that isn't the case as the phrase "real-time" was thrown out a few times.

Re:Really early latency figures (3, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33647012)

No significant state between frames so far. 25 milliseconds per frame. That is the minimum delay before the other side starts to hear the audio.

Re:Really early latency figures (1)

Rogerborg (306625) | more than 3 years ago | (#33647098)

Yup, Core 2 Duo P8700 @ 2.53 GHz, compiled with -O3:

time ./c2enc ../raw/hts1a.raw hts1a_c2.bit

real 0m0.062s
user 0m0.060s
sys 0m0.000s

time ./c2dec hts1a_c2.bit hts1a_c2.raw

real 0m0.048s
user 0m0.044s
sys 0m0.004s

Thanks for promoting this, it's a fascinating project.

Re:Really early latency figures (1)

TheRaven64 (641858) | more than 3 years ago | (#33647356)

Bruce, you've replied to this question several times, but you are not understanding the question. Almost every encoder buffers some data then compresses it. Generally, the larger the buffer, the better the compression, but the greater the delay between starting to put audio into the encoder and starting to get audio out. The same thing happens at the decoder end. The question is how much (in terms of milliseconds of audio) does the encoder need to buffer before it starts compressing and how much does the decoder need to buffer before decompressing? Add these two numbers together, and you get the latency number that the original poster wants.

The numbers in TFS are 280 bytes per second. If you have 51-byte frames, this is 5.5 frames per second. Assuming that frames are independent, this gives 180ms of encoding latency, presumably the same amount of decoding latency, which seems incredibly high. Presumably this goes down if you increase the bit rate a bit, so at 2KB/s it would be about 50ms (assuming that the codec is, as you imply, stateless between frames). That's a lot more reasonable.

Re:what about LATENCY? (0)

Anonymous Coward | more than 3 years ago | (#33646666)

I don't see the .msi? Are there VB sources for this?

Re:what about LATENCY? (1)

jmv (93421) | more than 3 years ago | (#33647134)

Don't worry. The frame size is 20 ms and there's probably (haven't looked at that detail) around 10 ms of look-ahead, so latency shouldn't be an issue. I'd actually argue that it could be increased *if* there's a way to reduce the bit-rate by doing that.

Re:what about LATENCY? (0, Flamebait)

Kristopeit,MichaelDa (1905518) | more than 3 years ago | (#33647160)

i was never worried. you're an idiot.

Re:what about LATENCY? (1)

Kristopeit,MichaelDa (1905518) | more than 3 years ago | (#33647228)

i agree latency SHOULD NOT be an issue. my issue was determining IF latency IS an issue.

bruce has stated a .1 second total codec processing time on the 3.75 sec audio sample. i don't know what that means for response times, or how they change with longer or shorter or streaming audio samples. what happens if a stream is interrupted? how many frames are lost? is there a noticeable audible byproduct of lost or damaged data?

Serendipity. (3, Interesting)

firstnevyn (97192) | more than 3 years ago | (#33646122)

As a newly licensed ham in an area where D-STAR repeaters are everywhere (VK) and a free software advocate, I have recently become aware of the issues with D-STAR and have been reading about this work, so it's quite surreal to have it pop up on /. in the week where I get my licence. I haven't had a chance to read the D-STAR specifications, but I am wondering if the voice codec is flagged in the D-STAR digital stream, and if it would be possible to create translating repeaters: dual-output repeaters with differently coded data streams. It'd take more spectrum but would also allow for a migration path (at least for repeater users?)

Re:Serendipity. (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646174)

Congratulations on the license, OM. We haven't yet explored how to wedge this into D-STAR, but sending it as data rather than voice would be one way. All of the D-STAR radios except the latest one, the IC-92AD, use a plug-in daughter board to hold the AMBE chip, and it might be that somebody could make a dual-chip version of this board sometime. Since AMBE is proprietary we are stuck using their chip if we want to be compatible, unless the repeater does the conversion for us using a DV-Dongle. They sell TI DSP chips with their program burned in, and don't give out the algorithm.

It may be that on D-STAR the AMBE chip also does the modulation for a data transmission, just doesn't run the codec. But the modulation is known and there is a sound-card software implementation of D-STAR that interoperates with it. I don't have any D-STAR equipment to test. The folks on dstar_development@yahoogroups.com know a lot more about D-STAR.

73
K6BP

Re:Serendipity. (1)

MichaelSmith (789609) | more than 3 years ago | (#33646190)

Why does a repeater need to understand the encoding? Can't it just rebroadcast the data, or even the analogue signal?

Re:Serendipity. (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646294)

The repeater can rebroadcast the data, but that data would be AMBE encoded, and AMBE is both trade-secret in its implementation and patented in some of its algorithms. There may be an AMBE chip in the repeater, I've not played with one. The usual way one converts to and from AMBE on a PC is with a device called the DV-Dongle, which contains the AMBE chip. This costs lots of money and is not nearly so powerful as the CPU of the computer it's plugged into, which is one reason to be fed up with proprietary codecs.

So, if you had some newer, Codec2-based radios, and some older D-STAR radios, linking repeaters might be a good way to get them to talk to each other.

This is hand-waving about a lot of issues, like we've not designed the next generation of data radio to put Codec2 into. One might guess that such a thing could use IPv6, and better modulation than just FM, and FEC, etc.

Re:Serendipity. (1)

Dan Dankleton (1898312) | more than 3 years ago | (#33646958)

I had some thoughts about this when I first looked into D-Star.

There are some reserved bits in the data section of the protocol (even when voice is being transmitted) which are defined as 0 in current implementations. It would be relatively easy for repeaters to be upgraded to understand that a 1 in one of those indicates a different codec, and use some more reserved bits to indicate which codec.

Repeaters (separate units) could be used to transcode if the end users are using different codecs - this would involve a software change on the repeaters but would not require that people who've already got D-Star radios upgrade.

The basic D-Star design seems quite good to me, but I can't understand why some kind of futureproofing wasn't designed in from the start since the quality of codecs is improving all the time!

Dan MD1CLV

Re:Serindipidy. (4, Insightful)

NF6X (725054) | more than 3 years ago | (#33646280)

Congratulations on your new license!

The proprietary AMBE codec bothers me, too. I think that a closed, license-encumbered, proprietary codec is entirely inappropriate for ham radio use.

Re:Serindipidy. (0)

Anonymous Coward | more than 3 years ago | (#33646630)

This must be Google's doing, because on my Slashdot there's no news about voice codecs.
On the other hand, I was recently looking for visual information about "upskirts" when I finally ended up on Slashdot, and there it was: news about future transparent Airbuses.

Great news (2, Informative)

Anonymous Coward | more than 3 years ago | (#33646164)

>3.75 seconds of clear speech in 1050 bytes

That's 2240 bps (2.19 kbps if you count 1024 bits per kilobit), quite impressive. Maybe one day they can beat MELP (as low as 600 bps) and remain open.

Excellent work.
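For anyone who wants to check the arithmetic, a short Python sketch using only the two figures quoted from the summary:

```python
# Back-of-the-envelope bit rate from the figures quoted in the summary.
payload_bytes = 1050   # encoded size
duration_s = 3.75      # seconds of speech it represents

bits_per_second = payload_bytes * 8 / duration_s

print(bits_per_second)          # 2240.0
print(bits_per_second / 1024)   # 2.1875 (the "2.19 kbps" figure, 1024 bits per kbit)
print(bits_per_second / 1000)   # 2.24   (decimal kilobits)
```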

Re:Great news (4, Interesting)

Bruce Perens (3872) | more than 3 years ago | (#33646246)

I think you could cut the sample rate in half and get acceptable performance, but I've not tried. Currently I think it's 25 microsecond frames, and each frame has one set of LSPs and two sets of voicing information so it's interpolated into 12.5 microsecond frames. Those lower bandwidth codecs do 50 microsecond frames. Go forth and hack upon it if you'd like to see. Also, there are some optimizations that are obvious to David and Jean-Marc (and which I barely understand) that haven't been added yet. One is that the LSPs are monotonic and nothing has been done to remove that redundancy. Delta coding or vector quantization might be ways to do that. I understand delta coding but would not be the one to do VQ. Another is that there is a lot of correlation of the LSPs between adjacent frames, so you don't necessarily have to send the entire LSP set every frame. And there is probably lots of other opportunity for compression that I have no concept of.
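To make the delta-coding idea concrete, here is a minimal Python sketch. It is not how Codec2 actually quantizes its LSPs; the ten frequency values are made up for illustration. Because the LSPs are monotonic, each difference from the previous value is non-negative and spans a smaller range than the value itself, so it needs fewer bits.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Hypothetical, monotonically increasing LSP frequencies in Hz.
lsps = [250, 420, 800, 1100, 1500, 1900, 2300, 2600, 3000, 3400]
deltas = delta_encode(lsps)

print(deltas)
print(max(lsps).bit_length(), "bits cover the largest raw value")
print(max(deltas).bit_length(), "bits cover the largest delta")
```

Vector quantization would go further by coding the whole set against a trained codebook, which this sketch does not attempt.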

Re:Great news (1)

Yaur (1069446) | more than 3 years ago | (#33646880)

You mean milliseconds... 25 microseconds is about one sample at 44 kHz. Somewhere around 100 ms is the lower edge of where it's "noticeable" in the flow of the conversation.
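A quick Python check of the units. The 8 kHz sample rate is an assumption (typical for telephony-style speech, not confirmed in this thread); the 2240 bps figure comes from the summary's 1050 bytes per 3.75 seconds.

```python
def samples_per_frame(frame_us, sample_rate_hz):
    """Number of audio samples in a frame whose length is given in microseconds."""
    return frame_us * sample_rate_hz / 1_000_000


def bits_per_frame(frame_us, bits_per_second):
    """Number of coded bits available per frame at a given bit rate."""
    return frame_us * bits_per_second / 1_000_000


print(samples_per_frame(25, 44_100))      # a 25 microsecond "frame" at 44.1 kHz
print(samples_per_frame(25_000, 8_000))   # a 25 millisecond frame at an assumed 8 kHz
print(bits_per_frame(25_000, 2240))       # bits per 25 ms frame at 2240 bps
```

So a 25 ms frame at the quoted rate has 56 bits to carry the LSPs, voicing, and everything else.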

Anonymous (-1, Offtopic)

Anonymous Coward | more than 3 years ago | (#33646234)

Thanks for the codec.
It's really useful for communicating with our customers through our new website application on http://www.zeecontainer-kopen.nl

Keep up the good work.
Best regards.

Thankyou! (1)

thephydes (727739) | more than 3 years ago | (#33646282)

I use digital almost exclusively and have wondered when a suitable open source voice project would emerge. I look forward to seeing it developed further. Tim VK4YEH

Re:Thankyou! (3, Informative)

Bananatree3 (872975) | more than 3 years ago | (#33646298)

You'll be happy to know that it's a fellow Australian ham developing Codec2: David Rowe, VK5DGR. Here's a link to David's development page [rowetel.com]

Re:Thankyou! (1)

thephydes (727739) | more than 3 years ago | (#33646342)

Yes, I heard about this a couple of days ago - on VKlogger forums maybe? Have made a donation to the cause by the way. Tim

Re:Thankyou! (1)

Bananatree3 (872975) | more than 3 years ago | (#33646480)

Great work on the donation. Try to get some local hams to donate; David responds well to donations of all sizes, small and big :)

Awesome (1)

sv_libertarian (1317837) | more than 3 years ago | (#33646296)

I hope this takes off. It would be great to have a good OSS voice codec for amateur radio.

Packet loss? (3, Interesting)

Amarantine (1100187) | more than 3 years ago | (#33646332)

I didn't see it mentioned when quickly scanning TFA, but how does this codec handle packet loss?

It is all well and good to develop a codec that crams as much speech as possible into as few bits as possible, but in this case, one lost packet could mean a gap of several seconds. The success of a low-bandwidth codec, at least when it comes to IP telephony, also depends on how well it can handle lost packets. Low-bandwidth codecs are usually used on low-bandwidth networks, such as the internet, and that is where packet loss is highest.

Same goes for delay and jitter, by the way. If a stream of packets is delayed, and more voice is crammed in fewer bits, then the delays in the voice stream will get longer too.

Re:Packet loss? (4, Informative)

Bruce Perens (3872) | more than 3 years ago | (#33646406)

We don't know yet, but I don't see how it could be worse than AMBE in D-STAR, which makes various eructions when faced with large packet loss. I did various sorts of bit-error injection inadvertently while debugging yesterday, and right now you still get comprehensible voice with significant corruption of the LSP data. This, IMO, indicates an opportunity for more compression. Handling the problems of the radio link is more a problem for forward error correction, etc.
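For anyone who wants to do that kind of bit-error injection on purpose, here is a minimal Python sketch. It is a generic test harness, not part of the Codec2 sources: it flips each bit of an encoded buffer with a chosen probability, and the result would then be fed to whatever decoder you are testing. The 7-byte frame in the usage lines is a made-up placeholder.

```python
import random


def inject_bit_errors(data, bit_error_rate, seed=None):
    """Flip each bit of `data` independently with probability `bit_error_rate`.

    Returns the corrupted bytes and the number of bits flipped.
    """
    rng = random.Random(seed)   # a seed makes a test run repeatable
    corrupted = bytearray(data)
    flipped = 0
    for i in range(len(corrupted) * 8):
        if rng.random() < bit_error_rate:
            corrupted[i // 8] ^= 1 << (i % 8)
            flipped += 1
    return bytes(corrupted), flipped


# Usage: corrupt a placeholder 7-byte frame at a 5% bit error rate.
frame = bytes(7)
noisy, count = inject_bit_errors(frame, 0.05, seed=1)
print(count, "bits flipped out of", len(frame) * 8)
```

Sweeping the error rate and listening to the decoded output gives a rough feel for which fields need the most protection from forward error correction.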

Re:Packet loss? (1)

rrossman2 (844318) | more than 3 years ago | (#33648222)

It would be great to be able to get this on phones. I know most VoIP/SIP type applications work fairly well on 3G, but if you don't have 3G coverage (or are on a smaller cell company that only licenses EDGE from the other GSM carriers) then it kinda-sorta works with 3 second delays and the occasional garbled audio. For example, my Nokia N95 on Immix doesn't get 3G (Immix didn't opt for 3G coverage from T-Mobile or AT&T even if the phone supports it and you're on their networks) but does EDGE at around 350 kbps or so. Fring voice calls work, with the flaws mentioned earlier. If I'm on WiFi with my phone, obviously it's much better and the delay + garble disappear even if the other side is on a 3G link (in that case Verizon).

With more and more people owning smart phones, it would be sweet to be able to bundle the codec in such a way that the phones could use it in applications such as Fring, Google Talk, etc., so you could talk to any of your smart phone friends without needing to use any minutes, and if they are the ones you talk to the most you could drop your minute plan down to next to nothing.

(It would be even sweeter for me since the only people I really call all have smart phones, I could just get a Verizon or AT&T data only plan for $35 or whatever and make all my calls via SIP)

English only ? (4, Interesting)

Yvanhoe (564877) | more than 3 years ago | (#33646374)

At such high compression rates, one could wonder if the optimizations to transmit clear speech make assumptions about the language used. Does it work well with French? Arabic? Chinese?

Re:English only ? (4, Interesting)

Bruce Perens (3872) | more than 3 years ago | (#33646452)

The basic assumptions are based on the mechanics of the vocal tract, and I suspect not high-level enough to differ across languages, but obviously it would be nice to hear from speakers of other languages who test it. We could also use a larger corpus of spoken samples for testing.

Re:English only ? (1, Interesting)

Anonymous Coward | more than 3 years ago | (#33646860)

One of the earlier observations in this field was that a low-bandwidth filter specifically hurt languages with hissing sounds, and I presume you'd have similar problems with click sounds.

As the GP indicated, Chinese (Mandarin) would be an important addition as it's a tonal language, and compression should not blur that distinction. Languages such as Arabic have emphatic consonants, which aren't that common in Western languages either. But French? That's for all practical purposes identical to English. Finnish would make more sense. As for the click sounds, try Zulu. It apparently has 15 distinct click sounds, and there should be enough speakers online.

Re:English only ? (1)

Ecuador (740021) | more than 3 years ago | (#33646984)

The languages you mentioned don't really use very different sounds. If you want a real test, try the clicking sounds in Zulu, Xhosa, etc.

Re:English only ? (1)

jmv (93421) | more than 3 years ago | (#33647112)

Actually, this is not low enough for language to really have an effect other than tonal vs non-tonal languages. As long as you "train" quantizers with multiple languages you're fine. I would not expect language-dependencies to actually kick in until you hit something like 100 bps or below (i.e. when you need to do speech-to-text in the "encoder" and text-to-speech in the decoder).

Re:English only ? (0)

Anonymous Coward | more than 3 years ago | (#33647124)

Basic sounds are the same everywhere. It is not like having different symbols for different languages, because the vocal tract is more or less the same for every human, just scaled (universal), as Bruce points out.

There were differences in the range of sounds used by every language, but today that is not the case, thanks to communications advances. E.g Japanese people had incorporated external language sounds and it is not alien anymore.

Merry "Kurisumasu" (1)

tepples (727027) | more than 3 years ago | (#33648128)

the vocal tract is more or less the same for every human

Different languages use different parts of the vocal tract. If a codec distorts clicks, it won't pass Zulu or the Bushmen languages. Languages also make different distinctions on the parts of the vocal tract they do use. If a codec distorts pitches, it won't pass intelligible Cantonese, Yoruba, or Mandarin.

There were differences in the range of sounds used by every language, but today that is not the case, thanks to communications advances. E.g Japanese people had incorporated external language sounds and it is not alien anymore.

The only foreign sounds that have been fully assimilated into the phonology of Japanese are the 'y' compounds (e.g. "kyo", "hya", "chu" (phonemically "tyu")), borrowed a long time ago from a Chinese language. Otherwise, there's still a lot of rounding-to-the-nearest-phoneme that goes on: Merry "Kurisumasu". (If we killed Santa, would it be "Kurisumashita"?)

METAL GEAR?! (0)

Anonymous Coward | more than 3 years ago | (#33646388)

"This is Snake, I am in front of the disposal site..."

Mumble integration ? (4, Interesting)

Anonymous Coward | more than 3 years ago | (#33646422)

One of the fastest ways to ensure its testing and distribution is to use it in Mumble, the low-latency voice chat software (with an iPhone client as well).
Mumble is typically used by gaming clans for their chat rooms, and Codec2 would be tested in real-life conditions.

Re:Mumble integration ? (1)

Bananatree3 (872975) | more than 3 years ago | (#33646486)

awesome idea as a potential beta testing group?

Re:Mumble integration ? (4, Interesting)

Bruce Perens (3872) | more than 3 years ago | (#33646520)

One of the fastest ways to ensure its testing and distribution is to use it in Mumble, the low-latency voice chat software (with an iPhone client as well). Mumble is typically used by gaming clans for their chat rooms, and Codec2 would be tested in real-life conditions.

Is there an existing Mumble developer whom we could get interested in this? It might be that we should take some of the Alpha-isms out of the code first.

Re:Mumble integration ? (5, Insightful)

Inda (580031) | more than 3 years ago | (#33648112)

Is this really Slashdot? Do I have a DNS error?

These are the stories I used to enjoy. I don't really understand them, but they make a good read.

Impressive. (1)

mrjb (547783) | more than 3 years ago | (#33646580)

1050 bytes for 3.75 seconds of speech is the equivalent of 2240 bits per second, good enough that an old-school 2400 baud modem would be able to transfer speech in real time. Impressive. But I seem to recall that the speech synthesizer of the TI-99 stored voice audio in as little as 1200 bits per second. It was well-documented enough that TI emulators emulate the speech synthesizer as well. But the sound quality left something to be desired, which is probably one area where Codec2 shines. I've listened to the example files and the sound quality seems fine; I can't tell the difference in audio quality between source and target files. Partly this may be because the source material already seems to be bandwidth-limited, probably using an 8 kHz low pass filter as is common for telephony applications.

Re:Impressive. (1)

Rogerborg (306625) | more than 3 years ago | (#33647060)

I can't tell the difference in audio quality between source and target files.

Really? It sounds quite pronounced to me. It's still very impressive, but it's not magic.

Re:Impressive. (3, Informative)

lobiusmoop (305328) | more than 3 years ago | (#33647552)

The DSP Innovations [dspini.com] codec manages decent speech quality at 600 bps, God knows how (proprietary, closed source). I think this is the state of the art in low-bitrate codecs just now.

Oh goody! (-1, Offtopic)

Anonymous Coward | more than 3 years ago | (#33646638)

Another Codec!

Speech Rec on compressed stream? (1)

hotdiggity (987032) | more than 3 years ago | (#33646664)

Hi Bruce - great work.

I didn't dig completely into your site, but was just wondering if groups are doing work on speech recognition algorithms on your compressed bitstream? Is this an active area of research?

Re:Speech Rec on compressed stream? (1)

Bruce Perens (3872) | more than 3 years ago | (#33647044)

The codec work is by David Rowe. I don't know of anyone doing speech recognition.

Wonderful Name (1)

schn (1795404) | more than 3 years ago | (#33646916)

codec2, the... second codec.

Re:Wonderful Name (1)

Bruce Perens (3872) | more than 3 years ago | (#33647058)

Right. The first is the one used in D-STAR.

SVN Repository (0)

Anonymous Coward | more than 3 years ago | (#33647410)

What's up with the source code repository? I keep getting a variety of "repository temporarily relocated" messages when I try checking out the source.