The fight to keep standards Open and Free is raging in the audio compression business. With mp3 tearing up bandwidth and the court system, Christopher Montgomery and the rest of the Ogg Vorbis team are working hard to ensure that the mp3 format has a Free alternative in their system, which seems to outperform mp3 everywhere it counts. I got the opportunity to pull Chris away from development just long enough to tell us exactly what's going on, and to answer some questions about the process and the product necessary to take on mp3.
Vorbis is a hybrid time/frequency transform coder like mp3, but the similarity really ends there; it's more similar to TwinVQ in some ways (many shared mechanisms, albeit used somewhat differently).
Like mp3 (and virtually every other useful transform coder), we first look for strong changes and natural breaks in the input audio, and can use this information to break up the incoming audio into different sized blocks. When you lose information in the frequency domain, the resulting noise spreads throughout the time domain. A very strong spike in time will get smoothed out by frequency quantization, so the larger the block, the more audible it is. You want to isolate these strong, sharp events in smaller blocks.
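Editor's sketch: the block-switching idea described above can be illustrated with a simple energy-jump test. This is only a minimal sketch, not the libvorbis analysis. The function name `choose_block_sizes`, the block sizes and the `ratio` threshold are all hypothetical values picked for illustration.

```python
def choose_block_sizes(samples, long_size=2048, short_size=256, ratio=8.0):
    """Return (start, size) blocks: short blocks where a sharp attack is found.

    Hypothetical heuristic: a long frame is split into short blocks when the
    energy of any short sub-block jumps by more than `ratio` over the one
    before it. Trailing samples that do not fill a long frame are ignored.
    """
    blocks = []
    prev_energy = None
    for start in range(0, len(samples) - long_size + 1, long_size):
        frame = samples[start:start + long_size]
        # Energy of each short sub-block (small bias avoids division by zero).
        energies = [sum(x * x for x in frame[i:i + short_size]) + 1e-12
                    for i in range(0, long_size, short_size)]
        transient = False
        for energy in energies:
            if prev_energy is not None and energy / prev_energy > ratio:
                transient = True
            prev_energy = energy
        if transient:
            # Isolate the sharp event: cover this frame with short blocks.
            blocks.extend((start + i, short_size)
                          for i in range(0, long_size, short_size))
        else:
            blocks.append((start, long_size))
    return blocks


# A quiet signal with one click in the second long frame.
signal = [0.01] * 4096
signal[2048 + 1000] = 1.0
blocks = choose_block_sizes(signal)
```

In this example the first frame stays one long block, while the frame containing the click is split into short blocks, so the quantization noise around the click is confined to a much shorter stretch of time.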
Past this point, the similarities with mp3 end. Vorbis can do a time-domain pre-encoding using wavelets to further reduce spreading of time events and non-tone data. The current libvorbis doesn't have the code to do this yet, but the hooks are there for when we do finish this code (this feature will be post 1.0. Wavelets are still something novel that no one else is using in serious production yet, and we need to do more real R&D before it's ready).
Vorbis takes the time data directly to the frequency domain with an MDCT, where mp3 first subbands the data. The polyphase pseudo-QMF filter that mp3 uses for subbanding is not completely orthogonal; no matter how good the implementation, there will always be some aliasing. For this reason, Vorbis dispenses with subbanding altogether and just uses a large MDCT.
Vorbis then computes line-by-line masking curves for local peaks, long-distance simultaneous tone masking, simultaneous noise masking and temporal masking. These curves are used to separate inaudible tones from audible tones, and then to choose a frequency domain amplitude curve that represents the 'base energy' of that audio frame. The base energy curve (I call it a floor) is subtracted from the MDCT data (like a whitening filter), which produces 'frequency residue'. The floor is converted to an LSP (line spectral pair) representation and then it and the MDCT residue are vector quantized into the final output codewords by a cascade of custom VQ codebooks that are packed along in the header of the bitstream. The result is one Vorbis audio packet.
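Editor's sketch: the floor/residue split can be illustrated as follows. This is a minimal sketch under loud assumptions. The floor here is just a moving average of coefficient magnitudes, a stand-in for the psychoacoustic masking analysis described above. The split is done as a division in the linear domain, which corresponds to a subtraction on a log-amplitude scale. The LSP conversion and vector quantization steps are left out, and all function names are hypothetical.

```python
def smooth_floor(coeffs, half_width=2, eps=1e-9):
    """Hypothetical 'floor': moving average of the coefficient magnitudes."""
    mags = [abs(c) for c in coeffs]
    floor = []
    for i in range(len(mags)):
        lo = max(0, i - half_width)
        hi = min(len(mags), i + half_width + 1)
        # eps keeps the floor strictly positive so we can divide by it.
        floor.append(sum(mags[lo:hi]) / (hi - lo) + eps)
    return floor


def split_floor_residue(coeffs):
    """Whiten the spectrum: residue = coefficient / floor."""
    floor = smooth_floor(coeffs)
    residue = [c / f for c, f in zip(coeffs, floor)]
    return floor, residue


def recombine(floor, residue):
    """Decoder side: multiply floor and residue element by element."""
    return [f * r for f, r in zip(floor, residue)]


coeffs = [8.0, -4.0, 2.0, -1.0, 0.5, 0.25]
floor, residue = split_floor_residue(coeffs)
rebuilt = recombine(floor, residue)
```

The point of the split is that the residue has a much flatter range of values than the raw coefficients, which makes it a better fit for a fixed set of codebooks. Without quantization, as here, recombining floor and residue returns the original coefficients up to rounding error.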
The audio packet is then embedded into an Ogg bitstream page and the page (when full of packets) is shipped out in the stream.
The decode side does the reverse, but without all the masking analysis. We extract the string of packets from the Ogg bitstream, and for each packet unpack the floor and residue, take the dot product and then do an inverse MDCT to recover the time-audio frame. Each frame is lapped and added to the previous frames and we get the original audio out.
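Editor's sketch: the MDCT and the lap-and-add step can be illustrated with a direct, unoptimized implementation. This is a minimal sketch, not the libvorbis code. It assumes a plain sine window and the textbook O(N^2) transform, it skips the floor, residue and quantization stages entirely, and the function names are hypothetical. The input length is assumed to be a multiple of the hop size `n`.

```python
import math


def sine_window(size):
    """Sine window; its squared halves sum to 1 (Princen-Bradley condition)."""
    return [math.sin(math.pi * (i + 0.5) / size) for i in range(size)]


def mdct(block):
    """Forward MDCT: 2n windowed samples in, n coefficients out."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Inverse MDCT: n coefficients in, 2n time-aliased samples out."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] *
                          math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]


def encode(signal, n):
    """Cut the signal into 50%-overlapping windowed blocks and transform each."""
    win = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n  # so every sample is lapped
    return [mdct([padded[start + i] * win[i] for i in range(2 * n)])
            for start in range(0, len(padded) - n, n)]


def decode(frames, n):
    """Inverse-transform each frame, window it, and lap-and-add the results."""
    win = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for f, coeffs in enumerate(frames):
        block = imdct(coeffs)
        for i in range(2 * n):
            out[f * n + i] += block[i] * win[i]
    return out[n:-n]  # drop the padding added by encode()


signal = [math.sin(0.3 * i) for i in range(32)]
rebuilt = decode(encode(signal, 8), 8)
```

Each inverse-transformed frame on its own contains time-domain aliasing. The aliasing cancels only when neighbouring frames are overlapped and added, which is why the lapping step is essential. With no quantization in between, the output matches the input up to rounding error. In a real lossy codec the coefficients are quantized, so the output is an approximation of the original.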
Very simple, see? :-) To be fair, the masking analysis is the only real black magic. What I'm doing is almost entirely based on the masking curve data published in the late 50's by Robert Ehmer.
One thing the current release of Vorbis does not have is channel coupling (like mid-side stereo, although we'll be doing it differently). Beta 1 and beta 2 actually include multiple totally separate channels. The fact that we equal and better mp3's quality while missing this huge piece is exciting. Mid/side stereo in mp3 drops the final bitrate of a stereo stream by 30-50kbps. To get a real comparison of Vorbis vs. mp3, compare mono streams or force the mp3 encoder not to use joint/intensity stereo (e.g., -m m in LAME 3.84). Vorbis at 56kbps mono beats mp3 at 80kbps. At equal bitrate there's no comparison at all.
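Editor's sketch: for readers unfamiliar with mid/side stereo, the generic transform looks like this. It illustrates the general technique only. The answer above says Vorbis coupling will work differently, so this is not a description of Vorbis, and the function names are hypothetical.

```python
def to_mid_side(left, right):
    """Generic mid/side transform: mid is the average, side is half the difference."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Inverse transform: recover the left and right channels."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75]
right = [0.5, 0.125, -0.5]
mid, side = to_mid_side(left, right)
left_out, right_out = from_mid_side(mid, side)
```

When the two channels are similar, the side signal stays close to zero and costs few bits to encode. That is where the bitrate saving of joint stereo comes from, and why a comparison against a codec that encodes both channels separately should disable it.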
Slashdot: For those just tuning in, what's the project all about, and how did it get started?
The Vorbis codec is a lossy audio compression codec similar to mp3, but we're shooting for better performance (lower bitrates for a given level of quality) as well as keeping it totally Free as in Beer and Speech. I started work on Vorbis a week or two after Fraunhofer sent out 'cease and desist' letters to several free mp3 encoder projects in the fall of '98. At that point, it was clear the worst case was happening; the squeeze was on by commercial entities to not only dominate the legal distribution of music, but the underlying technology as well. A 'free license' to owned technology means nothing (and that's why Real and Windows Media are also worthless as infrastructure to us).
Fraunhofer (and MPEG in general) and the RIAA are also a bit too friendly behind the scenes, if not entirely in bed together. If you really believe SDMI is about protecting the artists, well, I have some wonderful Oklahoma beachfront property for sale at prices that are a steal, but you'd better act fast!
It's ironic that at the same time mp3 has been an agent to open up music distribution, it's becoming a tool for commercial interests to reclaim control. If online music is to fulfill its potential, an oligarchy can't be allowed to control its distribution or the technology behind it. The Internet would not have reached critical mass if it was a product of Microsoft or AOL or Oracle... It wouldn't ever have happened. Corporate control of every facet of online music will just strangle it in the cradle. The inventors of the Internet 'gave it away,' and that's been a great thing for business. However, the important lesson here is that the foundations were set in stone and wrought from iron before any company had self-interested influence. TCP/IP (brought to you by research laboratories) is elegant and farsighted; it's taken thirty years for it to begin wearing thin. E-mail is similarly brought to you from academia. HTML, on the other hand, (as ultimately brought to you by Netscape and Microsoft) makes good engineers weep and gnash their teeth.
We need to have unbreakable free music foundations in place before letting the commercial interests have their way with the infrastructure. I wouldn't rely on any infrastructure they build themselves.
Ogg and Vorbis are trying to continue the principles for which we in the open world see mp3 standing.
Slashdot: What are you working on right now?
Vorbis second beta. General quality improvements, additional bitrate modes in the encoder (96-350kbps stereo, mono modes), bugfixes, etc. After beta 2 (look for it on Tuesday at about the time LinuxWorld Expo in San Jose opens), we have low bitrate modes to finish, channel coupling (joint stereo and joint surround) and constant bitrate modes (Vorbis by default is VBR).
Others in the project are working on tools... Mike Smith, Kenneth Arnold and others are knee deep in utils, Jack and Chad of Icecast are adding Ogg streaming to Icecast, Ralph Giles and Rob Kaye are working on stream mixing and metadata streams (Ralph is also hacking on MNG over Ogg). Kim, Tori and Emily at iCast are writing documentation...
The project has also outgrown our group. There are now Vorbis news sites (like govorbis.com and vorbiszone.com), an all-Vorbis music label (vorbisonic.com) and other Vorbis-related sites popping up. angrycoffee.com is working on Vorbis tutorials for beginners.
Within the core team, we need to get more people who are up on the signal processing aspects, like the community around LAME has.
Slashdot: Is this your full-time thing?
Yes. Ogg and Vorbis development are sponsored by iCast and they're also deploying it internally. In addition to paying salaries, they're pitching it to the industry and providing legal assistance.
Slashdot: Xiphophorus is a collection of people, projects and tools. What's going on with the collective?
Vorbis is a 'serious' project now, so we're expensing the massive espresso consumption ;-) The few of us who are now getting paid to do this can afford to be extremely intense about it. Other contributors still come and go. Right now, we're all pretty much focused on Ogg Vorbis; I have to apologize to all the cdparanoia users out there. I'll be working on it again in the future, but right now I only have so many cycles.
Ogg and Vorbis are currently getting more outside attention than we can really gracefully handle (well, handle and still get work done at the rate we're used to, which was still always slower than we want ;-) Apparently someone on some list claimed 'Vorbis was dead' because we hadn't updated the Web site in a month. Ha! If we were 'dead' we'd have plenty of time to write HTML :-) And answer mail. Anyone who sent me personal mail in the past month and a half, I'll answer it eventually, I promise...
Slashdot: Are you out to replace mp3 as the sound format of choice? If not, why not, and if so, what are the challenges?
We're out to keep things Free (capital F intentional). If MPEG turned around and made the mp3 spec and patents public domain, we'd definitely declare victory (and then continue coding to improve Vorbis). But we all know that isn't going to happen. More likely, if Fraunhofer decides we're a threat, they'll just delay licensing (remember kids: free licenses to binaries aren't worth jack) until the competition dies down. Then they'll squeeze again.
Honestly, I don't think we're going to 100% replace mp3 (people still use RAR for Christ's sake). I lay better than even odds on us eclipsing mp3 in the next year if the licensing picture stays the same. We also intend to have 80-96 kbps stereo streams that sound better than mp3 128 by that point, so people (and businesses) won't exactly have to give anything up to save money. Also expect hardware support soon, possibly by end of year if things go smoothly.
Slashdot: You talk a lot on your Web site about Open software. Which came first, the desire to deliver multimedia, or the drive to develop it openly?
My real hacking skills germinated at the MIT Lab for Computer Science. I'd coded practically all my life before getting to MIT, but I'd always been the best coder I knew, so I hadn't really learned much. When I got to MIT, I didn't feel stupid but it drove home that I had a lot of catching up to do. Most of my mentors were from the previous generation (all open source people) but a few of the very hardcore people were younger than me, too.
I've been a musician all my life too, albeit not a very good one (I feel a bit like Salieri in Amadeus) and Ogg was born in '93 when I bought a 1 Gig hard drive and a sound card and thought 'this is unlimited space! I can put music on this! And do things with it!'. I quickly found out that a Gig wasn't unlimited by a long shot, not even in '93 (I filled it with mail eventually), so I started muddling with compression. Greg Hudson made an offhand remark about there not being any good, free, music compression libs at the time, and Squish was born. I got a letter from a lawyer a few months later politely informing me that 'Squish' was a registered trademark and if I didn't change the name of my software, I could forget ever owning anything in the Western World ever again. Mike Whitson renamed the codec 'OggSquish'. The Ogg project was born. Oh, and we plan to release an updated Squish codec again sometime in the next year.