Open Source Transcription Software?

kdawson posted more than 4 years ago | from the what-he-said dept.

Open Source 221

sshirley writes "I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software a get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"

221 comments

CMU Sphinx (5, Informative)

Singularity42 (1658297) | more than 4 years ago | (#32971950)

Looks active.

Re:CMU Sphinx (5, Informative)

Narksos (1111317) | more than 4 years ago | (#32972384)

What you want is dictation software. I just (last week) spent significant time looking into this.

For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that: the source for the CMU Sphinx demos shows how to get input from a mic or WAV file (if you've got something other than PCM you'll just need to convert it) and how to set up the various engines.
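As a rough illustration of the "convert it to PCM" point, here is a minimal Python sketch that inspects a WAV file before handing it to a recognizer. It only uses the standard-library wave module. The target format (mono, 16-bit, 16 kHz) is an assumption about what a typical acoustic model expects, not something stated in this thread, so check the documentation of the model you actually use. The function names are made up for the example.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz) of a PCM WAV file.

    wave.open raises wave.Error if the file is not uncompressed PCM.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def matches_expected_format(path, channels=1, sample_width=2, rate=16000):
    """Check a WAV file against an assumed recognizer input format.

    The defaults (mono, 16-bit, 16 kHz) are an assumption for illustration;
    substitute whatever your acoustic model was trained on.
    """
    return describe_wav(path) == (channels, sample_width, rate)
```

If the check fails, the recording would need to be resampled or downmixed with an audio tool before recognition.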

CMU Sphinx [sourceforge.net] appears to be mainly for research purposes. You can run it in a few different modes: one with a fixed grammar (for command systems; Gnome's voice control uses Sphinx in this mode), and one (what you'd be looking for) that uses a weighted dictionary. I didn't train it to my voice (and you won't be able to train it for transcriptions) and I was getting fairly lousy recognition rates with my $20 Logitech USB microphone. It might work better with a high quality headset, but I imagine you won't both be wearing one.

Julius/Julian [sourceforge.jp] lacks a good acoustic model for English. VoxForge [voxforge.org] is working on one, but it isn't anywhere near complete.

Here is a good article that sums up the current projects [eracc.com]

Re:CMU Sphinx (3, Insightful)

notthepainter (759494) | more than 4 years ago | (#32973490)

Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that

Actually, it can be rather hard to do that. I was one of the founders of MacSpeech and there is a surprisingly large set of details you have to deal with, punctuation, capitalization, etc... Of course since you wouldn't be making a commercial product much of the gloss need not be coded but once you have the engine, the part that takes the audio source and converts it to text, you still have a large amount of work left over.

Re:On the other hand... (2, Interesting)

vrmlguy (120854) | more than 4 years ago | (#32972636)

I just slice everything up into segments of 60 seconds and let Google Voice transcribe it for me. Sure, some nay-sayers might point out that it's slower than transcribing it all manually, but they don't get that I'm getting Google to do the work for me!
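For what it's worth, the slicing step itself is easy to script. The following is a minimal sketch, assuming the recording is already an uncompressed PCM WAV file; the function name and the output naming scheme are invented for the example, and nothing here talks to Google Voice or any other service.

```python
import wave


def split_wav(path, out_prefix, seconds=60):
    """Split a PCM WAV file into consecutive chunks of at most `seconds` each.

    Writes files named <out_prefix>_000.wav, <out_prefix>_001.wav, ...
    and returns the list of paths written.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channel count, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

Cutting at fixed offsets will split words at the boundaries, so a real workflow would probably want to cut at pauses instead.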

Re:On the other hand... (1)

quickOnTheUptake (1450889) | more than 4 years ago | (#32973214)

Informative?
Attention slashdotters, There is at least one retard on the loose. He may be calling himself and playing tapes into the phone. If you encounter him do not engage him as he is armed with modpoints and may use them erratically.

Re:CMU Sphinx (2, Informative)

Anonymous Coward | more than 4 years ago | (#32972948)

Sphinx is what many companies use to get started with, but it's far too raw to be useful by itself. You need to update the HMM back-end extensively... and train it. Even still, your success rate is only 80%... meaning: 1 in 5 words, if spoken slowly, will still be wrong.
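Figures like "80%" are usually reported as word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the recognizer output into a reference transcript, divided by the length of the reference. The sketch below computes it with a standard word-level edit distance. It is only an illustration of the metric, not how Sphinx or any particular toolkit scores itself, and note that accuracy and WER are not exact complements once insertions are involved.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

With one wrong word in five, as described above, this function returns 0.2.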

Dear aunt, (5, Insightful)

Anonymous Coward | more than 4 years ago | (#32971954)

let's set so double the killer delete select all.

Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

Re:Dear aunt, (4, Insightful)

Kenoli (934612) | more than 4 years ago | (#32972116)

A program capable of "making informed guesses based on context" seems perfectly plausible, though that's not part of speech recognition per se.

Re:Dear aunt, (3, Insightful)

conchubhair (1453303) | more than 4 years ago | (#32972294)

The problem you are describing (continuous speech recognition) is not solved yet. Even the best state-of-the-art technology is not going to be perfect, and having two speakers will make it even less useful. If you really need the stuff transcribed, you can pay for online services to transcribe it (if they offer really good quality transcription, they are most likely using humans) or you can transcribe it yourself (you can buy software to help speed up the transcription process - including a foot pedal to pause/play the audio, e.g. http://www.nch.com.au/scribe/ [nch.com.au] ).

My company does a lot of work in speech recognition, and we have tried most of the companies that offer transcription. Some of them even provide APIs so you can code something up. The best fully automatic, commercially available transcription I have seen is from Yap Inc. (http://yapme.com/). If the speaker doesn't have a crazy accent and speaks at a normal level and pace you can get great results, but like all fully automatic transcriptions it can get it wrong. The benefit of Yap is that you can get back the confidence scores and alternates for each word, so if you had a dictionary of your own commonly used words you can pick out a better transcription. You pay by the word for transcription (it is a small amount, but it will add up if you're doing hours of audio).

If you're willing to wait, the technology is improving all the time, so you could archive the audio for now and return to have it transcribed in a few years. If you need this done now and want something you can actually read then your cheapest option is to do it yourself, and maybe invest in some software to speed it all up. Unless you have a lot of time on your hands and access to a lot of transcribed audio to build the language models, using any software at home is not worth your while.
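The idea of using per-word confidence scores and alternates together with your own word list can be sketched as follows. The data shape (a list of slots, each a list of (word, confidence) pairs) and the fixed boost value are assumptions made up for this example; they are not Yap's actual API or response format.

```python
def pick_words(slots, vocabulary, boost=0.3):
    """Choose one word per slot, favouring words from a personal vocabulary.

    slots:      list of slots; each slot is a list of (word, confidence)
                alternatives, as a hypothetical recognizer might return.
    vocabulary: iterable of words you expect to occur (family names, places).
    boost:      amount added to the confidence of in-vocabulary words
                (an arbitrary illustrative value).
    """
    vocab = {word.lower() for word in vocabulary}
    chosen = []
    for alternatives in slots:
        best_word, _ = max(
            alternatives,
            key=lambda wc: wc[1] + (boost if wc[0].lower() in vocab else 0.0),
        )
        chosen.append(best_word)
    return chosen
```

For genealogy interviews the vocabulary would mostly be surnames and place names, which are exactly the words a general-purpose language model tends to miss.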

Re:Dear aunt, (2, Informative)

Flyerman (1728812) | more than 4 years ago | (#32973144)

The parent's link is exactly what I set up on a client's machine. They purchased the headset and pedals but the software itself was free and worked wonderfully.

Re:Dear aunt, (0)

Anonymous Coward | more than 4 years ago | (#32973074)

Google knows what im going to search for before i do!

Re:Dear aunt, (1)

ooshna (1654125) | more than 4 years ago | (#32973552)

Are you in Soviet Russia where Google searches you? Sorry had to do it.

Re:Dear aunt, (1)

ooloogi (313154) | more than 4 years ago | (#32972148)

Meanwhile there is commercial software available that runs on a commercial operating system that does a pretty good job of it, using a whole lot of computing power to make the required informed guesses.

Re:Dear aunt, (2, Informative)

fuzzyfuzzyfungus (1223518) | more than 4 years ago | (#32972250)

Unless things have improved substantially since Dragon NaturallySpeaking 10, I'd be more inclined to describe the performance as "surprisingly adequate job of it, with training, and offers a vaguely cellphone-esque interface for choosing the correct word when it fucks up".

It isn't comedically awful; and it likely beats typing with your stumps, or your eyelids, or whatever; but "pretty good" is being very generous.

(Again, unless things have improved markedly since then) the software works best when used interactively, which allows it to suggest corrections, and you to make them, in real time. It also helps if it has been trained to your voice beforehand. Using it non-interactively, on a recording of somebody that it hasn't been trained for, will produce results error-filled enough that you might actually find manual transcription faster than manual editing (or, if you don't mind your family sounding like they've suffered head trauma or exposure to Dadaism, you can just store the recordings, make do with the text, and re-run the process in the future, when the software is better).

Re:Dear aunt, (0)

Anonymous Coward | more than 4 years ago | (#32972514)

So if it can be trained reasonably well to your own voice, why can't you listen to the recording via headphones and parrot the words in your own voice? It may be faster than typing, even with the occasional correction.

If others in your family are interested in helping with your project, this might be a good way to get them involved, too.

Re:Dear aunt, (1)

cgenman (325138) | more than 4 years ago | (#32972876)

There is a free iPhone dragon client, which sends the audio back to their servers for processing. There isn't any training, but there probably wouldn't be training on a family member's old tapes either.

It's possible that Dragon might work for their needs, or at least be much easier to get equally bad data back as other solutions. Try the iPhone client and see.

Re:Dear aunt, (1)

icebraining (1313345) | more than 4 years ago | (#32972164)

And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

Never? Not only is it possible, there are already some papers on prototypes of grammar-switching, context-based speech recognition engines.

Re:Dear aunt, (0)

Anonymous Coward | more than 4 years ago | (#32972166)

Google does a decent job at it in their YouTube and Voice products.

Nothing open-source though. Hell, even the open-source voice recognition stuff that you have to train sucks donkey balls and that's a lot easier than free-form transcription.

Re:Dear aunt, (1)

morgan_greywolf (835522) | more than 4 years ago | (#32972320)

Google does a decent job at it in their YouTube and Voice products.

They also do a decent job on Android, which is open source. Zero training required. I wonder how easy it would be to rip the voice recognition out of Android source?

Re:Dear aunt, (1)

hawguy (1600213) | more than 4 years ago | (#32972426)

I wouldn't say that Google Voice does a decent job, most of the voice mails left for me on Google Voice come out like this:

Hey when you get this is a I'm via out what you know what it's Johnathon of the bad idea. So maybe we can meet that I don't know about that. And I do that, but what I thought I had, but that's also the Damon now, but I thought I had bought the house. I had to get out of the night okay and give me a call. bye bye.

Except for the "bye bye" at the end, none of it is close enough to the actual message to be useful. The actual message said nothing about Johnathan, any meeting, Damon, or a house.

Re:Dear aunt, (1)

kagaku (774787) | more than 4 years ago | (#32972836)

When you consider the quality of audio input it receives I think it does a fairly decent job.

But Windows Speech Recognition... (4, Informative)

Monkeedude1212 (1560403) | more than 4 years ago | (#32972168)

Most Windows Vista or Win7 machines come with a built in transcribing feature, that you can enable in the control panel (Win7, under ease of access, Speech recognition).

However - the only way it works properly is if you train it to understand you personally. You load your profile, and it'll run you through a whole bunch of test sentences. The FULL test takes you about 20 minutes I think (it's been a while since I've used it) - and actually works quite well. There is a cut-off point at about 2 and a half minutes if you want to stop and try it out. It can actually make the machine keyboard- and mouseless if you want. When you open a browser it highlights everything on the web page that's clickable and assigns it a number, and you simply say "Click 7" and it hits the reply button for you. Then you talk when the textbox has focus and it'll transcribe every word you say.

I did this for my girlfriend's paper once: I read it aloud (you have to mention things like comma, end paragraph, etc.) and put it into a Word document. Out of a 15-page single-spaced essay, it got 3 sentences wrong - and that's only because I was mentioning some of the more obscure Greek names (she's a history major). It managed to get full sentences regarding Octavia and her fondness of libraries without error, which I thought was odd since that's not a name you hear every day.

Anyways - if he wants to do this, he should record the test phrases (there will be a lot though) and have each of his interviewees read the test sentences so he can then relay those through the computer and train the computer for each person.

All in all - he may still run across a few errors, but it's not nearly as bad as, say, Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge of how that person speaks. Windows Speech Recognition is something that will handle what he's after though.

Re:But Windows Speech Recognition... (1)

Monkeedude1212 (1560403) | more than 4 years ago | (#32972194)

Forgot to mention: I'm not entirely sure he needed an "Open Source" Solution as much as he needed a "cost effective" solution though - he makes no mention of altering any code. So I mean, Windows Speech Recognition is not exactly Open Source.

Re:But Windows Speech Recognition... (1)

markdavis (642305) | more than 4 years ago | (#32972532)

>So I mean, Windows Speech Recognition is not exactly Open Source.

It's not exactly multi-platform either. He might be using Linux, for example (like so many of us do). Really, the original post left off a lot of potentially useful (narrowing) info.

Re:But Windows Speech Recognition... (1)

unix1 (1667411) | more than 4 years ago | (#32972682)

All in all - he may still run across a few errors, but it's not nearly as bad as, say, Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge of how that person speaks.

Each Google voicemail transcription has an option for a user to mark whether the transcription was accurate or not. I wouldn't be surprised if they were tying that into the caller-specific profile. So, if you leave a message for friend A and he marks the transcription good, that data may be used when you call not just person A, but everyone else in the future. In fact, it wouldn't make any sense otherwise. Now, how much actual "learning" algorithms they have on the back side, I cannot tell you.

Re:But Windows Speech Recognition... (1)

LordLucless (582312) | more than 4 years ago | (#32973456)

It probably helps that Greek is transliterated when rendered with the English alphabet - which means most of the funky names you were saying were spelled phonetically, and thus easy for a recognition engine to pick up - even easier than a lot of regular English words.

Re:Dear aunt, (0)

Anonymous Coward | more than 4 years ago | (#32972188)

"dear aunt" explained:
http://www.youtube.com/watch?v=tLa3Wac4O2A#t=25
(sorry about video quality, back then youtube never looked better than your DVB-channels)

Re:Dear aunt, (0)

Anonymous Coward | more than 4 years ago | (#32972204)

yet

Re:Dear aunt, (2, Insightful)

ThomConspicuous (1004135) | more than 4 years ago | (#32972242)

It's already being done in medical dictations that are also recorded and double checked by Transcriptionists. Speeds up work flow immensely even with the human verification in place.

I even witnessed an East Indian doctor with a heavy accent dictate normally and have the software pick up everything stated. He was pleasantly surprised.

It works.

Re:Dear aunt, (0)

Anonymous Coward | more than 4 years ago | (#32972502)

Tell that to all of the companies worldwide who have no problems with their automated voice recognition systems.

Just because you saw some video of a prerelease version of Vista's voice recognition fucking up doesn't mean that it can't be done. Many commercial solutions handle it well and even Vista's voice recognition works pretty damned well at this time.

Re:Dear aunt, (1)

xSauronx (608805) | more than 4 years ago | (#32972508)

i say set up google voice, dial the gv number and do your questions as voicemail into a speakerphone

you'll get a transcription in your email. do a question at a time. problem solved!

Re:Dear aunt, (2, Insightful)

theheadlessrabbit (1022587) | more than 4 years ago | (#32972616)

let's set so double the killer delete select all.

Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

...a computer program cannot do yet

Re:Dear aunt, (4, Informative)

painandgreed (692585) | more than 4 years ago | (#32972652)

Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

Funny, considering my job is training doctors to use voice recognition to do all their reporting. Actually, it works fairly well. I also don't mean dictating something that goes to transcriptionists. The doctors dictate the report. The dictation is transcribed into text. They review it and sign off. We got rid of all our transcriptionists years ago. The time for a report to get done went from 24 hours with transcriptionists to 24 minutes with voice recognition. The number of errors was cut in half. The doctors' workload was also lessened as they could check the final version while still dealing with the data rather than having to go back and review everything all over again a day or two later. Speech recognition was a problem seven years ago, but hardly at all in the last five or so. Yes, they have to go over their dictations and occasionally make some minor corrections. There's always background noise to worry about and some people's accents are hard even for another person to get through, but for things that require quick turnaround and need to be verified by the person who is doing it, voice recognition already is the gold standard.

PS several of the doctors like it so well they bought Dragon (pretty much everybody but Philips uses Dragon for their speech engine) for home and use it there for all their email and other writing.

Re:Dear aunt, (1, Insightful)

Anonymous Coward | more than 4 years ago | (#32973312)

you're referring to software someone is -trained- to use for a specific purpose. that's not the same as a general purpose voice recognition program

Re:Dear aunt, (4, Interesting)

BitZtream (692029) | more than 4 years ago | (#32973550)

Ironically, I have a family member who runs a business doing transcription for doctors ... because every time they try voice recognition software they get pissed off and go back to real people.

Being a fan of Dragon Dictate myself, I know it's not that great and I know it has a fit when you start throwing accents at it, training or not.

I call bullshit on your claims of using Dragon for everything.

Re:Dear aunt, (0)

Anonymous Coward | more than 4 years ago | (#32972690)

a computer program cannot do? please, you sound like a philosopher or worse yet, an arts student.

Re:Dear aunt, (2, Informative)

Bluesman (104513) | more than 4 years ago | (#32972894)

by making informed guesses based on context, which a computer program cannot do.

The Perl interpreter can.

Re:Dear aunt, (1)

rgmoore (133276) | more than 4 years ago | (#32973016)

And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

It's also worth pointing out that in practice people frequently fail to understand each other perfectly. In conversations, we routinely ask questions to ensure that we've correctly understood what the other person said. If you ever watch a TV newscast with the closed captions on, you can see that the people producing those captions routinely make glaring mistakes. High quality human produced transcripts can only be produced by double and triple checking the transcripts against the source recording to make sure they're correct. There's no reason to expect computer generated transcripts to be perfect, either.

Sphinx (5, Informative)

SaXisT4LiF (120908) | more than 4 years ago | (#32971964)

Carnegie Mellon has an open source speech recognition project you might want to look into. Sphinx [cmu.edu]

Re:Sphinx (0)

Anonymous Coward | more than 4 years ago | (#32972254)

Yep. Sphinx, while FAR from perfect, is the only open source option out there. Also, you can create your own voice dictionaries for it.

Unfortunately... (3, Interesting)

dmneoblade (848781) | more than 4 years ago | (#32971966)

I spent several months searching for something like this. Open-source voice recognition is in its infancy, and there does not seem to be much interest in improving the few things we have.

I've wondered about this too (2, Interesting)

itamblyn (867415) | more than 4 years ago | (#32971968)

It seems like there should be some way to "hack" the audio transcription that google offers through google voice or youtube. Unfortunately I haven't found a way to upload a file. With youtube, if you make a fake movie, it gives an error that it can't be transcribed. Getting google voice to work would require some sort of phone interface I suppose...

Re:I've wondered about this too (4, Informative)

Enuratique (993250) | more than 4 years ago | (#32972214)

Google relies on Twilio [twilio.com] for their audio transcription.

Re:I've wondered about this too (1)

itamblyn (867415) | more than 4 years ago | (#32972506)

Source?

Best Idea (3, Informative)

Anonymous Coward | more than 4 years ago | (#32971980)

just upload it to youtube, its genius google transcription technology will make everything sense out of it.

Re:Best Idea (0)

Culture20 (968837) | more than 4 years ago | (#32972112)

genius google transcription technology

That's not a transcription technology, that's the comments. People tend to repeat the phrases they like. For example:
genius google transcription technology. ROFLMAO
Why'd you steal this video? The original has over 2,000,000 views. You stole this.
I like when he said "upload it to youtube" ;) 3 (it looks like a heart!)

Having dealt with this... (1)

Skuld-Chan (302449) | more than 4 years ago | (#32972000)

I'm interested in an open version of a transcription app (I run a lab with a lot of this software/equipment), but this is a very vertical market - up until recently there wasn't any standard interface for the foot pedal (newer ones are USB HID devices now).

I had to throw away a bunch of Sony serial devices because they only worked with one app I can't make work on newer versions of Windows.

youtube (1, Informative)

Anonymous Coward | more than 4 years ago | (#32972012)

upload to youtube and let it create closed captions. the results won't be perfect, but it will be better than most software.

simon and julius (0)

Anonymous Coward | more than 4 years ago | (#32972016)

I've tried simon and julius, but couldn't get past the learning curve to do actual transcription. I will say that it looks like both could be better for recognizing "just your own voice" once you get past the learning curve enough to train. The commercial software is good at recognizing everybody's voice, which isn't that helpful for transcription.

XTrans (2, Interesting)

ceraphis (1611217) | more than 4 years ago | (#32972042)

Why don't you give XTrans a shot: XTrans [upenn.edu]

Re:XTrans (1)

nextekcarl (1402899) | more than 4 years ago | (#32972696)

Being interested in this for meetings after checking out the user manual for xtrans it sounded really interesting, but I can't run it on my system and I can't find any useful help on their pages about this error on my Ubuntu system:
/dev/dsp: Device or resource busy
terminate called after throwing an instance of 'QWave2::AudioDeviceError'
Aborted

I'm not even sure where to look next since a quick google search didn't turn up a useful fix.

Re:XTrans (1)

ceraphis (1611217) | more than 4 years ago | (#32972806)

The downloads page [upenn.edu] says it requires QWave for waveform display and playback. Could that be the problem?

Re:XTrans (3, Informative)

Kev Vance (833) | more than 4 years ago | (#32973038)

Ubuntu uses PulseAudio on the ALSA audio subsystem, but that error message indicates XTrans is trying to use the OSS audio subsystem instead. To work around this, try using the Pulse OSS wrapper or temporarily disable Pulse. From the commandline, "padsp xtrans" or "pasuspender xtrans".

Not open source, but hackable = SAPI in Windows (1)

Enuratique (993250) | more than 4 years ago | (#32972044)

Have you looked into the Speech APIs baked into Vista and Windows 7? If you're familiar with .NET coding, version 4 of the framework provides easy-to-use hooks into the Speech API. The only problem is that it is designed to be used with fairly specific (programmer-supplied) grammars/lexicons. It does come with a general speech recognizer, but you'll get some interesting results without training it first. http://msdn.microsoft.com/en-us/magazine/cc163663.aspx [microsoft.com] Downsides also include that it only natively supports WAV files, but that can be addressed with some rolling-your-own goodness.

No (1)

waldoj (8229) | more than 4 years ago | (#32972066)

I've put a bunch of time into this for a project of my own. The short answer is, no, I have found no such program. I've experimented with a few older programs, but they're useless. Sorry.

Google voice does transcriptions (0)

Anonymous Coward | more than 4 years ago | (#32972140)

You could record it, then call yourself and play it through the phone.

Re:Google voice does transcriptions (1)

ddillman (267710) | more than 4 years ago | (#32972452)

I'm sure the resulting high-quality audio signal will help Google Voice do an even better job than usual...

USB foot control (2, Informative)

wguy00 (985922) | more than 4 years ago | (#32972264)

Buy a USB foot control (check out infinity or fortherecord), and download the free player from fortherecord.com. You can stop, start, rewind and fast-forward without having to take your eyes off the screen or leave your word processing app.

Re:USB foot control (1)

IANAAC (692242) | more than 4 years ago | (#32972760)

I can't tell from their web page, but does their software allow you to speed up/slow down the recording without any distortion of speech?

That's one reason I like Express Scribe.

Open source no. (1)

jnnnnn (1079877) | more than 4 years ago | (#32972306)

Here's a list [wikipedia.org] . In my experience, only Dragon is worth trying, with the following caveats:

  • It helps to spend ten minutes training it for each voice
  • It will still only get 99% accuracy
  • You need a high quality (low noise) recording with a good microphone

On the plus side, correction is easy -- read the document, and select words that look wrong to hear what they sounded like.

Most of the other programs are aimed at very small vocabularies (i.e. 100 words) for accessibility applications (controlling a computer).

Re:Open source no. (1)

markdavis (642305) | more than 4 years ago | (#32972486)

Dragon is not open source. It is not even multi-platform.

Re:Open source no. (1)

ducomputergeek (595742) | more than 4 years ago | (#32972816)

But the link does show opensource solutions on the list. The OP is just stating that in his experience, the only solution he has found that works is Dragon and relating his experience with Dragon.

Re:Open source no. (1)

maxume (22995) | more than 4 years ago | (#32972904)

Yeah, the subject of their post was 'Open source no', so they may have been up to speed there.

I looked, but still do it manually (4, Informative)

ciaran_o_riordan (662132) | more than 4 years ago | (#32972316)

I've worked on loooads of transcripts. I did most of these:

* http://wiki.fsfe.org/Transcripts [fsfe.org]

The best technique I've found is to have mplayer play the audio at 60% normal speed and have a text editor (emacs is my preference) in another window, flick between them with alt-TAB and hit Space to start and pause mplayer.

the command line (4, Informative)

ciaran_o_riordan (662132) | more than 4 years ago | (#32972446)

To play an audio file at 60% normal speed:

mplayer -af scaletempo=scale=0.6 the_file.ogg

And then to check the transcript, change the 0.6 to 1.5 (or 2.0 for someone like Richard Stallman who speaks slowly and clearly).
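If you switch speeds often, the command above is easy to wrap in a small script. This is a minimal Python sketch that builds exactly the argument list shown in the post; the function names are invented for the example, and it assumes mplayer is installed and on the PATH.

```python
import subprocess


def mplayer_command(path, tempo=0.6):
    """Build the mplayer argument list for tempo-scaled playback.

    Mirrors the command from the post:
        mplayer -af scaletempo=scale=<tempo> <path>
    """
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    return ["mplayer", "-af", f"scaletempo=scale={tempo}", path]


def play(path, tempo=0.6):
    """Run mplayer at the given tempo (requires mplayer to be installed)."""
    subprocess.run(mplayer_command(path, tempo), check=True)
```

Calling play("the_file.ogg") would transcribe-speed the file at 0.6, and play("the_file.ogg", tempo=1.5) would be the proofreading pass.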

Re:I looked, but still do it manually (2, Informative)

mutube (981006) | more than 4 years ago | (#32973360)

I'd agree. I did some part-time work transcribing audio a while back for extra pennies. One thing I would add is that instead of using Alt-Tab to switch applications and then hitting space to start/stop I found it was less frustrating to set up global keys for the purpose (I was using KDE at the time, I expect most desktops offer this).

I assigned F12 to skip back 5 seconds and F9 to pause/restart. Using those (esp F12) it was relatively easy to keep up to speed with what was being said without switching away from the editor.

Hmm (0)

Anonymous Coward | more than 4 years ago | (#32972378)

"I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software a get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"

I think you already have the software, and are testing it on this ask Slashdot question. Well played.

If you don't find anything... (2, Interesting)

afabbro (33948) | more than 4 years ago | (#32972380)

...you could always use RentACoder (er, Vworker.com now) and hire someone for pennies to do it.

My God Man, Just Buy The Damn Shif!! (0)

Anonymous Coward | more than 4 years ago | (#32972406)

What are you a fucking hobo?

HTK (0)

Anonymous Coward | more than 4 years ago | (#32972414)

You can have a look here:
http://htk.eng.cam.ac.uk/ [cam.ac.uk]

I've used it in the past. It's a bit hard to use, but the results are decent.

What you have to realize is that you will need to have _very_ clean recordings,
or else the recognition rate will suffer greatly.

Transcription (1)

ddillman (267710) | more than 4 years ago | (#32972450)

I wish you luck in your quest as I'm also working on genealogy and would like to be able to do this as well. I'd be interested in hearing if you find something that works acceptably well for this purpose. In my experience (IBM Via Voice from OS/2 v.3 days to Dragon Naturally Speaking 10) the state of the art just isn't ready for general use. Even after training, I always got enough errors to discourage use. And I type relatively quickly, so it was just more effective for me to do it manually.

Don't waste your time looking for answers here (-1, Troll)

Anonymous Coward | more than 4 years ago | (#32972464)

And that is the crux of the biscuit... No one here has a clue as to what needs to be done; here it is kids only: arrogant, useless wannabe nerds. Get a life, get a job and do something that adds to the species or just simply STFU.

No. (1)

Alex Belits (437) | more than 4 years ago | (#32972480)

I would really love to be able to run the resulting MP3 or WAV files through some software a get a text file out. I know that software like this exists commercially.

No. Automated arbitrary speech recognition is an unsolved problem -- all voice recognition systems either require the speaker to make an effort to pronounce words clearly, or make so many mistakes that fixing them takes more effort than typing the text manually.

It would make more sense to write transcription assistance software -- an equivalent to the tape player with a foot pedal commonly used for this purpose, except with the capability to play and repeat short sequences of words or phrases, adjust speed, etc.
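A rough sketch of the "play and repeat short sequences" idea: cut the recording into short playback windows that overlap slightly, so each replay starts a moment before the previous one ended. The function name, the 5-second chunk and the 1-second overlap are all made up for illustration, not taken from any existing tool.

```python
def playback_windows(total_seconds, chunk_seconds=5.0, overlap_seconds=1.0):
    """Yield (start, end) times, in seconds, for short overlapping playback chunks.

    All names and defaults here are hypothetical; tune them to taste.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        yield (start, end)
        if end >= total_seconds:
            break
        # Step back a little so the typist hears the tail of the last chunk again.
        start = end - overlap_seconds


if __name__ == "__main__":
    for window in playback_windows(12, chunk_seconds=5, overlap_seconds=1):
        print(window)
```

An actual assistant would hand each window to whatever audio player it drives; that part depends on the player and is left out here.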

Re:No. (0)

Anonymous Coward | more than 4 years ago | (#32973142)

Please ignore. I accidentally moderated your post as redundant, when I meant insightful. I'm replying so that my moderation is removed.

XTrans from the LDC! (0)

Anonymous Coward | more than 4 years ago | (#32972530)

Try XTrans [upenn.edu] from the Linguistic Data Consortium. It's GPL and specifically designed for doing speech transcription. Ask nicely for support, please; the main developer is quite busy.

Got kids? (4, Insightful)

Kral_Blbec (1201285) | more than 4 years ago | (#32972554)

Pay them a buck per page and they learn some family history along the way. Problem solved.

Re:Got kids? (4, Interesting)

Luckyo (1726890) | more than 4 years ago | (#32972860)

This is one of the cases where the journey matters as much as, if not more than, the destination :)

Google Voice? (1)

Facegarden (967477) | more than 4 years ago | (#32972560)

Google has been working on speech to text for years, and they've got Google Voice to where it transcribes your messages to text. Works great with the Android client, and they have a web page. But even with Google's experience and money, it's not very accurate. It might be better than most of what you'll find though, and it's free.

You could probably rig up Google Voice to where each thing you want to transcribe gets recorded as a "message" to you.

That said, here's a voicemail I got recently:

"Hey Jeff, Nate what you can still haven't been able talk to you in. X-rite is and see if you've been found. If off seems like just. I don't know if the E Z the phone software. This is not available 4. Slash number. I wanna malfunction or give us a call back to you now."

So it's not perfect... One funny thing is that my name isn't Jeff or Nate, and neither was the caller's.
-Taylor

High quality recordings now, transcription later (2, Informative)

itamblyn (867415) | more than 4 years ago | (#32972584)

I think the most important thing to keep in mind for a project like this is that you should do everything you can to ensure a high quality recording. Don't worry about transcription at this point - just focus on getting content. When algorithms (and computers) have improved in 5-10 years time you can do the transcription. It might even be useful to record the sessions with a video camera. Maybe speech recognition tech of the future will use lipreading in addition to the approaches that are used now.

Foot pedals (0)

Anonymous Coward | more than 4 years ago | (#32972620)

If you are transcribing manually, you really want to consider using something with foot pedals, so you can control the playback with that instead of switching between typing and playback software all the time.
http://www.nch.com.au/scribe/pedals.html

Easy answer (1)

JiffyPop (318506) | more than 4 years ago | (#32972632)

Just make a call to your favorite terrorist-harboring nation, add in some carefully chosen phrases, and then file a FOIA request for them.

Foot Pedal and Express Scribe best option (4, Informative)

Adattisi (1860686) | more than 4 years ago | (#32972634)

I've been a transcriptionist for over 5 years, and unless you want to have to retype most of it yourself anyway, don't offer pennies on a site like guru/vworker/elance. A decent transcriptionist is going to charge at least $45-50 per AUDIO hour (not hours it takes) if it's a good, clear recording & a single speaker. If there was a really great product out there, I'd be out of a job. If you want to do it on the cheap, get an inexpensive USB Infinity foot pedal (on ebay) as mentioned before & Express Scribe is a free download to playback & rewind the audio. Both are what I use. Good luck!

Won't Work (1)

EEPROMS (889169) | more than 4 years ago | (#32972702)

Where I work we have tons of recordings from engineering committees, and we tried lots of free and commercial programs, but at the end of the day, due to the vagueness of the English language, the best solution was to "hire a human". So that's what we did: we found a few people in India who were happy to transcribe our recordings for a fraction of the cost of hiring someone to fix the stuff-ups from the speech-to-text software (also, good speech-to-text software costs a fortune and takes ages to train, especially when most of the engineers' sentences are full of acronyms). So save time, help those with less money, and hire someone; it's not like we have a global shortage of people.

Your choices are basically humans or the Dragon (1)

mdecerbo (9857) | more than 4 years ago | (#32972752)

There are interesting speech recognition products for other applications [bbn.com], but for this task Dragon and IBM ViaVoice, both sold by ScanSoft, are pretty much the only software choices until someone qualified gets an NSF grant to beef up Sphinx.

I can second the recommendation of the LDC's XTrans [upenn.edu] if you're going to do this yourself.

If you want someone else to do it, there are a lot of podcasters who want transcripts, and a bunch of transcription services have sprung up to address the market. They've already implemented a lot of the quality-control mechanisms you'd have to address in order to get good results from something like the Mechanical Turk.

The Wall Street Journal ran a side-by-side comparison [wsj.com] back in 2008 and recommended castingwords.com [castingwords.com], but another provider may very well be better by now. Shop around.

Try speakwrite.com (0)

Anonymous Coward | more than 4 years ago | (#32972768)

Get it back in 3 hours

"Transcriber" is the tool you want (1)

harmonise (1484057) | more than 4 years ago | (#32972800)

Transcriber [sourceforge.net] is the tool that you are looking for. It plays the file and you type and annotate. It's in the Ubuntu repositories so I assume it's in Debian's as well.

Google Voice... (1)

RobertM1968 (951074) | more than 4 years ago | (#32972842)

I hear Google has a great tool for this that they use for Google Voice...

Or... transcribed...

I'm here googoo, hi a grape too fur this that day fuse far google boys...

Coding Horror article (2, Informative)

lulalala (1359891) | more than 4 years ago | (#32972858)

Coding Horror recently posted an article about the current voice recognition technology.

http://www.codinghorror.com/blog/2010/06/whatever-happened-to-voice-recognition.html [codinghorror.com]

There is a poem which got transcribed, and the title became like this:

"a poem by Mike Bliss --> a poem by like myth"

The rest of the poem is equally funny. So basically you better transcribe it manually.
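To put a number on a mistranscription like that one, the usual yardstick is word error rate: word-level edit distance divided by the number of words in the reference. The sketch below is a plain textbook implementation written for this example (the function name is hypothetical, and it lowercases and splits on whitespace, which real scoring tools handle more carefully).

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # a reference word is missing from the output
            insertion = cur[j - 1] + 1  # the output has an extra word
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a poem by Mike Bliss", "a poem by like myth"))
```

For the title quoted above, two of the five words are wrong, so the score comes out at 0.4.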

Wrong question. (1)

BitZtream (692029) | more than 4 years ago | (#32972882)

Your question was phrased wrong.

Just ask for what you mean: you want free software, not so much OSS. It's not like you're going to go editing and fixing bugs in the speech algorithm, so the openness here really is just a guise to get something for free.

You'll find plenty of no-cost ways to transcribe, but OSS options fall short.

Reality of it is, you'll save yourself a lot of effort if you just type it yourself. It'll be faster and far more accurate.

google voice (1)

pixelite (20946) | more than 4 years ago | (#32972940)

Why don't you just use Google Voice to transcribe? Google Voice has a feature to transcribe your voicemails. Not sure how long each message can be, but maybe you can automate it somehow?

I wrote a little application for that... (1, Interesting)

Anonymous Coward | more than 4 years ago | (#32973012)

I looked into automatic transcription software too. I think the consensus is that none of it works well unless it is trained, and trying to "train" software with regular recordings of conversations is not likely to work.

I wrote my own little application so that I could type the text in myself. It works with WinAmp, so it's tied to Windows (Sorry! Time constraints...) From my web page:

http://csclub.uwaterloo.ca/~jg3macka/GabbleFarb/index.html

What it is:

GabbleFarb is basically a glorified notepad application that works with WinAmp (a free audio and video player). A number of hotkey combinations exist to control WinAmp from inside GabbleFarb. As a transcriber, this allows you to easily pause, rewind, fast-forward and control volume levels without leaving the editor. Additionally, as a video or audio file is playing in WinAmp whenever the ENTER key is pressed GabbleFarb will begin the next line with a timestamp of the current playing time. Within the editor, you can then double-click on a line of text in your transcript and GabbleFarb will automatically tell WinAmp to start playback at that point in the file.
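The timestamp trick is simple to reproduce in any editor-plus-player setup. Here is a minimal sketch of the two halves: stamping a line with the current playback position, and reading the position back out of a line to seek to it. The `[HH:MM:SS]` format and both function names are assumptions for illustration; this is not GabbleFarb's actual code or format.

```python
import re


def format_timestamp(seconds):
    """Turn a playback position in seconds into a "[HH:MM:SS]" line prefix."""
    total = int(seconds)  # drop fractions of a second
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


_TIMESTAMP = re.compile(r"^\[(\d+):(\d{2}):(\d{2})\]")


def parse_timestamp(line):
    """Return the position in seconds encoded at the start of a line, or None."""
    match = _TIMESTAMP.match(line)
    if match is None:
        return None
    hours, minutes, secs = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + secs


if __name__ == "__main__":
    line = format_timestamp(3725.9) + " Grandma talks about the farm"
    print(line)
    print(parse_timestamp(line))
```

The value returned by `parse_timestamp` is what you would pass to the player's seek command, whatever that happens to be for your player.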

Accuracy (0)

Anonymous Coward | more than 4 years ago | (#32973172)

The decision of automatic vs. manual depends on the accuracy you want. Automatic can go up to 75% to 80%. The best way to use automatic transcription would be to train your PC's speech recognition, play the file through headphones, and speak it out loud yourself. Again, there is a lot of contextual information which cannot be transcribed accurately by a computer. So you'll have to manually edit these files if you want to take it to 100%.

You can also manually transcribe it yourself. If you have a typing speed of around 80wpm, then an hour of audio will take around 4 hours to do. Have a look at NCH Express Scribe. It's a free play/stop program which is almost the de facto standard in the transcription industry.

You can also use various transcription services which are out there. A professional transcription service will charge you around $1 to $2 per audio minute. Freelancers will charge around half of that. But then with freelancers you cannot guarantee the quality.

Shameless Plug: We provide a transcription service for $0.75 per minute of audio. http://callgraph.biz
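For anyone weighing typing it yourself against paying, the arithmetic is short enough to script. This sketch only uses figures quoted in this thread (about 4 hours of typing per audio hour, and per-minute or per-hour service rates); the function names are made up for the example and the rates are whatever you plug in.

```python
def diy_hours(audio_minutes, typing_hours_per_audio_hour=4.0):
    """Hours of your own time needed to type out the recording."""
    return audio_minutes / 60.0 * typing_hours_per_audio_hour


def service_cost(audio_minutes, dollars_per_audio_minute):
    """Price of a service that bills per minute of audio."""
    return audio_minutes * dollars_per_audio_minute


if __name__ == "__main__":
    minutes = 90  # hypothetical length of one interview
    print(diy_hours(minutes))           # hours of typing
    print(service_cost(minutes, 0.75))  # dollars at $0.75 per audio minute
```

A rate quoted per audio hour converts by dividing by 60, so $45 per audio hour is the same as $0.75 per audio minute.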

moD do3n (-1, Troll)

Anonymous Coward | more than 4 years ago | (#32973198)

A-AI (0)

Anonymous Coward | more than 4 years ago | (#32973204)

Artificial Artificial Intelligence. IOW, Farm it out to piece workers on the net for pennies on the Amazon Turk project.

State of Speech Reco (3, Informative)

poor_boi (548340) | more than 4 years ago | (#32973260)

It's been my job to work with speech recognition technology for the last 10 years. I've worked with speaker-independent grammar-based recognizers like Nuance Recognizer. I've worked with speaker-dependent training-based recognizers like Dragon Naturally Speaking. I've used open source recognizers like Sphinx. I've even dabbled with writing my own basic recognition engine. I can tell you with confidence: with the current state of commercial/open-source technology, you will not be able to get satisfactory results transcribing two speakers in the same recording. Accurate machine transcription requires training and a single speaker. I have heard people claim that speech recognition is a dead technology because it has stopped improving at appreciable speeds. While improvements have slowed down drastically, I do not believe speech recognition is dead by any means. We've really been making the same steady progress since the inception of speech recognition -- but previously we were riding the wave of geometric (sometimes exponential) growth in CPU clock rate. Now that the free lunch is gone, recognition algorithms need to be parallelized to once again ride improvements in CPU design.

Use Google Voice (1)

got2liv4him (966133) | more than 4 years ago | (#32973404)

record what they are saying into a voice mail on google voice...

Human transcription: Cheaper than you'd guess. (1)

spinkham (56603) | more than 4 years ago | (#32973412)

The only good transcription software still runs on wetware.

Luckily, humans are cheap and easily available.
CastingWords is one of the cheapest ways to get humans to transcribe your content.

http://castingwords.com/ [castingwords.com]

If you'd like to save a few bucks by cutting out the middleman, see an even cheaper way here:

http://waxy.org/2008/09/audio_transcription_with_mechanical_turk/ [waxy.org]

craigslist (1)

cynyr (703126) | more than 4 years ago | (#32973460)

post an ad on craigslist that you are paying $20/hour of recording to have it typed out. Pizza provided as well. ByoB. bet some college kid takes you up on it.

Looking at it the wrong way (0)

Anonymous Coward | more than 4 years ago | (#32973530)

Get the lazy bastards to type it out in the first place.
All that screwing around when you could just be typing it, I see doctors do it too...... blah..blah...blah... shut your trap.... here's a keyboard imbecile.
