Open Source Transcription Software?

Open Source Transcription Software? 221

Posted by kdawson on Tuesday July 20, 2010 @06:49PM from the what-he-said dept.

sshirley writes "I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software and get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"

This discussion has been archived. No new comments can be posted.

CMU Sphinx (Score:3, Informative)

by Singularity42 ( 1658297 ) writes: on Tuesday July 20, 2010 @06:53PM (#32971950)

Looks active.

- Re:CMU Sphinx (Score:5, Informative)
  
  by Narksos ( 1111317 ) writes: on Tuesday July 20, 2010 @07:31PM (#32972384)
  
  What you want is dictation software. I just (last week) spent significant time looking into this.
  
  For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends; you'll have to write a front-end. However, it shouldn't be too hard to do that: the source for the CMU Sphinx demos shows how to get input from a mic or WAV file (if you've got something other than PCM you'll just need to convert it) and how to set up the various engines.
  
  CMU Sphinx [sourceforge.net] appears to be mainly for research purposes. You can run it in a few different modes: one with a fixed grammar (for command systems; Gnome's voice control uses Sphinx in this mode) and one (what you'd be looking for) that uses a weighted dictionary. I didn't train it to my voice (and you won't be able to train it for transcriptions) and I was getting fairly lousy recognition rates with my $20 Logitech USB microphone. It might work better with a high-quality headset, but I imagine you won't both be wearing one.
  
  Julius/Julian [sourceforge.jp] lacks a good acoustic model for English. VoxForge [voxforge.org] is working on one, but it isn't anywhere near complete.
  
  Here is a good article that sums up the current projects [eracc.com]
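
The point above about needing PCM input can be checked before a recording is handed to any recognizer. What follows is a minimal sketch using only Python's standard `wave` module. The 16 kHz, 16-bit, mono target is an assumption about what commonly distributed acoustic models expect, and `describe_wav` and `looks_recognizer_ready` are hypothetical helper names, not part of any Sphinx or Julius API.

```python
import wave


def describe_wav(path):
    """Return the basic PCM parameters of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def looks_recognizer_ready(info, rate_hz=16000):
    """Check against an ASSUMED target format: mono, 16-bit, rate_hz."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate_hz"] == rate_hz
    )
```

An MP3 file, or a WAV file holding compressed audio, makes `wave.open` raise `wave.Error`, which is itself a hint that a separate conversion step is needed first.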
  
  - Re:CMU Sphinx (Score:4, Insightful)
    
    by notthepainter ( 759494 ) writes: <oblique&alum,mit,edu> on Tuesday July 20, 2010 @10:05PM (#32973490) Homepage
    
    Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that
    
    Actually, it can be rather hard to do that. I was one of the founders of MacSpeech and there is a surprisingly large set of details you have to deal with: punctuation, capitalization, etc. Of course, since you wouldn't be making a commercial product, much of the gloss need not be coded, but once you have the engine (the part that takes the audio source and converts it to text) you still have a large amount of work left over.
    
  - Re:CMU Sphinx (Score:5, Informative)
    
    by Bacon Bits ( 926911 ) writes: on Wednesday July 21, 2010 @03:02AM (#32974706)
    
    Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that
    "... unless you're not a programmer."
    Seriously, since when is "program it yourself" a solution to "are there any open source software packages that do what I want?"? The answer you're looking for is "no". That's the correct answer.
    Here, here's a nice car analogy since we're on Slashdot: when you need a car do you buy a kit car, or do you buy one factory built? This is like telling someone who wants a car to drive to work that they should simply buy a Chevy big block engine and build the rest from scratch. Just because I need a car doesn't mean I must be an automotive engineer and metal fabricator. Similarly, just because I need dictation software doesn't make me a software architect or a linguist. Directing this person to program their own software is not answering the question.
    Cripes. People wonder where the "open source is only free if your time has no value" line came from.
    
    - Re: (Score:2)
      
      by slim ( 1652 ) writes:
      
      Both options are just back-ends, you'll have to write a front-end. However, it shouldn't be too hard to do that
      "... unless you're not a programmer."
      In any discussion about Open Source, it's appropriate to mentally substitute the verb "program" with the phrase "program or pay someone to program".
      Economically, the two are equivalent.
      OSS doesn't just give you the freedom to hack at code yourself. It also gives you the freedom to hack at it by proxy.
      - Re: (Score:2)
        
        by murdocj ( 543661 ) writes:
        
        In any discussion about Open Source, it's appropriate to mentally substitute the verb "program" with the phrase "program or pay someone to program".
        
        In any discussion of software, it's appropriate to mentally substitute the verb "program" with the phrase "program or pay someone to program". If you're willing to pay, you can always get what you want. "Open Source" has no bearing on that.
        What's interesting here is that people talk a lot about how Open Source == freedom, not "free as in beer". But I'd be wi
        
        Re: (Score:2)
        
        by marcello_dl ( 667940 ) writes:
        
        The right suggestion then ends with "bay".
        Look for used software on ebay.
        (got ya huh?)
        
        Re: (Score:2)
        
        by slim ( 1652 ) writes:
        
        What's interesting here is that people talk a lot about how Open Source == freedom, not "free as in beer". But I'd be willing to bet most posters asking about "Open Source" solutions to problems are more concerned with the "no cost" than they are with "I can modify the source".
        Which is why I like to subtly correct them :)
    - Re: (Score:2)
      
      by SwedishPenguin ( 1035756 ) writes:
      
      If that question was posed on a website with a lot of people interested in cars, and interested in building cars, it's a perfectly valid response, just as the grandparent says one can build a front-end relatively easily on a website with a lot of people interested in programming. If the question was posed on a genealogy website, it obviously would not have been an appropriate response, but this is Slashdot...
    - Re: (Score:2)
      
      by orasio ( 188021 ) writes:
      
      Seriously, since when is "program it yourself" a solution to "are there any open source software packages that do what I want?"? The answer you're looking for is "no". That's the correct answer.
      Your answer is the response to "I want a ready to use dictation software, and I don't want to pay for it".
      If you find an open source backend, and payment is not your sole concern, you can get someone, maybe even the authors, to code the parts you need. You might even do it yourself, of course.
      The thing is that _you_ assumed that "open source" meant "free as in beer". Some of us didn't. For those of us, it was a good response.
    - Re:CMU Sphinx (Score:5, Insightful)
      
      by Crudely_Indecent ( 739699 ) writes: on Wednesday July 21, 2010 @09:28AM (#32976844) Journal
      
      "... unless you're not a programmer."
      I am a programmer, but we're all sometimes out of our element.
      I found need for modifications to an open source application a few years ago. Rather than spend my time reading the source code to understand how the application worked, I decided to contact the developer. A few emails and a couple of days later, the project developer made the modifications for me and $500 for himself. The world then gained additional functionality in the open source application - everyone wins.
      Some people forget, this is how many open source applications survive.
      Your analogy is outlandish! If someone wants to drive a car to work, they buy a car. If they want a shark fin on the roof, they go to a custom body shop. If they want a killer stereo, they go to a stereo shop. If they want it to be pink and yellow like yours, they go to a paint and body shop. If they can do these things on their own, they'll do it. The difference being that if the car was open source, doing these things wouldn't void the warranty.
      "Open-source is free only if your time has no value." - Jamie Zawinski
      I offer an alternative viewpoint:
      Open source is free if you truly understand freedom.
      I'm free to use the application. I'm free to modify it. I'm also free to recognize my limitations and pay someone else to do these things for me.
      
      - Re: (Score:3, Insightful)
        
        by Crudely_Indecent ( 739699 ) writes:
        
        The commercial app does exist, and it's a per-use app that is controlled by a dongle and subscription (hint, more than $500 - plus usage).
        Sticking it to the man has nothing to do with it, unless by "it" you mean money and by "the man" you mean my pocket.
        Of course, any commercial developer will gladly make a custom app for $, but I guarantee that it will be more than $500. The developer did have plans to add the functionality...eventually. My $500 made it happen right now.
        It was certainly silly of m
    - Re: (Score:3, Insightful)
      
      by Bill, Shooter of Bul ( 629286 ) writes:
      
      Blechkt. That's how I feel about your post. This is a site for nerds. Nerds are often adept at doing nerdy things. Like writing software.
      Now, if your mom asked you, then yes, a reply of "You only need to write a front end to this speech engine" is indeed inappropriate.
      Your post, and the replies to it, really reflect more on how you view the general Slashdot audience than anything else.
  - Re: (Score:2)
    
    by Tsu Dho Nimh ( 663417 ) writes:
    
    For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends, you'll have to write a front-end.
    
    So, the answer is "no". There is no OS software that is ready for the OP to install and use.
  - Re: (Score:2)
    
    by tehcyder ( 746570 ) writes:
    
    For open source you have two main options: CMU Sphinx and Julius/Julian. Both options are just back-ends, you'll have to write a front-end.
    
    That answer is factually correct, whilst being entirely unhelpful. You must be an actuary in real life.
- Re:On the other hand... (Score:2, Interesting)
  
  by vrmlguy ( 120854 ) writes:
  
  I just slice everything up into segments of 60 seconds and let Google Voice transcribe it for me. Sure, some nay-sayers might point out that it's slower than transcribing it all manually, but they don't get that I'm getting Google to do the work for me!
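
The slicing step described above is easy to automate for uncompressed WAV input. Here is a minimal sketch using Python's standard `wave` module; `split_wav` and the output naming scheme are hypothetical, and nothing here talks to Google Voice or any other service. Cutting at fixed times will sometimes split a word in half, so treat the 60-second default as a convenience rather than a recommendation.

```python
import wave


def split_wav(src_path, out_prefix, chunk_seconds=60):
    """Write consecutive chunks of src_path and return the files written."""
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the recording
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the chunk file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

The last chunk is simply shorter than the others when the recording length is not a multiple of `chunk_seconds`.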
  - Re: (Score:2)
    
    by quickOnTheUptake ( 1450889 ) writes:
    
    Informative?
    Attention slashdotters, There is at least one retard on the loose. He may be calling himself and playing tapes into the phone. If you encounter him do not engage him as he is armed with modpoints and may use them erratically.
  - Re:On the other hand... (Score:5, Funny)
    
    by Verteiron ( 224042 ) writes: on Tuesday July 20, 2010 @11:38PM (#32973960) Homepage
    
    I'm sure you roar get ding fan plastic results from goo gull boys, two eye find it variably hell full.
    
- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  Sphinx is what many companies use to get started with, but it's far too raw to be useful by itself. You need to update the HMM back-end extensively... and train it. Even still, your success rate is only 80%... meaning: 1 in 5 words, if spoken slowly, will still be wrong.
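
The arithmetic above (80% of words right means roughly one word in five wrong) corresponds to what is usually reported as word error rate. Below is a minimal sketch of that measure, computed as word-level edit distance divided by the length of the reference transcript. Reading the quoted "success rate" as one minus this figure is an assumption, and `word_error_rate` is an illustrative helper, not a function from Sphinx.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

One wrong word in a five-word sentence gives 0.2. Because extra inserted words also count as errors, the figure can exceed 1.0 on very bad output.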
- Re: (Score:3, Interesting)
  
  by inkyblue2 ( 1117473 ) writes:
  
  Sphinx by itself is a terrible answer to this problem, unfortunately. The code is free, but good luck finding an appropriate model. Worse, you'll need to train a speaker-dependent model to get any usable results, and this is a VERY non-trivial task with Sphinx tools in the state that they are. I spent several years getting paid to adapt Sphinx for commercial purposes and while it's great for some things, I can say with confidence that it is not the tool you're looking for.
  You know what works? Dragon. Hate t
Dear aunt, (Score:5, Insightful)

by Anonymous Coward writes: on Tuesday July 20, 2010 @06:53PM (#32971954)

let's set so double the killer delete select all.
Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.

- Re:Dear aunt, (Score:5, Insightful)
  
  by Kenoli ( 934612 ) writes: on Tuesday July 20, 2010 @07:07PM (#32972116)
  
  A program capable of "making informed guesses based on context" seems perfectly plausible, though that's not part of speech recognition per se.
  
  - Re: (Score:3, Insightful)
    
    by conchubhair ( 1453303 ) writes:
    
    The problem you are describing (continuous speech recognition) is not solved yet. Even the best state of the art technology is not going to be perfect, and having two speakers will make it even less useful. If you really need the stuff transcribed, you can pay for online services to transcribe it (if they offer really good quality transcription, they are most likely using humans) or you can transcribe it yourself (you can buy software to help speed up the transcription process - including a foot pedal to pa
    - Re: (Score:2, Informative)
      
      by Flyerman ( 1728812 ) writes:
      
      The parent's link is exactly what I set up on a client's machine. They purchased the headset and pedals but the software itself was free and worked wonderfully.
  - Re: (Score:2)
    
    by k.a.f. ( 168896 ) writes:
    
    A program capable of "making informed guesses based on context" seems perfectly plausible, though that's not part of speech recognition per se.
    If you believe that, you don't know much about speech recognition.
    Seriously, the language model in modern dictation systems is THE most important part. The computer gets much more relevant information from a-priori probabilities of words and sounds than from recognizing them directly, because most of the time the sounds that our brain thinks it hears are objectively not there at all. Read up on NLP some time; it is a totally fascinating (if somewhat depressing) field of research.
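
The role of a-priori word probabilities described above can be illustrated with a toy example: two candidate transcripts that sound alike get nearly equal acoustic scores, and a bigram language model breaks the tie. This is only a sketch under stated assumptions. The acoustic scores passed in are made up by the caller, the add-one smoothing is chosen for brevity, and all function names are hypothetical; real dictation systems use far larger models.

```python
import math
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def lm_log_prob(sentence, unigrams, bigrams):
    """Log probability of a sentence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def pick_transcript(candidates, unigrams, bigrams, lm_weight=1.0):
    """candidates: (text, acoustic_log_score) pairs; return the best text."""
    def combined(candidate):
        text, acoustic = candidate
        return acoustic + lm_weight * lm_log_prob(text, unigrams, bigrams)
    return max(candidates, key=combined)[0]
```

With `lm_weight=0.0` the choice falls back to the acoustic score alone, which is a quick way to see how much work the language model is doing.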
- Re: (Score:2)
  
  by icebraining ( 1313345 ) writes:
  
  And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
  Never? Not only is it possible, there are already some papers on prototypes of grammar-switching context-based speech recognition engines.
- But Windows Speech Recognition... (Score:5, Informative)
  
  by Monkeedude1212 ( 1560403 ) writes: on Tuesday July 20, 2010 @07:11PM (#32972168) Journal
  
  Most Windows Vista or Win7 machines come with a built in transcribing feature, that you can enable in the control panel (Win7, under ease of access, Speech recognition).
  However - the only way it works properly is if you train it to understand you personally. You load your profile, and it'll run you through a whole bunch of test sentences. The FULL test takes you about 20 minutes I think (it's been a while since I've used it) - and actually works quite well. There is a cutoff point at about 2 and a half minutes if you want to stop and try it out. It can actually make the machine keyboard- and mouse-free if you want. When you open a browser it highlights everything on the web page that's clickable and assigns it a number, and you simply say "Click 7" and it hits the reply button for you. Then you talk when the textbox has focus and it'll transcribe every word you say.
  I did this for my girlfriend's paper once: I read it aloud (you have to mention things like comma, end paragraph, etc.) and put it into a Word document. Out of a 15-page single-spaced essay, it got 3 sentences wrong, and that's only because I was mentioning some of the more obscure Greek names (she's a history major). It managed to get full sentences regarding Octavia and her fondness for libraries without error, which I thought was odd since that's not a name you hear every day.
  Anyways - if he wants to do this, he should record the test phrases (there will be a lot though) and have each of his interviewees read the test sentences so he can then relay those through the computer and train the computer for each person.
  All in all - he may still run across a few errors, but it's not nearly as bad as, say, Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge of how that person speaks. Windows Speech Recognition is something that will handle what he's after though.
  
  - Re: (Score:2)
    
    by Monkeedude1212 ( 1560403 ) writes:
    
    Forgot to mention: I'm not entirely sure he needed an "Open Source" Solution as much as he needed a "cost effective" solution though - he makes no mention of altering any code. So I mean, Windows Speech Recognition is not exactly Open Source.
    - Re: (Score:2)
      
      by markdavis ( 642305 ) writes:
      
      >So I mean, Windows Speech Recognition is not exactly Open Source.
      It's not exactly multi-platform either. He might be using Linux, for example (like so many of us do). Really, the original post left off a lot of potentially useful (narrowing) info.
  - Re: (Score:2)
    
    by unix1 ( 1667411 ) writes:
    
    All in all - he may still run across a few errors, but it's not nearly as bad as, say, Google Voice Mail, which tries to figure out what you're saying without having any previous knowledge of how that person speaks.
    Each Google voicemail transcription has an option for a user to mark whether the transcription was accurate or not. I wouldn't be surprised if they were tying that into the caller-specific profile. So, if you leave a message for friend A and he marks the transcription good, that data may be used when you call not just person A, but everyone else in the future. In fact, it wouldn't make any sense otherwise. Now, how much actual "learning" algorithms they have on the back side, I cannot tell you.
  - Re: (Score:2)
    
    by LordLucless ( 582312 ) writes:
    
    It probably helps that Greek is transliterated when rendered with the English alphabet - which means most of the funky names you were saying were spelled phonetically, and thus easy for a recognition engine to pick up - even easier than a lot of regular English words.
  - - Re: (Score:2)
      
      by siriuskase ( 679431 ) writes:
      
      That's my experience. I don't want to talk to my computer, I want to talk to a voice recorder, then transfer the files to the computer for transcription. For someone who types faster than she talks, the Nuance/Dragon type stuff is useless, but I'd love to create voice files while driving or have my answering machine files transcribed, 'cause I can read faster than I can listen. Who cares if a few of the words are misspelled, I could still listen to the sound files and clean up the text. Kinda like OCR.
- Re: (Score:2, Insightful)
  
  by ThomConspicuous ( 1004135 ) writes:
  
  It's already being done in medical dictations that are also recorded and double checked by Transcriptionists. Speeds up work flow immensely even with the human verification in place.
  
  I even witnessed an East Indian doctor with a heavy accent dictate normally and have the software pick up everything stated. He was pleasantly surprised.
  
  It works.
- Re: (Score:2)
  
  by xSauronx ( 608805 ) writes:
  
  I say set up Google Voice, dial the GV number and do your questions as voicemail into a speakerphone.
  You'll get a transcription in your email. Do a question at a time. Problem solved!
- Re: (Score:3, Insightful)
  
  by theheadlessrabbit ( 1022587 ) writes:
  
  let's set so double the killer delete select all.
  Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
  ...a computer program cannot do yet
- Re:Dear aunt, (Score:5, Informative)
  
  by painandgreed ( 692585 ) writes: on Tuesday July 20, 2010 @08:04PM (#32972652)
  
  Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
  Funny, considering my job is training doctors to use voice recognition to do all their reporting. Actually, it works fairly well. I also don't mean dictating something that goes to transcriptionists. The doctors dictate the report. The dictation is transcribed into text. They review it and sign off. We got rid of all our transcriptionists years ago. The time for a report to get done went from 24 hours with transcriptionists to 24 minutes with voice recognition. The number of errors was cut in half. The doctors' workload was also lessened as they could check the final version while still dealing with the data rather than having to go back and review everything all over again a day or two later. Speech recognition was a problem seven years ago, but hardly at all in the last five or so. Yes, they have to go over their dictations and occasionally make some minor corrections. There's always background noise to worry about and some people's accents are hard even for another person to get through, but for things that require quick turnaround and need to be verified by the person who is doing it, voice recognition already is the gold standard.
  P.S. Several of the doctors like it so well they bought Dragon (pretty much everybody but Philips uses Dragon for their speech engine) for home and use it there for all their email and other writing.
  
  - Re:Dear aunt, (Score:5, Interesting)
    
    by BitZtream ( 692029 ) writes: on Tuesday July 20, 2010 @10:16PM (#32973550)
    
    Ironically, I have a family member who runs a business doing transcription for doctors ... because every time they try voice recognition software they get pissed off and go back to real people.
    Being a fan of Dragon Dictate myself, I know it's not that great and I know it has a fit when you start throwing accents at it, training or not.
    I call bullshit on your claims of using Dragon for everything.
    
    - Re: (Score:2, Interesting)
      
      by Mr. Pibb ( 26775 ) writes:
      
      I call bullshit on your bullshit.
      I do occasional work for a Worker's Comp doc who has been working with Dragon for over 10 years. He swears by it.
      The work is an hour-long interview, and hours of paperwork. He dictates the report into a MiniDisc recorder while reviewing his notes and then plays the recording back into the computer, watching for errors (few) and reviewing. I've also set up several other docs in the same field with Dragon, and they're quite pleased with it as well.
      At first, he had to buy the l
      - Re: (Score:2)
        
        by slim ( 1652 ) writes:
        
        Could it be that people with certain accents have success with Dragon, while others do not?
        I've found with some products (and people!) -- low end products like Nintendo Brain Training and Google that my instinct is to try and speak more clearly. That, to me, is to go closer to British RP.
        What actually works is to put on a mock American accent.
        See also, ordering a Bud in Texas. You have to ask for a "bird" and then they understand ;)
- Re: (Score:3, Informative)
  
  by Bluesman ( 104513 ) writes:
  
  by making informed guesses based on context, which a computer program cannot do.
  The Perl interpreter can.
- Re: (Score:3, Interesting)
  
  by binarybum ( 468664 ) writes:
  
  wow, shame on the anonymous troll that posted this and the moderators that must have been teleported from the early 90s. The high-end transcription packages are truly incredible. Yes, you need to spend some time training them to your speech patterns and accent, and yes it makes a big difference if you use a quality microphone (not the one that's built into your laptop or iphone) at a fixed distance. With a decent setup transcription software can be really impressive at high speeds and with complicated vo
  - Re: (Score:3, Interesting)
    
    by micheas ( 231635 ) writes:
    
    I can see medical transcriptions being the best point of transcription software.
    The vocabulary is largely devoid of slang.
    You have long specialized lexicons that are similar to very few other words.
    The vocabulary is probably fairly small as most doctors have a fairly specialized practice, so internists don't deal with the same areas as podiatrists, reducing the words that are used.
    The repetition is probably fairly high, allowing for training to be more effective than speech on random topics.
    In conclusion, f
- Re: (Score:2)
  
  by SwedishPenguin ( 1035756 ) writes:
  
  Seriously, transcribe it manually... automatic speech recognition just doesn't work. And can never work, because much of the time the only reason humans can understand each other is by making informed guesses based on context, which a computer program cannot do.
  Anonymous coward, meet statistics...
- MOD PARENT DOWN. (Score:2)
  
  by Karganeth ( 1017580 ) writes:
  
  Seriously, transcribe it manually... automatic speech recognition just doesn't work.
  That view is only ever held by people who have not used speech to text software in a very long time. Speech recognition software today works INCREDIBLY well. I would say that it was over 95% accurate (and I have a very cheap microphone). Here's a video of it in action http://www.youtube.com/watch?v=bsohqUgjqK0&feature=related [youtube.com] It frustrates me that people think that speech recognition hasn't progressed since they last used it and therefore their opinion of it from when they used it years ago is still v
- - Re: (Score:3, Informative)
    
    by fuzzyfuzzyfungus ( 1223518 ) writes:
    
    Unless things have improved substantially since Dragon NaturallySpeaking 10, I'd be more inclined to describe the performance as "surprisingly adequate job of it, with training, and offers a vaguely cellphone-esque interface for choosing the correct word when it fucks up".
    
    It isn't comedically awful; and it likely beats typing with your stumps, or your eyelids, or whatever; but "pretty good" is being very generous.
    
    (Again, unless things have improved markedly since then) the software works best when u
    - Re: (Score:2)
      
      by cgenman ( 325138 ) writes:
      
      There is a free iPhone dragon client, which sends the audio back to their servers for processing. There isn't any training, but there probably wouldn't be training on a family member's old tapes either.
      It's possible that Dragon might work for their needs, or at least be much easier to get equally bad data back as other solutions. Try the iPhone client and see.
      - Re: (Score:2)
        
        by mwvdlee ( 775178 ) writes:
        
        I think a PC application is cheaper than an iPhone + subscription + app.
    - Re: (Score:2)
      
      by account_deleted ( 4530225 ) writes:
      
      Comment removed based on user account deletion
      - Re: (Score:2)
        
        by bami ( 1376931 ) writes:
        
        Dear aunt, let's set so double the killer delete select all.
        I'm afraid the voice recognition in 7 is still rubbish, you need to train it first with some stupid wizard, and then it only sorta works on commands.
        Transcribing is still horrible and I only get some reasonable results if I speak to it in a full-on British accent. I'm not even FROM the UK!
- - - Re: (Score:2)
      
      by kagaku ( 774787 ) writes:
      
      When you consider the quality of audio input it receives I think it does a fairly decent job.
- - Re: (Score:3, Interesting)
    
    by msclrhd ( 1211086 ) writes:
    
    Your post highlights a key difference between written and spoken words -- we tend to contract words ("have a" to "hav.uh") and will flow one word into another ("said John" the d at the end of said and the d in the dZ sound merge, so the d at the end of said is dropped -- "sE dZ0n").
    Some people drop certain letters at the beginning and end of words -- "'e said 'what 'ave you been doin' today?'". This also makes it more complicated to transcribe. Not to mention regional dialect variations and strong accents.
    T
Sphinx (Score:5, Informative)

by SaXisT4LiF ( 120908 ) writes: on Tuesday July 20, 2010 @06:54PM (#32971964)

Carnegie Mellon has an open source speech recognition project you might want to look into. Sphinx [cmu.edu]

Unfortunately... (Score:3, Interesting)

by dmneoblade ( 848781 ) writes: on Tuesday July 20, 2010 @06:55PM (#32971966)

I spent several months searching for something like this. Open-source voice recognition is in really infant stages, and there does not seem to be much interest in improving the few things we have.

- Re: (Score:2)
  
  by Bigjeff5 ( 1143585 ) writes:
  
  Open-source voice recognition is in really infant stages
  It's a very old infant, too. :/
  Your best bet for speech to text is to use Google's speech recognition services - they are impressively accurate (though still nowhere near perfect).
  If it's going to be cutting out a lot of time, it may be worth buying a commercial product.
- Re: (Score:2)
  
  by dargaud ( 518470 ) writes:
  
  I spent several months searching for something like this. Open-source voice recognition is in really infant stages, and there does not seem to be much interest in improving the few things we have.
  Funny, I had the exact same thought 25 years ago when playing with speech recognition software on the Apple ][... I don't know if things have improved much since, as I use 3 languages daily, there's just no software that can handle my accents and language changes.
I've wondered about this too (Score:2, Interesting)

by itamblyn ( 867415 ) writes:

It seems like there should be some way to "hack" the audio transcription that google offers through google voice or youtube. Unfortunately I haven't found a way to upload a file. With youtube, if you make a fake movie, it gives an error that it can't be transcribed. Getting google voice to work would require some sort of phone interface I suppose...
- Re:I've wondered about this too (Score:4, Informative)
  
  by Enuratique ( 993250 ) writes: on Tuesday July 20, 2010 @07:16PM (#32972214)
  
  Google relies on Twilio [twilio.com] for their audio transcription.
  
Best Idea (Score:3, Informative)

by Anonymous Coward writes: on Tuesday July 20, 2010 @06:55PM (#32971980)

just upload it to youtube, its genius google transcription technology will make everything sense out of it.

Having dealt with this... (Score:2)

by Skuld-Chan ( 302449 ) writes:

I'm interested in an open version of a transcription app (I run a lab with a lot of this software/equipment), but this is a very vertical market - up until recently there wasn't any standard interface for the foot pedal (newer ones are USB HID devices now).
I had to throw away a bunch of Sony serial devices because they only worked with one app that I can't make work on newer versions of Windows.
XTrans (Score:2, Interesting)

by ceraphis ( 1611217 ) writes:

Why don't you give XTrans a shot: XTrans [upenn.edu]
- Re: (Score:2)
  
  by __aasqbs9791 ( 1402899 ) writes:
  
  I'm interested in this for meetings, and after checking out the user manual for XTrans it sounded really interesting. But I can't run it on my system, and I can't find any useful help on their pages about this error on my Ubuntu system:
  /dev/dsp: Device or resource busy
  terminate called after throwing an instance of 'QWave2::AudioDeviceError'
  Aborted
  I'm not even sure where to look next, since a quick Google search didn't turn up a useful fix.
  - Re:XTrans (Score:4, Informative)
    
    by Kev Vance ( 833 ) writes: <kvance.kvance@com> on Tuesday July 20, 2010 @08:49PM (#32973038) Homepage
    
    Ubuntu uses PulseAudio on top of the ALSA audio subsystem, but that error message indicates XTrans is trying to use the OSS audio subsystem instead. To work around this, try using the Pulse OSS wrapper or temporarily disable Pulse. From the command line: "padsp xtrans" or "pasuspender xtrans".
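A minimal sketch of that workaround, assuming padsp and pasuspender are installed alongside PulseAudio. The helper names oss_wrapper and oss_command are made up for illustration and are not part of XTrans or PulseAudio. The functions only print the command line they would use, so nothing is launched until you run the result yourself.

```shell
#!/bin/sh
# Pick a wrapper for a program that insists on opening /dev/dsp.
# Prefers padsp (routes OSS calls through PulseAudio) and falls back to
# pasuspender (suspends PulseAudio while the program runs).
# oss_wrapper and oss_command are hypothetical helper names.
oss_wrapper() {
    if command -v padsp >/dev/null 2>&1; then
        echo "padsp"
    elif command -v pasuspender >/dev/null 2>&1; then
        echo "pasuspender"
    else
        echo ""
    fi
}

# Print the full command line for the given program; nothing is executed.
oss_command() {
    wrapper=$(oss_wrapper)
    if [ -n "$wrapper" ]; then
        echo "$wrapper $*"
    else
        echo "$*"
    fi
}
```

Running `oss_command xtrans` prints the line to try; if neither wrapper is found it prints the bare program name unchanged.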
    
No (Score:2)

by waldoj ( 8229 ) writes:

I've put a bunch of time into this for a project of my own. The short answer is, no, I have found no such program. I've experimented with a few older programs, but they're useless. Sorry.
USB foot control (Score:2, Informative)

by wguy00 ( 985922 ) writes:

Buy a USB foot control (check out infinity or fortherecord), and download the free player from fortherecord.com. You can stop, start, rewind and fast-forward without having to take your eyes off the screen or leave your word processing app.
- Re: (Score:2)
  
  by IANAAC ( 692242 ) writes:
  
  I can't tell from their web page, but does their software allow you to speed up/slow down the recording without any distortion of speech?
  That's one reason I like Express Scribe.
Open source no. (Score:2)

by jnnnnn ( 1079877 ) writes:
Here's a list [wikipedia.org]. In my experience, only Dragon is worth trying, with the following caveats:
- It helps to spend ten minutes training it for each voice
- It will still only get 99% accuracy
- You need a high quality (low noise) recording with a good microphone
On the plus side, correction is easy -- read the document, and select words that look wrong to hear what they sounded like.
Most of the other programs are aimed at very small vocabularies (i.e. 100 words) for accessibility applications (controlling a computer).
- Re: (Score:2)
  
  by markdavis ( 642305 ) writes:
  
  Dragon is not open source. It is not even multi-platform.
  - Re: (Score:2)
    
    by ducomputergeek ( 595742 ) writes:
    
    But the link does show open source solutions on the list. The OP is just stating that, in his experience, the only solution he has found that works is Dragon, and relating his experience with Dragon.
I looked, but still do it manually (Score:5, Informative)

by ciaran_o_riordan ( 662132 ) writes: on Tuesday July 20, 2010 @07:25PM (#32972316) Homepage

I've worked on loooads of transcripts. I did most of these:
* http://wiki.fsfe.org/Transcripts [fsfe.org]
The best technique I've found is to have mplayer play the audio at 60% normal speed and have a text editor (emacs is my preference) in another window, flick between them with alt-TAB and hit Space to start and pause mplayer.

- the command line (Score:5, Informative)
  
  by ciaran_o_riordan ( 662132 ) writes: on Tuesday July 20, 2010 @07:37PM (#32972446) Homepage
  
  To play an audio file at 60% normal speed:
  mplayer -af scaletempo=scale=0.6 the_file.ogg
  And then to check the transcript, change the 0.6 to 1.5 (or 2.0 for someone like Richard Stallman who speaks slowly and clearly).
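For anyone who switches between the two speeds a lot, here is a small sketch that wraps the command above. The function names (tempo_cmd, transcribe_cmd, review_cmd) are assumptions for illustration, and the 0.6 and 1.5 values are simply the ones suggested in this comment. The helpers print the command rather than running it.

```shell
#!/bin/sh
# Build (but do not run) the mplayer command for a given tempo scale.
# tempo_cmd is a hypothetical helper name, not an mplayer feature.
tempo_cmd() {
    scale=$1
    file=$2
    printf 'mplayer -af scaletempo=scale=%s %s\n' "$scale" "$file"
}

# 0.6 for typing along, 1.5 for checking the finished transcript.
transcribe_cmd() { tempo_cmd 0.6 "$1"; }
review_cmd()     { tempo_cmd 1.5 "$1"; }
```

`transcribe_cmd the_file.ogg` prints the slow-playback command and `review_cmd the_file.ogg` prints the fast one. File names containing spaces would need quoting before the printed line is pasted into a terminal.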
  
  - Re: (Score:2)
    
    by nbauman ( 624611 ) writes:
    
    Got any tricks to get it to back-pedal 3 seconds?
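One possible approach, not taken from the thread: mplayer has a slave mode that reads commands from a file or FIFO, and its seek command does relative jumps in seconds. A sketch under those assumptions follows; the FIFO path and the function names are made up, and the scaletempo setting is just the one from the parent comment.

```shell
#!/bin/sh
# Assumed FIFO location; override by exporting FIFO before sourcing this.
FIFO=${FIFO:-/tmp/mplayer-transcribe.fifo}

# Start mplayer in slave mode, listening for commands on the FIFO.
# Run this once in its own terminal.
start_player() {
    [ -p "$FIFO" ] || mkfifo "$FIFO"
    mplayer -slave -quiet -input file="$FIFO" -af scaletempo=scale=0.6 "$1"
}

# Slave-mode command text for a relative seek (type 0 = relative seconds).
# A negative value seeks backwards.
seek_cmd() {
    echo "seek $1 0"
}

# Send commands to the running player. These block until mplayer has
# the FIFO open, so start the player first.
back_pedal()   { seek_cmd "-${1:-3}" > "$FIFO"; }
toggle_pause() { echo "pause" > "$FIFO"; }
```

With the player running, `back_pedal` jumps back 3 seconds and `back_pedal 5` jumps back 5. Binding those two functions to desktop hotkeys would give a rough software foot pedal.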
- Re: (Score:3, Informative)
  
  by mutube ( 981006 ) writes:
  
  I'd agree. I did some part-time work transcribing audio a while back for extra pennies. One thing I would add is that instead of using Alt-Tab to switch applications and then hitting space to start/stop I found it was less frustrating to set up global keys for the purpose (I was using KDE at the time, I expect most desktops offer this).
  I assigned F12 to skip back 5 seconds and F9 to pause/restart. Using those (esp F12) it was relatively easy to keep up to speed with what was being said without switching awa
- Re: (Score:3, Informative)
  
  by nbauman ( 624611 ) writes:
  
  I've done loads of transcripts too.
  The best software I found was the Olympus DSS Player 2002, which came bundled with the expensive Olympus digital recorder (the cheap ones came with bare-bones software). It was like the old mechanical tape transcribing machines, except much better, with adjustable back pedal, 50% slow speed, 200% fast speed, fast forward, fast back, etc. The newest version is probably better.
  Problem was it was optimized for the Olympus proprietary *.DSS format, although you could use *.WAV wit
If you don't find anything... (Score:3, Interesting)

by afabbro ( 33948 ) writes: on Tuesday July 20, 2010 @07:31PM (#32972380) Homepage

...you could always use RentACoder (er, Vworker.com now) and hire someone for pennies to do it.

Transcription (Score:2)

by ddillman ( 267710 ) writes:

I wish you luck in your quest, as I'm also working on genealogy and would like to be able to do this as well. I'd be interested in hearing if you find something that works acceptably well for this purpose. In my experience (IBM ViaVoice from OS/2 v.3 days to Dragon NaturallySpeaking 10) the state of the art just isn't ready for general use. Even after training, I always got enough errors to discourage use. And I type relatively quickly, so it was just more effective for me to do it manually.
No. (Score:2)

by Alex Belits ( 437 ) * writes:

I would really love to be able to run the resulting MP3 or WAV files through some software and get a text file out. I know that software like this exists commercially.
No. Automated arbitrary speech recognition is an unsolved problem -- all voice recognition systems either require the speaker to make an effort to pronounce words clearly, or make so many mistakes that fixing them takes more effort than transcribing manually.
It would make more sense to write transcription-assistance software -- an equivalent to the tape player with a foot pedal commonly used for this purpose, except with the capability to play and repeat short sequences of words or phrases, adjust speed, etc.
Got kids? (Score:5, Insightful)

by Kral_Blbec ( 1201285 ) writes: on Tuesday July 20, 2010 @07:52PM (#32972554)

Pay them a buck per page and they learn some family history along the way. Problem solved.

- Re:Got kids? (Score:5, Interesting)
  
  by Luckyo ( 1726890 ) writes: on Tuesday July 20, 2010 @08:27PM (#32972860)
  
  This is one of the cases where the journey matters as much as, if not more than, the destination :)
  
- Re: (Score:3, Funny)
  
  by tehcyder ( 746570 ) writes:
  
  Pay them a buck per page and they learn some family history along the way. Problem solved.
  
  Mummy, why does aunt Bess call grandma a "syphilitic rum-and-cock-addled whore"?
  Daddy, why was great grandpa Ben "shot at dawn for cowardice in the face of the enemy"?
  Mummy and daddy, how come I was born only four months after you married?
Google Voice? (Score:2)

by Facegarden ( 967477 ) writes:

Google has been working on speech to text for years, and they've got Google Voice to where it transcribes your messages to text. Works great with the Android client, and they have a web page. But even with Google's experience and money, it's not very accurate. It might be better than most of what you'll find though, and it's free.
You could probably rig up Google Voice to where each thing you want to transcribe gets recorded as a "message" to you.
That said, here's a voicemail I got recently:
"Hey Jeff, Nate wha
High quality recordings now, transcription later (Score:2, Informative)

by itamblyn ( 867415 ) writes:

I think the most important thing to keep in mind for a project like this is that you should do everything you can to ensure a high quality recording. Don't worry about transcription at this point - just focus on getting content. When algorithms (and computers) have improved in 5-10 years time you can do the transcription. It might even be useful to record the sessions with a video camera. Maybe speech recognition tech of the future will use lipreading in addition to the approaches that are used now.
Easy answer (Score:2)

by JiffyPop ( 318506 ) writes:

Just make a call to your favorite terrorist-harboring nation, add in some carefully chosen phrases, and then do a FOIA request for them.
Foot Pedal and Express Scribe best option (Score:4, Informative)

by Adattisi ( 1860686 ) writes: on Tuesday July 20, 2010 @08:02PM (#32972634)

I've been a transcriptionist for over 5 years, and unless you want to have to retype most of it yourself anyway, don't offer pennies on a site like guru/vworker/elance. A decent transcriptionist is going to charge at least $45-50 per AUDIO hour (not hours it takes) if it's a good, clear recording & a single speaker. If there was a really great product out there, I'd be out of a job. If you want to do it on the cheap, get an inexpensive USB Infinity foot pedal (on ebay) as mentioned before & Express Scribe is a free download to playback & rewind the audio. Both are what I use. Good luck!

Won't Work (Score:2)

by EEPROMS ( 889169 ) writes:

Where I work we have tons of recordings from engineering committees, and we tried lots of free and commercial programs, but at the end of the day, due to the vagueness of the English language, the best solution was to "hire a human". So that's what we did: we found a few people in India who were happy to transcribe our recordings for a fraction of the cost of hiring someone to fix the stuff-ups from the speech-text software (also good speech-text software costs a fortune and takes ages to train especially wh
Your choices are basically humans or the Dragon (Score:2)

by mdecerbo ( 9857 ) writes:

Though there are interesting speech recognition products for other applications [bbn.com], for this task Dragon and IBM ViaVoice, both sold by ScanSoft, are pretty much the only software choices until someone qualified gets an NSF grant to beef up Sphinx.

I can second the recommendation of the LDC's XTrans [upenn.edu] if you're going to do this yourself.

If you want someone else to do it, there are a lot of podcasters who want transcripts, and a bunch of transcription services have sprung up to address the market. They've al
"Transcriber" is the tool you want (Score:2)

by harmonise ( 1484057 ) writes:

Transcriber [sourceforge.net] is the tool that you are looking for. It plays the file and you type and annotate. It's in the Ubuntu repositories so I assume it's in Debian's as well.
Google Voice... (Score:2)

by RobertM1968 ( 951074 ) writes:

I hear Google has a great tool for this that they use for Google Voice...
Or... transcribed...
I'm here googoo, hi a grape too fur this that day fuse far google boys...
Coding Horror article (Score:2, Informative)

by lulalala ( 1359891 ) writes:

Coding Horror recently posted an article about the current voice recognition technology.
http://www.codinghorror.com/blog/2010/06/whatever-happened-to-voice-recognition.html [codinghorror.com]
There is a poem which got transcribed, and the title became like this:
"a poem by Mike Bliss --> a poem by like myth"
The rest of the poem is equally funny. So basically you better transcribe it manually.
- Re: (Score:2)
  
  by blue trane ( 110704 ) writes:
  
  "How to wreck a nice beach" http://www.stopsmilingbooks.com/how-to-wreck-a-nice-beach.php [stopsmilingbooks.com]
Wrong question. (Score:2)

by BitZtream ( 692029 ) writes:

Your question was phrased wrong.
Just ask for what you mean: you want free software, not so much OSS. It's not like you're going to go editing and fixing bugs in the speech algorithm, so the openness here really is just a guise to get something for free.
You'll find plenty of no-cost ways to transcribe, but OSS options fall short.
Reality of it is, you'll save yourself a lot of effort if you just type it yourself. It'll be faster and far more accurate.
State of Speech Reco (Score:4, Informative)

by poor_boi ( 548340 ) writes: on Tuesday July 20, 2010 @09:25PM (#32973260)

It's been my job to work with speech recognition technology for the last 10 years. I've worked with speaker-independent grammar-based recognizers like Nuance Recognizer. I've worked with speaker-dependent training-based recognizers like Dragon NaturallySpeaking. I've used open source recognizers like Sphinx. I've even dabbled with writing my own basic recognition engine. I can tell you with confidence: with the current state of commercial/open-source technology, you will not be able to get satisfactory results transcribing two speakers in the same recording. Accurate machine transcription requires training and a single speaker. I have heard people claim that speech recognition is a dead technology because it has stopped improving at appreciable speeds. While improvements have slowed down drastically, I do not believe speech recognition is dead by any means. We've really been making the same steady progress since the inception of speech recognition -- but previously we were riding the wave of geometric (sometimes exponential) growth in CPU clock rate. Now that the free lunch is gone, recognition algorithms need to be parallelized to once again ride improvements in CPU design.

- Re: (Score:2)
  
  by TheTurtlesMoves ( 1442727 ) writes:
  
  How does Sphinx stack up to the rest?
Human transcription: Cheaper than you'd guess. (Score:2)

by spinkham ( 56603 ) writes:

The only good transcription software still runs on wetware.
Luckily, humans are cheap and easily available.
CastingWords is one of the cheapest ways to get humans to transcribe your content.
http://castingwords.com/ [castingwords.com]
If you'd like to save a few bucks by cutting out the middleman, see an even cheaper way here:
http://waxy.org/2008/09/audio_transcription_with_mechanical_turk/ [waxy.org]
craigslist (Score:2)

by cynyr ( 703126 ) writes:

Post an ad on craigslist saying that you are paying $20 per hour of recording to have it typed out. Pizza provided as well. BYOB. Bet some college kid takes you up on it.
Doing it yourself... (Score:3, Interesting)

by Cruciform ( 42896 ) writes: on Tuesday July 20, 2010 @10:34PM (#32973642) Homepage

When I did some medical transcription a couple of years ago it was up to me to do it myself, and I didn't find anything open source at the time.
So I loaded up Amarok, configured global hotkeys to pause and to jump forward and backward in the audio file in five-second steps, and then loaded up a word processor.
Sure, it's not automatic, but it helped me get the job done.
It took me 3 to 4 hours to transcribe each spoken hour of a group of strangers. When the subjects had familiar speech patterns, or it was a single individual, I found progress was much faster.
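As a rough planning aid, the 3-to-4-hours-per-spoken-hour figure above can be turned into a quick estimate. The function name is made up, and the default ratios are just this comment's numbers plugged into shell arithmetic; other transcribers and other recordings will give different ratios.

```shell
#!/bin/sh
# Estimate typing time (in minutes) for a recording of a given length.
# Usage: estimate_minutes AUDIO_MINUTES [LOW_RATIO] [HIGH_RATIO]
# The ratios default to 3 and 4, the range reported in the comment above.
# estimate_minutes is a hypothetical helper name.
estimate_minutes() {
    audio_minutes=$1
    low_ratio=${2:-3}
    high_ratio=${3:-4}
    echo "$((audio_minutes * low_ratio))-$((audio_minutes * high_ratio))"
}
```

For a one-hour interview, `estimate_minutes 60` prints the low and high estimates in minutes, separated by a hyphen.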

PRAAT (Score:2)

by jpkunst ( 612360 ) writes:

Transcribe manually, using a transcription program like PRAAT [praat.org].
Transana (Score:3, Interesting)

by paugq ( 443696 ) writes: <pgquiles@@@elpauer...org> on Wednesday July 21, 2010 @04:47AM (#32975046) Homepage

It's not what you are asking for, but it sure will help you: Transana [transana.org]

vi (Score:2, Funny)

by xmorg ( 718633 ) writes:

open up vi, press i, (or a), and press play on the audio device.
Type out whatever you hear.
Problem solved. :wq
Trick: re-speak it yourself (Score:2, Interesting)

by oergiR ( 992541 ) writes:

I'm doing my PhD on speech recognition. I think (and hope!) it's neither dead nor fully developed. Currently, changes of environment screw speech recognisers up. Different speakers, background noise... A trick that I heard has been used for subtitling television broadcasts is to have someone re-speak the words (which is not that hard). You could play the audio recordings on your headphones while repeating them into a microphone. If you're in a quiet room and the recogniser is trained on your voice, that ma
- Re: (Score:2)
  
  by ddillman ( 267710 ) writes:
  
  I'm sure the resulting high-quality audio signal will help Google Voice do an even better job than usual...
