YouTube Makes Captioning Available To All

timothy posted more than 4 years ago | from the nation-of-lip-readers-knit-their-eyebrows dept.

Youtube 102

adeelarshad82 writes "Google's YouTube announced that it has moved its automatic speech-recognition and closed-captioning technology out of beta and has now made it available to the YouTube community at large. Most, if not all, YouTube videos now include a 'CC' button that, if pressed, will automatically generate the closed-captioning technology. The technology processes the audio feed using the speech-recognition technology used in the core voice search feature that has also been built into the Android voice search feature, the GOOG-411 phone search, and other products."

102 comments

As long as they don't use GVoice Tech. (4, Insightful)

bennomatic (691188) | more than 4 years ago | (#31379076)

Or you'll end up with captions like this:

Hey glum, Jen tonight. It's apologize for it, interrupting our conversation in early as this afternoon, yes, so I wanted to returning your call and you know check in with you further. Alright, hope you, I hope you're doing well done. Sounded like you, works but alright. Well I'll call me later. I'll talk to you soon. Bye.

Re:As long as they don't use GVoice Tech. (1)

rmadhuram (525803) | more than 4 years ago | (#31379110)

Some of my funny ones here:

"Hey bottoms ASAP. But on the religious anyways, call me back and I'm at my just call me back. Thank you."

"Hey what's going on man, this is in 2 mother and anything any cool commands. We have some. Please let me stop you have a cable is nothing important. Bye. "

"Hey Todd, on a bit. The Negro, then I put in an active on plan and payPal okay called up and have a couple (630) 440-6809. Okay bye. "

Re:As long as they don't use GVoice Tech. (0)

Anonymous Coward | more than 4 years ago | (#31379130)

I hope your friend isn't charged for incoming calls/texts or auto-rickrolls: (630) 206-1300

Re:As long as they don't use GVoice Tech. (0)

Anonymous Coward | more than 4 years ago | (#31379838)

Charged for incoming calls/texts? That's the weirdest thing I have ever heard.

Re:As long as they don't use GVoice Tech. (0)

Anonymous Coward | more than 4 years ago | (#31379858)

You'd be surprised in the land of the free...

Re:As long as they don't use GVoice Tech. (3, Funny)

Mr2001 (90979) | more than 4 years ago | (#31379158)

My funniest one:

"Hello voice subscriber what. Hey if you few questions for you. They can feel me 6 like a year like 2 years ago to like forever. Go you came over and I was locked out of the password didnt know the password so much and we wanted. Anybody passed it. I don't know how you guys have a good i just took it out for the first time in years and it says your class is expired. I must be changed and I go to that the windows X P professional you went and dollar dishing whatever it is really old addition, windows 85,001 yet and it's give me a change. Faster screen and says, administrative, which is still around. Funny has got hold us for new password. I confirm you got through. I've any idea what the password again, 30, or if you're more than the who knows no idea what it would've been so if you tell me but sister for you know the next week, otherwise, I was gonna go out to confirm for some a long time, so if you should come pick the and a case."

Re:As long as they don't use GVoice Tech. (1)

aCC (10513) | more than 4 years ago | (#31379612)

How about you leave a voice message just reading that text? What's the result? Maybe it's some kind of "encryption" like ROT-13 but for voice messages. ;-)

Re:As long as they don't use GVoice Tech. (3, Funny)

jargon82 (996613) | more than 4 years ago | (#31379888)

I'll never forget the time I was playing with Dragon (the speech recognition software), and it seemed to pick up an obsession with the word "orange"... Mall was orange. Bus was orange. Elephant was Eggplant, but that's a pointless tale for another time...
Meanwhile, speech recognition still fails, and Google Voice is just the world's best demonstration of why :)

Re:As long as they don't use GVoice Tech. (2, Funny)

John Hasler (414242) | more than 4 years ago | (#31380258)

I know people for whom the examples in this thread would be accurate transcriptions...

Re:As long as they don't use GVoice Tech. (2, Informative)

The MAZZTer (911996) | more than 4 years ago | (#31379114)

Phone audio quality is generally much poorer than online videos, in my experience.

Re:As long as they don't use GVoice Tech. (2, Insightful)

bennomatic (691188) | more than 4 years ago | (#31379142)

True. I think Google should put an app on their Android phones that recognizes when someone is connected to a GVoice voicemail box and does the recording and processing locally. I figure that'd make for a much more accurate transcriber.

Re:As long as they don't use GVoice Tech. (1)

Idiomatick (976696) | more than 4 years ago | (#31379616)

Pretty brilliant idea. It'd be a bit annoying to implement and could only work for Android-to-Android calls on phones with data plans, but it seems feasible. I wonder how much of an improvement that would yield.

Re:As long as they don't use GVoice Tech. (1)

SpinyNorman (33776) | more than 4 years ago | (#31380854)

I doubt it'd make any difference.

Speech recognition technology is really still in its infancy... It's possible to get good results, but only under the most controlled of circumstances: high-quality microphone, no background noise, clear diction, recognition engine trained for the speaker, etc. Even then it may depend on what you're actually saying, since in the case of any ambiguity a smart recognition engine will fall back to grammatical analysis, word frequency counts, etc. to try to guess right.
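A toy illustration of that "fall back to word frequency counts" step, assuming the acoustic model has already narrowed things down to a couple of similar-sounding candidates. All scores and bigram probabilities below are made up; real engines use far larger language models, but the idea of combining "how well it sounds like X" with "how likely X is after the previous word" is the same.

```python
import math

# Hypothetical acoustic scores: "road" and "role" sound alike,
# so the audio alone barely separates them.
acoustic = {"road": 0.48, "role": 0.52}

# Hypothetical bigram probabilities P(word | previous word), standing in
# for the word frequency counts a recognizer might fall back on.
bigram = {
    ("the", "road"): 0.004,
    ("the", "role"): 0.002,
    ("leading", "road"): 0.00001,
    ("leading", "role"): 0.003,
}

def pick_word(previous: str, candidates: dict[str, float]) -> str:
    """Return the candidate maximizing log P(audio | word) + log P(word | previous)."""
    def score(word: str) -> float:
        lm = bigram.get((previous, word), 1e-9)  # tiny floor for unseen word pairs
        return math.log(candidates[word]) + math.log(lm)
    return max(candidates, key=score)

print(pick_word("the", acoustic))      # road
print(pick_word("leading", acoustic))  # role
```

Same audio, different answer, purely because of the preceding word. That is also why this kind of guessing breaks down on rambling speech where the context itself is garbled.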

The real problem is that speech recognition requires artificial intelligence to do right and we don't have it. Often we understand speech that word for word is basically unintelligible, but we automatically apply context and intelligence to figure out what the speaker was trying to enunciate. Without full AI, a computer can't do that - it's much more limited.

When the foreigner serving you at McDonald's mumbles something after taking your order, you only need to understand a single word, or maybe not even that, to realize they are asking "to stay or take out?", but a computer today would need the person to have spoken clearly enough to make out the words. Ditto for a stream-of-consciousness rambling GMail voice message with highway noise in the background. YOU may be able to FIGURE OUT what is being said, but that's not the same as the words actually being intelligible, which is what a computer without AI would need in order to transcribe it accurately.

Having a clear speech signal in the first place, or having a broadband vs telephone limited one, isn't going to make much difference, except under otherwise very controlled conditions.

Re:As long as they don't use GVoice Tech. (0)

Anonymous Coward | more than 4 years ago | (#31381856)

Look at Google Translate. It's already impressively aware of context: if you translate "I'm going to watch that awful eighties comedy 'Airplane' tonight" into Norwegian, "Airplane" isn't translated literally ("fly") but correctly, into what that movie was called in Norway ("Hjelp, vi flyr"). No one in their right mind would manually enter and try to keep updated lists of movie titles, so this is something the translator has learned on its own. If that's AI, Google is damn good at it.

Sometimes, this causes it to make impressive errors (translating "Bodø Lufthavn" to "Santiago airport" for instance), but it only gets better and better.

Re:As long as they don't use GVoice Tech. (1)

SpinyNorman (33776) | more than 4 years ago | (#31382056)

For the last couple of years, Google Translate has been based on what is essentially database lookup rather than the traditional grammatical/semantic analysis used by other translators such as Babel Fish. When they made the switch, the quality noticeably improved.

Basically, they've got a huge database of snippets of language and their corresponding translations in different languages, originally built from hand-translated sources such as publicly available United Nations documents. When they translate a document for you using this approach, they're basically just looking for the longest snippet of the source document for which they've got an existing translation, then moving on to the next, and so on. Obviously there's quite a bit more to it than that, but this gives the gist of it. What's great about this approach is that you get chunks of hand-translated text, and it can handle idiom.
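The "longest snippet first" gist can be sketched as a greedy longest-match over a phrase table. This is only an illustration of the description above, not how Google Translate actually works internally: a real statistical system scores many competing segmentations and reorders words, and the tiny English-to-Norwegian table here is invented.

```python
# Hypothetical phrase table: source snippets -> stored translations.
# A real system learns millions of these from parallel text.
PHRASES: dict[tuple[str, ...], list[str]] = {
    ("good", "morning"): ["god", "morgen"],
    ("good",): ["god"],
    ("morning",): ["morgen"],
    ("thank", "you", "very", "much"): ["tusen", "takk"],
    ("thank", "you"): ["takk"],
}
MAX_LEN = max(len(key) for key in PHRASES)

def translate(words: list[str]) -> list[str]:
    """Greedily cover the input with the longest snippet that has a stored translation."""
    out: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest possible snippet first, then shorter ones.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASES:
                out.extend(PHRASES[chunk])
                i += n
                break
        else:
            out.append(words[i])  # no entry at all: pass the word through untranslated
            i += 1
    return out

print(translate("thank you very much".split()))  # ['tusen', 'takk']
```

Because the whole four-word idiom is matched as one unit, it comes out as the idiomatic "tusen takk" rather than a word-by-word rendering, which is the advantage described above.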

I just tried the example you gave, and it left the movie title "Airplane" as-is in the Norwegian translation. Maybe you tried it a while back using the old version?

Re:As long as they don't use GVoice Tech. (1)

Toonol (1057698) | more than 4 years ago | (#31382548)

That's kind of like the pkzip/unzip algorithm (or other compression algorithms): bundling up the longest/most repeated character streams first... except that the matching is done against an external lookup table, swapping languages when decoding.

Ok, it's not MUCH like that, but it's enough to give me a few ideas. Speech recognition, data compression, and AI have a lot in common; they're all bottlenecked at the same point.

I wonder whether, in theory, an effective (lossy) text compression scheme could be created that strips out individual language syntax. In effect, it would store 'ideas' (tokens representing the largest linguistic streams representable as a single unit), and a specific language could be selected at decompression time to render into.

Re:As long as they don't use GVoice Tech. (1)

crossmr (957846) | more than 4 years ago | (#31381030)

It doesn't matter. I just checked out a couple of high-quality videos with a normal person speaking English without background noise, and it was a jumbled mess of garbage. Another fine Google production.

Re:As long as they don't use GVoice Tech. (2, Interesting)

uncqual (836337) | more than 4 years ago | (#31379264)

My most interesting one:

Hey Hello hello, hi bye hello hello. Bye bye hey hello, test, Hello bye hello. Bye hi hello. Bye, hello hey hey hello hello hello. Bye bye hello. Call hey bye hello hello hello hello hello, hey bye bye bye hello. Bye hello. Bye hello hello. Bye. Hello S hello. Bye bye. Hello. Hello. Yeah, hello. Bye hello hello hello hello, hey, hey, yeah.

Some of the words hello and bye were dark, the rest were mostly light gray.

What, one may wonder, was the actual message? Well, it appeared to be someone trying to fax something - although, the tones didn't sound quite like FAX negotiation tones, but surely no one would be mis-dialing a modem number in this day and age.

I was intrigued by the limited vocabulary it produced here. Almost as if the most common words are these greeting words (hello, hey, hi) and sign off words (bye) and these words are so preferred that line noise ends up just being these top few words.
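One plausible explanation, sketched under assumed numbers: a statistical recognizer typically picks the word w that maximizes P(w) × P(audio | w). When the "audio" is line noise, it matches every word about equally badly, so the likelihood term is flat and the prior P(w) alone decides, and in voicemail the most frequent words are exactly the greetings and sign-offs. Whether Google Voice behaves exactly like this is an assumption; the word list and probabilities below are invented.

```python
# Hypothetical unigram prior: how often each word appears in voicemail transcripts.
PRIOR = {"hello": 0.05, "bye": 0.04, "hey": 0.03, "invoice": 0.0005, "fax": 0.0002}

def decode(acoustic_likelihood: dict[str, float]) -> str:
    """Pick the word with the highest unnormalized posterior: prior * likelihood."""
    return max(PRIOR, key=lambda w: PRIOR[w] * acoustic_likelihood.get(w, 0.0))

# Line noise: the audio fits every word equally (and equally badly).
noise = {w: 1e-6 for w in PRIOR}
print(decode(noise))         # hello  (the prior alone decides)

# Clear speech: a sharply peaked likelihood can overrule the prior.
clear_speech = {"hello": 1e-6, "bye": 1e-6, "hey": 1e-6, "invoice": 0.9, "fax": 1e-6}
print(decode(clear_speech))  # invoice
```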

Re:As long as they don't use GVoice Tech. (1)

zill (1690130) | more than 4 years ago | (#31379396)

I have severe developmental speech disorder, you insensitive clod!

I'm never inviting you to my parties again.

Re:As long as they don't use GVoice Tech. (1, Interesting)

Anonymous Coward | more than 4 years ago | (#31379704)

Isn’t that essentially what modem negotiation actually is? The two modems talking to each other, saying “hello” at length?

My goodness. It’s alive, and it can understand V.34...

Re:As long as they don't use GVoice Tech. (1)

dominious (1077089) | more than 4 years ago | (#31380054)

It would really be funny if the developers planted a message when listening to standard fax negotiation tones:

Hey, how are you?
Not much going on. This new "Exchange Server" is such an asshole I wish he dies!
Yeah I know what you're sayin.. I think they're gonna throw me away soon:(
Oh well...here's the fax anyway. Hope to hear from you soon..Bye!
bip-bip bip bip bip bip-bip....
bib bip bip-bip...

Re:As long as they don't use GVoice Tech. (1)

Joe Tie. (567096) | more than 4 years ago | (#31379328)

I wonder what accounts for the difference. I'd say in general most people who call me come out 99% perfect on the transcripts. Except one friend, with a Texan accent, who usually is closer to 50% accurate.

Re:As long as they don't use GVoice Tech. (1)

John Hasler (414242) | more than 4 years ago | (#31380448)

> I wonder what accounts for the difference.

Some people sound that way on my answering machine (and others come across that way in person).

Re:As long as they don't use GVoice Tech. (1)

bertoelcon (1557907) | more than 4 years ago | (#31381098)

Except one friend, with a Texan accent, who usually is closer to 50% accurate.

Of course if you live in Texas and get called by mostly people with Texan accents you get 50% accuracy.

Re:As long as they don't use GVoice Tech. (1)

assassinator42 (844848) | more than 4 years ago | (#31381494)

I wouldn't say the transcripts have been 99% accurate word for word for me, but I can almost always get the meaning. The one exception being a friend with a speech impediment.
The YouTube transcripts are pretty much useless from what I can tell.

The once and future Deaf accessible internet. (2, Informative)

flerchin (179012) | more than 4 years ago | (#31379078)

Huzzah! Now if we can just get subtitling/captioning on Netflix streams, the net will be accessible to the Deaf again.

Re:The once and future Deaf accessible internet. (3, Insightful)

aussie_a (778472) | more than 4 years ago | (#31379254)

I almost never turn on my speakers and yet I find the internet quite accessible.

I'm not saying this isn't a great development. But to try to portray the internet as inaccessible to the deaf before now is ridiculous.

Re:The once and future Deaf accessible internet. (1)

WeatherGod (1726770) | more than 4 years ago | (#31381892)

Actually, the internet of old used to be extremely accessible to deaf and hard-of-hearing people. However, the advent of YouTube, podcasts and other multimedia services has made an exciting new part of the internet inaccessible to them. This technology, if it works, will help bring the internet back to deaf and hard-of-hearing people.

Netflix needs to get away from SilverDimGlow (1)

mrflash818 (226638) | more than 4 years ago | (#31380972)

This is why Google rocks ...and M$'s tarnished SilverDimGlow does not.

Strongly wish Netflix would realign themselves to use a YouTube-like setup instead, but I strongly suspect M$ either threw them 'an offer they could not refuse', or this will become yet another mutual lock-in, like Intel_M$.

(Really irritated that I cannot, yet, watch Netflix from my Debian machines.)

Not only that (1, Interesting)

Anonymous Coward | more than 4 years ago | (#31379084)

They also changed the way videos are sent to the browser, many flash video players are failing because of that.

Re:Not only that (0)

Anonymous Coward | more than 4 years ago | (#31379686)

It's worth mentioning that captions don't work with HTML5, so Flash is still the only solution for video if you want that feature.

Again, it looks very much like Flash is moving ahead all the time, while the alternatives struggle and fail to get off the ground.

Re:Not only that (1)

icebraining (1313345) | more than 4 years ago | (#31380152)

Wrong [opera.com].

Re:Not only that (0)

Anonymous Coward | more than 4 years ago | (#31380558)

Sorry... I don't want to make you look silly, but please have a look at...

http://captionaction2.blogspot.com/2009/07/html-5-has-no-captioning-provisions.html [blogspot.com]

http://billcreswell.wordpress.com/2010/01/24/html5-youtube-and-why-the-emperor-has-no-captions/ [wordpress.com]

My apologies again!

Re:Not only that (1)

icebraining (1313345) | more than 4 years ago | (#31382658)

From your second link:

There are several examples of how captions *can* be implemented with javascript, but not standard format. http://blog.gingertech.net/2008/12/12/attaching-subtitles-to-html5-video/, and my favorite implementation so far (Firefox 3.1+/ogg) - http://www.mozbox.org/pub/srt/index2.xhtml [mozbox.org]

So they *can* be implemented using JavaScript; you don't need any kind of plugin for that. And if you had read my link, you would have seen an example of exactly that [opera.com], which even offers a selection of languages that can be switched in real time.
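The JavaScript approaches linked above boil down to two steps: parse a subtitle file (SRT, in the mozbox example) and, as the video plays, show whichever cue covers the current playback time. In a browser that second step would run in a handler for the video element's `timeupdate` event. Here is the core logic sketched in Python, assuming well-formed SRT input; the function names are hypothetical and this is not the code from either linked page.

```python
import re
from dataclasses import dataclass

@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str

# One SRT timestamp, e.g. 00:01:02,500 -> hours, minutes, seconds, milliseconds
TIME = r"(\d+):(\d\d):(\d\d),(\d{3})"

def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(srt: str) -> list[Cue]:
    """Parse SRT blocks: an index line, a 'start --> end' line, then caption text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = re.match(TIME + r"\s*-->\s*" + TIME, lines[1])
        if match is None:
            continue  # skip malformed blocks rather than failing
        g = match.groups()
        cues.append(Cue(to_seconds(*g[:4]), to_seconds(*g[4:]), "\n".join(lines[2:])))
    return cues

def active_cue(cues: list[Cue], t: float) -> str:
    """Caption to show at playback time t; empty string between cues."""
    return next((c.text for c in cues if c.start <= t < c.end), "")

sample = """1
00:00:01,000 --> 00:00:03,500
Hello and welcome.

2
00:00:04,000 --> 00:00:06,000
Today: differential gears."""

cues = parse_srt(sample)
print(active_cue(cues, 2.0))  # Hello and welcome.
print(active_cue(cues, 3.7))  # prints an empty line (gap between cues)
```

Switching languages in real time, as in the Opera demo, would just mean swapping in a different parsed cue list.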

Automatically generate the technology? (5, Funny)

Mr2001 (90979) | more than 4 years ago | (#31379118)

Talk about advanced! Back in my day, we had to pay engineers to generate technology for us!

Re:Automatically generate the technology? (2, Funny)

nebaz (453974) | more than 4 years ago | (#31379162)

Feeling feeling = Feeling.getFeeling(Feeling.LAUGHTER);
feeling.express();

Re:Automatically generate the technology? (1)

AndrewBC (1675992) | more than 4 years ago | (#31379316)

Sounds like you're suffering from stuttering semantics -- Either that or you're an egregiously emotional eccentric.

Re:Automatically generate the technology? (0)

Anonymous Coward | more than 4 years ago | (#31379554)

He suffers from OOP, you insensitive clod!

Re:Automatically generate the technology? (1)

Thantik (1207112) | more than 4 years ago | (#31382112)

And here I thought sprinkling 'self.' throughout my Python classes made me egotistical...

Re:Automatically generate the technology? (1)

MichaelSmith (789609) | more than 4 years ago | (#31379182)

I can sell you a UML modeller which will do that. Just $100k per license. Believe me, it's cheap at the price. Let me demonstrate how you refactor the code. Just drag this little icon from here to here and the other little icons reorganise themselves around it. Buy this and you will never have to hire an engineer again!

Re:Automatically generate the technology? (0)

Anonymous Coward | more than 4 years ago | (#31379368)

Rofl

Technology, technology, baked beans and technology (0)

Anonymous Coward | more than 4 years ago | (#31379120)

Have they got anything without technology in it?

Notable, but still very much experimental (3, Informative)

Coopjust (872796) | more than 4 years ago | (#31379124)

The results are still very funny, especially for non-English speakers.

However, it's a technology that is still relatively young. One hopes that applying it to Youtube will help Google improve the accuracy.

Still, except for videos of a native English speaker talking with absolutely no background noise, it's nothing more than a novelty at this point. Trying this on several videos yielded not only hilarious results but also delays of several seconds in some cases.

Re:Notable, but still very much experimental (4, Interesting)

Idiomatick (976696) | more than 4 years ago | (#31379634)

"One hopes that applying it to Youtube will help Google improve the accuracy."

This. If they allow corrections, it could be an incredibly huge resource of data for Google. They'd end up with people spending millions of man-hours teaching Google how to do voice recognition. And highly accurate voice recognition would be a boon for society in general.

Re:Notable, but still very much experimental (3, Insightful)

crossmr (957846) | more than 4 years ago | (#31381044)

And then some company will come along and sue them for being anti-competitive, because they have access to all this great data to make fantastic products other companies can't make.

Re:Notable, but still very much experimental (1)

Djupblue (780563) | more than 4 years ago | (#31382174)

Poor, poor Microsoft crying and complaining when they get punished for breaking the law.

Re:Notable, but still very much experimental (1)

Coopjust (872796) | more than 4 years ago | (#31381066)

And that's the interesting part. Some people provide their own captions, and that's effectively training data for the voice recognition algorithm.
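If user-provided captions are treated as the reference, the usual way to measure how far off the automatic captions are is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. A minimal sketch, assuming a non-empty reference; the sample sentences are invented, echoing the "road"/"role" mix-up mentioned elsewhere in this thread.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[len(hyp)] / len(ref)

human = "the car turns a corner in the road"
auto = "the car turns a corner in the role"
print(word_error_rate(human, auto))  # 0.125
```

Tracking a number like this across many user-corrected videos is one simple way to tell whether the recognizer is actually improving.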

Re:Notable, but still very much experimental (0)

Anonymous Coward | more than 4 years ago | (#31379988)

By non-English, you mean English?

http://www.youtube.com/watch?v=EzV3wIrFa3U [youtube.com]

Turning it on for Rocketboom videos, it messes up (hard). However, people speaking slack (USA) English render correctly.

Re:Notable, but still very much experimental (1)

oztiks (921504) | more than 4 years ago | (#31380492)

I'm the first to agree, but then I saw Microsoft's attempt at voice recognition and it's just as poor.

There needs to be significant improvement across the board before this stuff works properly; sadly, I think it's still got a long way to go.

Accents play a big part, as does the rate at which people speak and run words together. You can tell YouTube's voice recognition is good, but it doesn't keep up in those areas at all.

Interactive Transcripts vs. Captions (2, Insightful)

syke1911 (1760892) | more than 4 years ago | (#31379150)

I'm trying to understand the difference between an interactive transcript, as seen at protranscript.com, and a caption. Why did Google go the embedded captioning route? Isn't the goal to create searchable content? If so, captions don't seem to be the solution.

Re:Interactive Transcripts vs. Captions (0)

Anonymous Coward | more than 4 years ago | (#31379206)

I'm trying to understand the difference between an interactive transcript, as seen at protranscript.com, and a caption. Why did Google go the embedded captioning route? Isn't the goal to create searchable content? If so, captions don't seem to be the solution.

I'm not sure myself.. my organization spent a significant amount of money converting our captioned videos to interactive transcripts as management believed they were the 'best new thing'. Then again, Google tends to know what they are doing.

Re: Interactive Transcripts vs. Captions (1)

Alwin Henseler (640539) | more than 4 years ago | (#31379338)

I can imagine Google would cache intermediate results, possibly improve those results from time to time, and create a good coupling to its own search engine. Other search engines might have to 'distill' searchable text from the video (=difficult?), so that Google can search YouTube video content better than other search engines? Just a guess, FWIW.

Re:Interactive Transcripts vs. Captions (1)

phantomfive (622387) | more than 4 years ago | (#31379764)

Google has no problem searching it, they have the data. The problem will be for other bots searching youtube, and I can imagine reasons why Google would not want to make it easy for others to search their site.

CC this... (5, Funny)

flogger (524072) | more than 4 years ago | (#31379156)

I looked, but I can't find Google's CC button for this video: http://www.youtube.com/watch?v=ZA1NoOOoaNw [youtube.com]

Re:CC this... (1)

R3coiler (1740032) | more than 4 years ago | (#31379302)

Re:CC this... (1)

mdwh2 (535323) | more than 4 years ago | (#31380362)

I was disappointed to see they don't have it for this: http://www.youtube.com/watch?v=t6FUR_nhGX8 [youtube.com]

(Seriously though - after searching through many videos, I've yet to find a single one that does have the option, other than one that someone posted above. "Most, if not all"? "All" is clearly not true, and it's hard to see justification for the "most", unless I'm being very unlucky in my search...)

Re:CC this... (0)

Anonymous Coward | more than 4 years ago | (#31379596)

Oh! My! Non-existent-deity-of-choice!

Search? (2, Insightful)

Spy Hunter (317220) | more than 4 years ago | (#31379160)

I haven't seen any mention of search, which seems odd. Google is adding captions to every YouTube video, and nobody is interested in whether you'll be able to search the captions or not? Seems to me like it could be quite useful to search the captions of every video on YouTube.

Re:Search? (1)

lobsterturd (620980) | more than 4 years ago | (#31379176)

YouTube captions have been searchable since shortly after they were introduced.

Re:Search? (3, Informative)

Spy Hunter (317220) | more than 4 years ago | (#31379200)

Indeed; here's an example search showing caption results [youtube.com]. I'm just surprised that, of the several articles "covering" this story that I've seen, none have mentioned (even in passing) the applicability of universal captioning to search.
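How YouTube actually indexes captions isn't described anywhere in this story, so as an assumption, here is a toy inverted index showing why universal captioning matters for search: every spoken word becomes findable, and each hit carries a timestamp you could jump to. The video IDs and captions are made up.

```python
from collections import defaultdict

# Hypothetical caption data: video id -> list of (start time in seconds, caption text).
CAPTIONS = {
    "video_a": [(0.0, "how differential gears work"),
                (12.5, "the wheels turn at different speeds")],
    "video_b": [(3.0, "a board game review"),
                (40.0, "the gears on this box are plastic")],
}

def build_index(captions: dict[str, list[tuple[float, str]]]) -> dict[str, list[tuple[str, float]]]:
    """Inverted index: lowercase word -> list of (video id, cue start time)."""
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for video, cues in captions.items():
        for start, text in cues:
            for word in set(text.lower().split()):
                index[word].append((video, start))
    return index

def search(index: dict[str, list[tuple[str, float]]], query: str) -> list[tuple[str, float]]:
    """Cues whose caption contains every query word, sorted by video and time."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

index = build_index(CAPTIONS)
print(search(index, "gears"))  # [('video_a', 0.0), ('video_b', 40.0)]
```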

All yore soup tittles Arnie belong two arse. (1)

idji (984038) | more than 4 years ago | (#31379172)

Just imagine when they hook this up to Google translation and text2speech. You can choose your language for youtube audio.

Wish commercial TV stations would use this tech! (2, Interesting)

Alwin Henseler (640539) | more than 4 years ago | (#31379284)

Wish this technology would be used by TV stations to provide 'sort of' subtitling for programs that don't have any. This could be helpful for deaf/hearing impaired viewers.

Where I live (the Netherlands), there are a few public TV channels. Most programs on them are subtitled using a dedicated teletext page (888). For the bulk of commercial channels, there are also subtitles for things like prime-time movies and specific (popular) TV shows. But a lot of programming is not subtitled, like average daytime shows, late-night documentaries, commercials, etc. This is due to manpower/cost issues: you have a limited audience, a limited percentage of viewers who are deaf/hearing impaired, and (proper) subtitling needs humans. Read: money, eating into commercial TV stations' bottom line. It's entirely up to these stations to decide what to subtitle and what not to.

This technology (combined with automated translation) would be a nice complement for those programmes where human-provided subtitling is deemed too expensive. Automated translation is still bad at times, but for deaf/hearing-impaired people, subtitles with a bad translation can still be better than no subtitles at all. An automated system shouldn't be very expensive when applied to mass media like national TV, and would be easy to provide for all programmes. And perhaps speech recognition and automated translation will improve over time, to the point where humans are no longer needed to get good results.

Re:Wish commercial TV stations would use this tech (1)

crossmr (957846) | more than 4 years ago | (#31381080)

Proper subtitling needs humans, but come on, be honest. How much manpower does it actually take to subtitle something?
If it's in your native language, it's a matter of timing, little else. If you're paying someone to be on the clock depending on the length of the program it might take anywhere from 30 minutes to a day for a long program. How much is a day's wages, even for the lowest-budget infomercial?

If you're translating, you're probably not translating something new, and that means there are likely already native subs for it. So it's simply a matter of translation, not timing. I've seen subbers here in Korea fan-sub a 30-minute sitcom fresh off the air in just a few hours, without even having public access to English subtitles first.

Subtitling live TV (1)

tepples (727027) | more than 4 years ago | (#31381808)

If you're paying someone to be on the clock depending on the length of the program it might take anywhere from 30 minutes to a day for a long program.

If the captioning takes longer than the program, you have to do it in advance. This rules out captioning news, sports, entertainment awards, and other live programs.

Re:Subtitling live TV (1)

crossmr (957846) | more than 4 years ago | (#31384848)

Not really. Most live things are actually shown on a tape delay, and CC already exists for the news. But live programming is usually less concerned with exact timing, and it's often a constant stream of words, like with the news. I'm talking more about subbing a 2-hour movie and spending time making sure the captions line up perfectly with the dialog. That can be a tedious process. With a live program, you just need someone who can type fast and accurately, with a slight tape delay to check for any crazy mistakes.

"Technology" (0)

Anonymous Coward | more than 4 years ago | (#31379306)

And the overused buzzword of the day is ...

Go to youtube RIGHT NOW for some laughs...for now (1)

mykos (1627575) | more than 4 years ago | (#31379308)

I'm sure they will improve it dramatically in the coming months and years, but I have not laughed so hard in a while at some of the stuff it comes up with. It's as funny as using a translator to translate a word into Korean and back again.

Re:Go to youtube RIGHT NOW for some laughs...for n (0)

Anonymous Coward | more than 4 years ago | (#31379334)

I agree.

YouTube's CC needs more work. Its transcription of spoken English into English closed captions is far off.

http://www.youtube.com/watch?v=VROZ2bbiQLc

The narrator of this video makes no mention of "senate Chris".

Too funny. The translation is totally off.

Re:Go to youtube RIGHT NOW for some laughs...for n (1)

Vintermann (400722) | more than 4 years ago | (#31379626)

Take a look at this board game review:

http://www.youtube.com/watch?v=Uv6pIFgfa0U [youtube.com]

His name (Tom Vasel) appears to be consistently translated "oh come on now". What, don't they believe that's his name? He comes with surprising revelations such as "I'll be your next president" and wonderful nonsense like "but it is a ten-year period deduction gay".

Dear Aunt, (0)

Anonymous Coward | more than 4 years ago | (#31379366)

let's set so double the killer delete select all.

About as good as I expected (1)

Clovert Agent (87154) | more than 4 years ago | (#31379448)

Which is to say, pretty darned feeble. Clever work, but basically rubbish when compared to user expectation.

One of my favourite videos is this one (http://www.youtube.com/watch?v=yYAw79386WI [youtube.com]), dating from the '30s, about how differential gears work. The voice-over is that beautifully clear, precise American newsreader accent of the period, and there isn't any background music to confuse things. If anything should be a perfect candidate for a computer to analyse, it's this.

But the captions are worse than I'd expect from off-the-shelf software like Dragon Dictate, which isn't particularly special itself. A perfectly enunciated "road" with a very clear final D is misheard as "role", for example. There are mistakes in nearly every line, and while sometimes they're obvious, sometimes they're just bizarre.

I'm tempted to say "nice try, good work for a first shot, and hey, it's a beta so it'll get better." But I've been exposed to dictation software for over a decade, and it just hasn't gotten much better, really. So I don't think it will, and I don't think most people will get much use out of it, apart from the odd giggle at the YouTube equivalent of "Dear aunt, let's set so double the killer delete select all..."

What I would be interested in hearing is whether this, flawed as it is, is usable enough for a deaf person. In context, you'd probably figure out that "role"="road", but would you guess that "outmoded"="are mounted"? Maybe, maybe not: watch the video on mute with the captions on, and it's kinda tricky, but you can get the gist of it. But then I'm reminded that this is the best-case video I could find, and most will probably be worse. It'll be interesting to see what the feedback is from deaf people, whether it really makes a difference, and whether the context makes up for the poor quality. I'd like to hope it might do just that.

Re:About as good as I expected (1)

gr8dude (832945) | more than 4 years ago | (#31379742)

I think the solution is to let people submit corrections for the automatically generated subtitles.

This way we'd get a starting point, and the problem becomes simpler.

I am now trying to write the subtitles for one of my lectures, and I find it very, very tiring and difficult. The greatest problem for me is synchronizing audio with text: I have to manually indicate the time period in which each piece of text should be shown.

In other words, the bottleneck is not in figuring out what the words are, it is in figuring out how to sync them. Most of the time goes into shifting ranges and offsetting the subs by a few ms until I get it right. The most difficult part is keeping the pieces consistent with each other: if I shift the interval for one piece of text, it can overlap with adjacent pieces, and they need to be reviewed as well.

If a computer could do that for me, I'd be happy.
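The cascading-overlap chore described above is mechanical enough to automate. Below is a minimal sketch, assuming cues are held as millisecond intervals sorted by start time and non-overlapping to begin with; the `Cue` type and `shift_cue` helper are hypothetical, not part of any YouTube tool. It shifts one cue, then nudges each neighbour only as far as needed to clear the overlap, stopping as soon as a neighbour is already clear.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def shift_cue(cues: list[Cue], index: int, offset_ms: int, min_gap_ms: int = 0) -> list[Cue]:
    """Shift cues[index] by offset_ms, then move neighbours just enough to avoid overlaps.

    Assumes cues are sorted by start time and do not overlap beforehand.
    Returns a new list; the input cues are left unchanged. Start times are not
    clamped at zero, so a large backward shift can push early cues negative.
    """
    out = [Cue(c.start_ms, c.end_ms, c.text) for c in cues]
    out[index].start_ms += offset_ms
    out[index].end_ms += offset_ms
    # Forward pass: each later cue must start at least min_gap_ms after its predecessor ends.
    for i in range(index + 1, len(out)):
        overlap = out[i - 1].end_ms + min_gap_ms - out[i].start_ms
        if overlap <= 0:
            break  # this cue is not moved, so later cues keep their original positions
        out[i].start_ms += overlap
        out[i].end_ms += overlap
    # Backward pass: each earlier cue must end at least min_gap_ms before its successor starts.
    for i in range(index - 1, -1, -1):
        overlap = out[i].end_ms + min_gap_ms - out[i + 1].start_ms
        if overlap <= 0:
            break
        out[i].start_ms -= overlap
        out[i].end_ms -= overlap
    return out


cues = [Cue(0, 1000, "first line"), Cue(1000, 2000, "second line"), Cue(2500, 3000, "third line")]
for c in shift_cue(cues, 0, 700):  # start the first cue 0.7 s later
    print(c.start_ms, c.end_ms, c.text)
```

The early `break` is the design choice that keeps edits local: only the run of cues actually touched by the shift moves, which is the part that otherwise has to be reviewed by hand.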

Re:About as good as I expected (1)

Anarki2004 (1652007) | more than 4 years ago | (#31381472)

Letting people submit corrections will work great until /b/ discovers it. Then every other caption will be "jews did 911" and "never gonna give you up". Remember Bucket the chatbot?

Re:I don't have the captions (1)

rduke15 (721841) | more than 4 years ago | (#31384676)

I tried the video mentioned here, but it just tells me "Captions are not available". Strange.

Is it because I'm in Europe?
Because I use Firefox on Linux?

The video mentioned a few posts before that is even weirder: it seems to have captions, I can turn them on, but no captions are displayed.

Whatever happened to (0)

Anonymous Coward | more than 4 years ago | (#31379480)

Whatever happened to the Berger-Liaw speech recognition system? One article (from 11 years ago) is here [usc.edu]. It could track dozens of voices simultaneously, could process speech spoken in a continuous stream, and could detect speech in very high-noise environments. In some tests, human listeners could only tell which words were being spoken with 50% accuracy, while the system could still tell what was being spoken 85% of the time, and "very high noise" meant something like someone speaking at normal conversational volume beside a jet engine. The US Navy (submarine service) was a strong advocate of the technology, but I've heard very little about it since.

Can they combine this with lip reading? (1)

wisebabo (638845) | more than 4 years ago | (#31379618)

Could you combine this with the lip-reading technology that was introduced to allow "voiceless" cell phone calls? http://www.ubergizmo.com/15/archives/2010/03/lip_reading_technology_unveiled.html [ubergizmo.com] Wouldn't that improve accuracy in scenes where the speaker's mouth is visible?

Or how about taking the subtitle tracks in other languages and back-translating them to provide additional clues as to what the speaker might have been saying? It might help a little.
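Combining an audio recognizer with a lip reader is usually called audio-visual speech recognition, and one simple way to merge the two is late fusion: each recognizer scores the same candidate words, and the scores are combined with a weight. Below is a minimal sketch, assuming both (hypothetical) recognizers return log-probabilities over a shared candidate set; the weight and the example probabilities are invented for illustration, not measured.

```python
import math


def fuse(audio_logp: dict[str, float], visual_logp: dict[str, float], audio_weight: float = 0.7) -> str:
    """Late fusion: pick the hypothesis maximizing a weighted sum of per-stream log-probabilities.

    Only hypotheses scored by both streams are considered. audio_weight is a
    hypothetical tuning knob in [0, 1]; the visual stream gets 1 - audio_weight.
    """
    common = sorted(audio_logp.keys() & visual_logp.keys())  # sorted for deterministic ties
    if not common:
        raise ValueError("no hypothesis scored by both streams")

    def combined(h: str) -> float:
        return audio_weight * audio_logp[h] + (1 - audio_weight) * visual_logp[h]

    return max(common, key=combined)


# Invented scores: audio slightly prefers "role", the visual stream prefers "road".
audio = {"road": math.log(0.45), "role": math.log(0.55)}
visual = {"road": math.log(0.7), "role": math.log(0.3)}
print(fuse(audio, visual))                    # visual evidence tips the balance
print(fuse(audio, visual, audio_weight=1.0))  # audio only
```

The weight is where "where the speaker's mouth is visible" would come in: a real system might lower the visual weight, or drop that stream, when no face is on screen.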

Which? (1)

WGFCrafty (1062506) | more than 4 years ago | (#31379652)

Fish sticks or Fish dicks?

It's a Markov chain (1)

tepples (727027) | more than 4 years ago | (#31381834)

If the recognizer isn't sure, perhaps it could use the fact that there are six times as many "fish sticks" as "fish dicks" in Google's web index. I'd bet it already does; there's a reason for the "Markov" in hidden Markov modeling.
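The idea above is the noisy-channel view of recognition: choose the phrase that maximizes P(audio | phrase) * P(phrase), where the second factor comes from a language model. Below is a minimal sketch in which raw phrase counts stand in for that language model. The counts are hypothetical, chosen only to mirror the six-to-one ratio mentioned above, and a real HMM decoder would typically apply its language model during the search rather than as a post-hoc choice between two phrases.

```python
import math


def rescore(acoustic_logp: dict[str, float], phrase_counts: dict[str, int], lm_weight: float = 1.0) -> str:
    """Pick the candidate maximizing log P(audio | phrase) + lm_weight * log P(phrase).

    P(phrase) is estimated from counts with add-one smoothing, so a phrase
    with zero count still gets a small, nonzero prior.
    """
    candidates = sorted(acoustic_logp)
    total = sum(phrase_counts.get(c, 0) for c in candidates) + len(candidates)

    def score(c: str) -> float:
        prior = (phrase_counts.get(c, 0) + 1) / total
        return acoustic_logp[c] + lm_weight * math.log(prior)

    return max(candidates, key=score)


counts = {"fish sticks": 600, "fish dicks": 100}  # hypothetical 6:1 counts
unsure = {"fish sticks": math.log(0.5), "fish dicks": math.log(0.5)}
print(rescore(unsure, counts))  # the prior breaks the acoustic tie
```

The prior only decides close calls: if the acoustic evidence strongly favours the rarer phrase, it still wins.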

Let me guess, Youtube.ru (4, Funny)

santax (1541065) | more than 4 years ago | (#31379772)

reads the caption and then produces the video?

Re:Let me guess, Youtube.ru (1)

HoppQ (29469) | more than 4 years ago | (#31381874)

reads the caption and then produces the video?

Actually, a rather obvious extension to this technology would be to feed the captions to a machine translator and a text-to-speech synthesizer to produce e.g. Russian voice for a video for those Russians who don't comprehend spoken or written English.

It hates Bono (0)

Anonymous Coward | more than 4 years ago | (#31379804)

I don't think it likes Bono. I was watching a speech by George Bush where he says about Bono: "he is a man of depth and great heart". The caption was: "he is a man of death and great whore"

Really? most? (1)

crossmr (957846) | more than 4 years ago | (#31380216)

Most, if not all, YouTube videos now include a 'CC' button that, if pressed, will automatically generate the closed-captioning technology.

None of the first 10 videos I've visited have it, including suggested and front-page vids.

Is this a metric most?

Re:Really? most? (1)

crossmr (957846) | more than 4 years ago | (#31380250)

Oh wait, just found one.
What a train wreck. Cheers, Google, on yet another amazing product.
Here is what is actually said:
Hey Everyone So a lot of you may know that the Vancouver 2010 Winter Olympics are coming up

and here is the transcribed audio:
Everyone felt like a man of the I think every time he's had a winter olympics are coming

Just fantastic... wow.
This is certainly front-page-worthy.
I'm going to roll out a different product.
Basically, the system will try to guess (not very accurately) how many words are said and then just pick a random word out of the dictionary for each one. I would guess that, averaged out, it might produce something more readable than this.

This is on par with their "beta" CC translation service, which used Google's fantastic web translation skills to translate English into horribly butchered and unreadable Asian languages (translation into Korean is confirmed as a complete waste of time).

Even more fantastic, they allow you to then translate these autogenerated pieces of roadkill... wow. Who could this possibly be useful for?
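The usual yardstick for a comparison like the one quoted above is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. Below is a minimal sketch; it only lowercases and splits on whitespace, so real scoring would also need to normalize punctuation and numbers (e.g. "2010" versus "twenty ten").

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev holds edit distances for the previous reference prefix; one row at a time.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            cur[j] = min(prev[j] + 1,      # delete ref[i-1]
                         cur[j - 1] + 1,   # insert hyp[j-1]
                         substitution)     # substitute (free if the words match)
        prev = cur
    return prev[-1] / len(ref)


actual = "Hey Everyone So a lot of you may know that the Vancouver 2010 Winter Olympics are coming up"
caption = "Everyone felt like a man of the I think every time he's had a winter olympics are coming"
print(f"WER: {word_error_rate(actual, caption):.0%}")
```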

Good timing (1)

RealGrouchy (943109) | more than 4 years ago | (#31380788)

This is excellent timing; I clicked on the link to a video on the previous /. story but my sound was not working. I thought, "man, I wish more videos were closed-captioned," not just for lazy people like me but also for the hearing impaired.

Finally it'll be easier for me to share these videos with my deaf and hard-of-hearing friends!

- RG>

Hitler Parodies the easy way (1)

BenJeremy (181303) | more than 4 years ago | (#31380810)

I like the "CC" feature... it makes it very simple to do those Hitler Downfall parodies... but I was surprised that I was the first to actually make one using the feature [youtube.com]. My video features closed captions for both the original German-to-English translation, and a Lost parody script. I also provide a handy download to a text-editable SRT file so others can make their own (does that make me a bad person?).

The nice thing is that you can add as many subtitle files as you like... and give each of them a separate title. It knows which language each file is in, so presumably my parody can be run through the translator (on the fly) into any other language. Now one "blank" can provide hundreds of alternate parodies from one YouTube video.

I just wonder whether this "automatic" feature will try to create subtitles for my blank, which already has subtitles loaded.
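The SRT files mentioned above are plain text: each cue is a sequence number, a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, one or more lines of text, and a blank line. Below is a minimal sketch of writing one from (start_ms, end_ms, text) tuples; the cue data is hypothetical, and uploading the result to YouTube is not covered.

```python
def format_timestamp(ms: int) -> str:
    """Milliseconds -> SRT timestamp HH:MM:SS,mmm (note the comma before the milliseconds)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[tuple[int, int, str]]) -> str:
    """Serialize (start_ms, end_ms, text) cues as SRT, numbering them from 1."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{number}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n")
    # Joining with "\n" leaves exactly one blank line between consecutive cues.
    return "\n".join(blocks)


# Hypothetical cues for an alternate script laid over a "blank" clip.
parody = [(1_000, 3_500, "(first line of the alternate script)"),
          (4_000, 6_000, "(second line)")]
print(to_srt(parody))
```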

Re:Hitler Parodies the easy way (1)

BenJeremy (181303) | more than 4 years ago | (#31380822)

On a side note, I see that YouTube has not gotten to any of my videos with this "automagic" speech recognition-generated closed captions. I was hoping they would try and make one for this video of mine [youtube.com], just to see what it generated.

Might mean videos could be searchable by content (1)

mrflash818 (226638) | more than 4 years ago | (#31381018)

An interesting upside to all this might be that, if Google keeps the dialog from YouTube content in its searchable database, people may soon be able to search for videos by content.

Right now, I believe keywords have to be entered by hand, but auto-captioning could perhaps remove that barrier.

"Here's looking at you, kid."
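The standard data structure for this kind of search is an inverted index that maps each spoken word to where it occurs. Below is a minimal sketch, assuming captions are available as (start_seconds, text) cues per video; the video IDs and caption text are hypothetical, and this says nothing about how Google actually indexes video.

```python
from collections import defaultdict


def normalize(token: str) -> str:
    # Lowercase and trim surrounding punctuation so "kid." matches "kid".
    return token.lower().strip(".,!?;:\"'")


def build_index(captions: dict[str, list[tuple[float, str]]]) -> dict[str, list[tuple[str, float]]]:
    """Map each word to the (video_id, cue_start_seconds) pairs where it is spoken."""
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for video_id, cues in captions.items():
        for start, text in cues:
            for token in text.split():
                word = normalize(token)
                if word:
                    index[word].append((video_id, start))
    return index


def search(index: dict[str, list[tuple[str, float]]], query: str) -> set[str]:
    """Return the IDs of videos whose captions contain every word of the query."""
    words = [w for w in (normalize(t) for t in query.split()) if w]
    if not words:
        return set()
    result = {vid for vid, _ in index.get(words[0], [])}
    for word in words[1:]:
        result &= {vid for vid, _ in index.get(word, [])}
    return result


captions = {
    "casablanca-clip": [(5.0, "Here's looking at you, kid.")],
    "gears-clip": [(12.0, "The differential lets each wheel turn at its own speed.")],
}
idx = build_index(captions)
print(search(idx, "looking at you"))
```

Keeping the cue start time next to each video ID is what would let a search result jump straight to the moment a word is spoken, not just to the video.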

Is this Gaudi? (1)

snsh (968808) | more than 4 years ago | (#31381222)

This is good news. I've been looking at speech-to-text and audiomining for a while. My goal was not captioning but search: in a long video or a large set of videos, a user could quickly find snippets mentioning a word or phrase and replay the found snippets. I found a bunch of options, but budget was always an issue:

Google Audio (Gaudi) - free (cool!), but seemed like a dead-end project after the 2008 elections.
Blinx - spinoff from BBN focused on media companies. $$$$$$.
Autonomy - enterprise search/monitoring company; bought tech from Virage. $$$$$$.
Virage - sold their tech to Autonomy, then redeveloped it.
Coveo - audiomining software using the Nuance SDK and a Silverlight front end. $$$$$.
TVeyes - does a lot of real-time monitoring. $$$$$.
Nexidia - audiomining software using their own phoneme tools. $$$$$$.

Is this YouTube service an incarnation of Gaudi? Either way, it's nice that it's finally out there.
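For the find-and-replay use case, the recognizer's word timings matter as much as the words themselves. Below is a minimal sketch, assuming a list of (start_seconds, word) pairs in spoken order (the sample data is hypothetical): it finds every occurrence of a phrase and returns a padded clip window around each one. Because only word start times are assumed, the padding also has to cover the duration of the phrase's last word.

```python
def find_phrase(words: list[tuple[float, str]], phrase: str, pad_s: float = 2.0) -> list[tuple[float, float]]:
    """Return (clip_start, clip_end) windows, in seconds, around each occurrence of phrase."""
    target = phrase.lower().split()
    tokens = [w.lower() for _, w in words]
    n = len(target)
    hits: list[tuple[float, float]] = []
    if n == 0:
        return hits
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            first_start = words[i][0]
            last_start = words[i + n - 1][0]
            # Clamp at 0 so a match near the beginning doesn't produce a negative start.
            hits.append((max(0.0, first_start - pad_s), last_start + pad_s))
    return hits


# Hypothetical recognizer output with word start times in seconds.
words = [(0.0, "the"), (0.5, "differential"), (1.0, "gear"), (1.5, "lets"),
         (5.0, "the"), (5.5, "differential"), (6.0, "gear")]
print(find_phrase(words, "differential gear"))  # [(0.0, 3.0), (3.5, 8.0)]
```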

Now easier to catch unwanted content (2, Interesting)

Aoet_325 (1396661) | more than 4 years ago | (#31382704)

Soon (now?) they can generate captions of everything heard (or sung) in a video immediately after upload and match the captions against lyrics and transcriptions of copyrighted works or even just search them for specific keywords. Then they can flag those videos as possible copyright violations or even prevent them from being displayed until after being reviewed by someone.

I'm not saying captioning isn't a good idea, only that it can be used for more than just assisting the hard of hearing.
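A standard way to do this kind of text matching is word n-gram shingling: break both texts into overlapping runs of n words and measure the overlap. For finding a short work inside a longer transcript, containment (the fraction of the reference work's shingles that also appear in the transcript) fits better than plain set similarity, since the surrounding talk doesn't dilute the score. Below is a minimal sketch; the threshold and sample texts are hypothetical, and it is not a description of any actual YouTube system. Recognition errors in the captions lower the score, which is why the threshold sits below 1.

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping word n-grams ("shingles") in lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(reference: str, transcript: str, n: int = 3) -> float:
    """Fraction of the reference's shingles that also occur in the transcript."""
    ref = shingles(reference, n)
    if not ref:
        return 0.0
    return len(ref & shingles(transcript, n)) / len(ref)


def flag_matches(transcript: str, references: dict[str, str], threshold: float = 0.5, n: int = 3) -> list[str]:
    """Titles of reference works whose containment in the transcript meets the (hypothetical) threshold."""
    return [title for title, text in references.items()
            if containment(text, transcript, n) >= threshold]


references = {"work A": "the quick brown fox jumps over the lazy dog"}
transcript = "so today the quick brown fox jumps over the lazy dog again"
print(flag_matches(transcript, references))
```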
