Using Photographs To Enhance Videos

timothy posted more than 6 years ago | from the ready-for-my-closeup dept.

Graphics 102

seussman71 writes with a link to some very interesting research out of the University of Washington that employs "a method of using high quality photographs to enhance a video taken of the same subject. The project page gives a good overview of what they are doing and the video on the page gives some really nice examples of how their technology works. Hopefully someone can take the technology and run with it, but one thing's for sure: this could make amateur video-making look even better than it does now." And if adding mustaches would improve your opinion of the people in amateur videos, check out the unwrap-mosaics technique from Microsoft Research.

Is anyone else sick of demos? (4, Insightful)

QuantumG (50515) | more than 6 years ago | (#24607619)

Why is UW not releasing their source code? If they intend to spin off commercial products, why are they releasing demos? Hell, even *Microsoft* is releasing demos of this stuff... are Apple and Google the only companies that can ship product these days (even if it is "beta", you can at least freakin' use it)?

No more demos. We know you're smart, now make something useful please.

Re:Is anyone else sick of demos? (4, Insightful)

imsabbel (611519) | more than 6 years ago | (#24607735)

Simple reason:
They _say_ that it does this automatically.

Translation: we put some PhD student on it who spent some months optimizing the settings for the 2 selected scenes so we can make a nice publication and maybe get more money.

If you just look at their steps of the workflow, the way they describe it just isn't possible (like the way they "stereoscopically" create a depth map from a _single_ still photograph...).
Not to mention that the first scene looks like a bad video game level after their "improvement".

Find Video Frame - Apply (3, Insightful)

KalvinB (205500) | more than 6 years ago | (#24607981)

The easy way would be to use the already calculated depth field from the frame in the video that best matches the photo.
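
For what it's worth, that "best matching frame" step is easy to prototype: score every frame of the video by how many good feature matches it shares with the photo. A rough Python/OpenCV sketch of the idea (my own stand-in, not anything from TFA; the distance threshold of 40 is arbitrary, and the photo is assumed to have detectable features):

```python
import cv2

def best_matching_frame(video_path, photo_gray):
    """Return the index of the video frame with the most ORB feature
    matches to the photo (both compared in grayscale)."""
    orb = cv2.ORB_create(1000)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    _, photo_desc = orb.detectAndCompute(photo_gray, None)

    best_idx, best_score, idx = -1, -1, 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, frame_desc = orb.detectAndCompute(gray, None)
        if frame_desc is not None:
            # Count descriptor matches that are reasonably close
            score = sum(1 for m in bf.match(photo_desc, frame_desc)
                        if m.distance < 40)
            if score > best_score:
                best_idx, best_score = idx, score
        idx += 1
    cap.release()
    return best_idx
```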

Re:Find Video Frame - Apply (1)

imsabbel (611519) | more than 6 years ago | (#24608661)

True. But still, notice how they use a reference movie that's basically a side-scroller to create this depth map?

Basically, stuff like this is released for _every_ SIGGRAPH. The first few times I was still blown away, but now that I'm used to it (and in academia myself, so I know the paper game), I simply cannot get excited for something that will never see realization.

Re:Is anyone else sick of demos? (4, Funny)

CaptainPatent (1087643) | more than 6 years ago | (#24608055)

like the way they "stereoscopically" create a depth-map from a _single_ still photograph.

No no no, read the fine print

Stereocycloptically, not stereoscopically!

Re:Is anyone else sick of demos? (4, Informative)

mo (2873) | more than 6 years ago | (#24608435)

like the way they "stereoscopically" create a depth-map from a _single_ still photograph

TFV said they were using video frames to do stereoscopic depth-mapping. Since the source footage changes perspective, they can build a depth map from the relative shift of each object in the video, and then project the high-quality photograph on top of the derived 3D structure.
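
For anyone who wants to see what "depth from the relative shift between frames" looks like in practice, here's a minimal two-view sketch with OpenCV. It's the textbook pipeline (feature matching, essential matrix, triangulation), not the authors' actual implementation, and it assumes the camera intrinsics K are known:

```python
import cv2
import numpy as np

def sparse_depth_from_two_frames(img1, img2, K):
    """Triangulate a sparse 3D point cloud (up to scale) from two
    grayscale frames of a moving camera. K is the 3x3 intrinsics."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])

    # The parallax between the two frames gives the relative camera pose...
    E, mask = cv2.findEssentialMat(p1, p2, K, cv2.RANSAC, 0.999, 1.0)
    _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)

    # ...and triangulation turns matched 2D points into 3D ones.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4 = cv2.triangulatePoints(P1, P2, p1.T, p2.T)
    pts3 = (pts4[:3] / pts4[3]).T        # Nx3; depth is pts3[:, 2]
    return pts3[mask.ravel() > 0]
```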

Re:Is anyone else sick of demos? (1)

pfafrich (647460) | more than 6 years ago | (#24608437)

If you just look at their steps of the workflow, the way they describe it just isn't possible (like the way they "stereoscopically" create a depth map from a _single_ still photograph...).

In the video they say they use structure from motion to create the depth map.

Re:Is anyone else sick of demos? (2, Informative)

Swizec (978239) | more than 6 years ago | (#24607737)

That's because they're all renders! None of it is real.

Pics or it didn't happen. Or in this case, apps or it happened only in photoshop/whatever.

Re:Is anyone else sick of demos? (1)

MobileTatsu-NJG (946591) | more than 6 years ago | (#24608803)

Pics or it didn't happen. Or in this case, apps or it happened only in photoshop/whatever.

Perhaps you're right. However, apps like SynthEyes are already like 90% of the way there. Their demos aren't a huge leap away from what is already on the market.

Re:Is anyone else sick of demos? (-1, Troll)

Anonymous Coward | more than 6 years ago | (#24607771)

UW is in Seattle. This means that the videos that they enhanced with the photos were probably gay porn. I think it is obvious why they haven't released a demo.

Re:Is anyone else sick of demos? (3, Informative)

Hays (409837) | more than 6 years ago | (#24607781)

The publication is supposed to contain enough information to recreate the results.

Question 4 on the SIGGRAPH review form -
"4. Could the work be reproduced by one or more skilled graduate students? Are all important algorithmic or system details discussed adequately? Are the limitations and drawbacks of the work clear?"

If you or a company wants it bad enough, the information is there, unless the review process failed (which does happen).

This wasn't a SIGGRAPH paper, but the ability to reproduce results is nonetheless a standard prerequisite for academic publication.

It's certainly not as convenient as releasing source code, but that's sometimes a big challenge for an academic researcher because the last thing they want is to have to support buggy, poorly documented research code for random people on the internet.

CREEPY! (2, Interesting)

Ohrion (814105) | more than 6 years ago | (#24608583)

Did anyone else notice a very creepy effect in the "enhanced" video with the bust? It made it look like the head was turning to look at you as you moved around it. *shudder*

Re:CREEPY! (0)

Anonymous Coward | more than 6 years ago | (#24611871)

Forget the statue, the news anchors on TV keep looking at me wherever I am in the room...

Now that's creepy!

Re:Is anyone else sick of demos? (0, Flamebait)

cthulhu11 (842924) | more than 6 years ago | (#24611223)

UW *is* Microsoft these days. Try getting into their evening masters CS program without M$ on your resume.

Re:Is anyone else sick of demos? (1)

joeava (1147727) | more than 6 years ago | (#24612229)

If this is a third party funded project, is the source code copyrighted to the sponsor?

3D models from videos (4, Interesting)

4D6963 (933028) | more than 6 years ago | (#24607683)

The other cool part of it is that it derives a cloud of points from the video, meaning it can apparently turn a video into a 3D model. However, it seems like their program only uses it internally.

Re:3D models from videos (1)

Dekker3D (989692) | more than 6 years ago | (#24608165)

Movie-to-model has already been done somewhere... not sure if it's the same project; I think it involved Microsoft somehow as well. Anyway, something that turns a whole scene 3D would rock indeed.

Re:3D models from videos (4, Informative)

samkass (174571) | more than 6 years ago | (#24608331)

Takeo Kanade's lab [cmu.edu] at Carnegie Mellon's Robotics Institute did this in the mid 90's [cmu.edu] ...

Re:3D models from videos (1)

aaarrrgggh (9205) | more than 6 years ago | (#24614831)

I thought WashU was actually doing this type of thing in the early 90's as well for distance surgery applications over low-speed data links.

Re:3D models from videos (1)

retzkek (1002769) | more than 6 years ago | (#24608485)

Check out Photosynth [live.com] from Microsoft, as seen in this TED talk [ted.com] .

Pushmeet? (0)

Anonymous Coward | more than 6 years ago | (#24607711)

One of the researchers on that Microsoft project is:

Pushmeet Kohli

Which pretty much makes every other name out there a piece of shit.

Re:Pushmeet? (1)

hostyle (773991) | more than 6 years ago | (#24607887)

When PushMeet Shove ... is this one of those Profit! gags? Guys?

Re:Pushmeet? (0)

Anonymous Coward | more than 6 years ago | (#24608077)

Meet as in Meat as in Cock.

Patent Encumbered? (4, Insightful)

reality-bytes (119275) | more than 6 years ago | (#24607719)

I always get this feeling when I see a university-styled promotional release that the *software* patents are already pending.

I haven't the time to search just now but I'll bet there's at least one application pertaining to this method which encompasses a hell of a lot more.

simply amazing (0, Redundant)

PhrostyMcByte (589271) | more than 6 years ago | (#24607741)

Wow, that's got to be some of the coolest tech I've seen in years. I can't wait for some software to come out that uses it. Avisynth [avisynth.org] plugins, anyone?

Re:simply amazing (1)

4D6963 (933028) | more than 6 years ago | (#24607831)

I'm afraid it's going to be used in music videos to suddenly make flowers or tentacles spread on walls, or other such stupid uses.

By the way, considering how you can modify an object in a scene by replacing a frame of it, or adding a picture to the mix, does it mean we can make Clint Eastwood look like he's 30 again by using a picture of him when he was young on some recent footage of him, or even do entire "head transplants" on videos?

Wow! That's progress! (-1, Redundant)

Channard (693317) | more than 6 years ago | (#24607791)

So by my reckoning, it'll only be a few years before the stuff featured in CSI and similar shows isn't *total* bollocks. A lot of people see photos and stuff being magically enhanced to perfect resolution, revealing the identity of some suspect, and think that can be done in real life.

Re:Wow! That's progress! (1)

4D6963 (933028) | more than 6 years ago | (#24608007)

So by my reckoning, it'll only be a few years before the stuff featured in CSI and similar shows isn't *total* bollocks. A lot of people see photos and stuff being magically enhanced to perfect resolution, revealing the identity of some suspect, and think that can be done in real life.

No, not at all. You can get good resolution out of a video if and only if you have a good picture to go with it, and it doesn't bring out any more data than you have in your picture and video combined; it only merges what they have. Infinite zoom with a mere security camera, as shown on CSI, is theoretically impossible, well, with the exception of aliasing-based super-resolution techniques, but that sort of processing works across many frames, requires the necessary aliasing, and depends on a few other conditions like noise level and whatnot.
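
For the curious, the many-frames trick the parent alludes to can be prototyped with a toy "shift-and-add" loop: estimate each frame's subpixel shift relative to a reference, upscale, and average on a common high-res grid. This is only the skeleton of real aliasing-based super-resolution (which would add proper registration models and a deconvolution step), and the function below is my own sketch, not anything from TFA:

```python
import cv2
import numpy as np

def shift_and_add(frames, scale=2):
    """Fuse low-res float32 grayscale frames (same size, differing only
    by small translations) into one higher-res image."""
    ref = frames[0]
    h, w = ref.shape
    acc = np.zeros((h * scale, w * scale), np.float32)
    for f in frames:
        # Estimate the subpixel shift of this frame vs. the reference
        (dx, dy), _ = cv2.phaseCorrelate(ref, f)
        # Upscale, then shift back onto the common high-res grid
        up = cv2.resize(f, (w * scale, h * scale),
                        interpolation=cv2.INTER_LINEAR)
        M = np.float32([[1, 0, -dx * scale], [0, 1, -dy * scale]])
        acc += cv2.warpAffine(up, M, (w * scale, h * scale))
    return acc / len(frames)
```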

Re:Wow! That's progress! (1)

cheater512 (783349) | more than 6 years ago | (#24611747)

Actually, two bad-quality images can be used to make a better-quality image.

Re:Wow! That's progress! (1)

4D6963 (933028) | more than 6 years ago | (#24612387)

It really depends on what you call "bad quality".

Fractal compression (1, Informative)

IdeaMan (216340) | more than 6 years ago | (#24607803)

Combine this with fractal compression [wikipedia.org] and we could store all the videos we've ever seen on one hard disk.

Re:Fractal compression (1)

4D6963 (933028) | more than 6 years ago | (#24608039)

Actually that was the first thought that occurred to me, that it could be used to store videos by storing a high-resolution keyframe and then only the movement data. Then it occurred to me that that's already what our modern video compression algorithms do. You can tell when you skip ahead in a WMV video (or when it skips): artifacts look like they belong to the object they appeared on, until the next full keyframe.
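
The keyframe-plus-deltas idea in miniature (a toy illustration only; real codecs like WMV store motion-compensated block predictions, not raw pixel diffs). It also shows why artifacts stick to objects until the next keyframe: a corrupted "P" entry poisons every reconstructed frame after it until the next "I". Frames are assumed to be int16 NumPy arrays so the diffs can go negative:

```python
import numpy as np

def encode(frames, key_interval=30):
    """Store a full frame every key_interval frames, raw diffs in between."""
    stream, prev = [], None
    for i, f in enumerate(frames):
        if i % key_interval == 0:
            stream.append(("I", f.copy()))      # keyframe
        else:
            stream.append(("P", f - prev))      # delta vs. previous frame
        prev = f
    return stream

def decode(stream):
    out = []
    for kind, data in stream:
        out.append(data.copy() if kind == "I" else out[-1] + data)
    return out
```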

It's cool if it works (-1, Troll)

Anonymous Coward | more than 6 years ago | (#24607849)

but I am worried by Microsoft's involvement. It's worth anything only if it's free software.

Of the two, I find the Microsoft one to be better (4, Interesting)

spoco2 (322835) | more than 6 years ago | (#24607877)

Really, the ability for their software to 'unwrap' a 3D object and allow you to fiddle with it as you wish is very cool.

And not limited to a 'static' scene.

And, really, if you're going to go to the effort of videoing a scene, then photographing the scene, then passing the video and the photos through their software, all to get better exposure and resolution...

Um.

Wouldn't it be a far better cost/effort equation to just buy a better HD camera in the damn first place?

Re:Of the two, I find the Microsoft one to be bett (2, Interesting)

cnettel (836611) | more than 6 years ago | (#24607929)

The more interesting aspect is that you can tweak those still photos, and then transfer them back. Photoshop some key frames, and you have suddenly created a video with the same manipulation. The video is just a cheap source for spatial data, which you can then texture with your photos.

Re:Of the two, I find the Microsoft one to be bett (3, Informative)

Enderandrew (866215) | more than 6 years ago | (#24608177)

That would greatly lower the cost of doing special effects, if you didn't have to do them frame by frame.

Re:Of the two, I find the Microsoft one to be bett (1)

Animaether (411575) | more than 6 years ago | (#24608713)

I assure you that in all but the most insane cases, doing that frame-by-frame is already not done anymore (high-quality rotoscoping, on the other hand... yipe). You model a quick mock-up in a 3D application, project your painted texture onto that, and composite that with the original footage.

What it does do is remove that whole 'You model a quick mock-up' part in many (not all) cases. Now to see who gets the patents, how much they are to license, and who get(s) to toss it into their editing suite.

Re:Of the two, I find the Microsoft one to be bett (1)

im_thatoneguy (819432) | more than 6 years ago | (#24621691)

It would greatly lower the cost of doing special effects if you did special effects frame by frame.

The problem with most of these technologies is that they never reach photo-realistic visual effect quality results.

Any time you have to do frame by frame VFX you're doing it for the sole purpose of getting a more perfect result. If you need average to crappy results you won't be doing it frame by frame.

This tech has cool potential and will be used by the VFX industry, but it won't be automatic, and it'll be used to augment existing techniques.

Re:Of the two, I find the Microsoft one to be bett (1)

Endo13 (1000782) | more than 6 years ago | (#24608089)

All to get better exposure and resolution.

Clearly it's intended for pr0n!

Re:Of the two, I find the Microsoft one to be bett (1)

4D6963 (933028) | more than 6 years ago | (#24608251)

Wouldn't it be a far better cost/effort equation to just buy a better HD camera in the damn first place?

What if you do buy an HD camera and combine it with a 12-megapixel still camera? Besides, an HD camera alone doesn't fix the issues you can fix by then adding HDR shots to the mix.

Re:Of the two, I find the Microsoft one to be bett (2, Insightful)

jebrew (1101907) | more than 6 years ago | (#24608615)

I'd just connect a camera to the bottom of my camcorder (they both have a spot for mounting).

Then just have the still camera do continuous shooting @ ~1fps while you video. Match them up in this software when you're done and you're good to go...now if I could just get a hold of their software.

Re:Of the two, I find the Microsoft one to be bett (1)

phillips321 (955784) | more than 6 years ago | (#24613163)

Let's just say your camera is capable of 10MP, and that an average photo is 3MB (JPEG). Then 3MB x 60 (secs) x 60 (minutes) = 10,800MB per hour. I hope your camera can keep up with that write rate (plus the fact that you're expecting it to always be in focus for the 3,600 photos you've just taken).
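
The arithmetic, for anyone who wants to poke at the assumptions (the 3MB figure is the parent's estimate, not a measurement):

```python
mb_per_photo = 3                    # assumed average JPEG size
shots_per_second = 1                # the ~1 fps continuous shooting above
mb_per_hour = mb_per_photo * shots_per_second * 60 * 60
print(mb_per_hour, "MB/hour")       # 10800 MB/hour, i.e. roughly 10.5 GB
photos_per_hour = shots_per_second * 60 * 60   # 3600 photos to keep in focus
```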

Build this into the video camera? (1)

MikeFM (12491) | more than 6 years ago | (#24609529)

Still cameras can pretty much always take higher res pictures than video cameras no matter what price range you're looking at. I wonder if they could combine a still camera into a video camera and have it take high res still frames as the video is shot and then use this software to improve the video. It seems that'd be a cost effective way to squeeze more out of any level of camera.

Re:Of the two, I find the Microsoft one to be bett (1)

snooo53 (663796) | more than 6 years ago | (#24610059)

Wouldn't it be a far better cost/effort equation to just buy a better HD camera in the damn first place?

I hate to be Captain Obvious here, but historical footage strikes me as the #1 reason (historical meaning everything up to yesterday). I mean, you can't go back and reshoot the millions of hours of footage the world already has, but there are a lot of high-resolution photos of some of the same subjects.

Secondly, the still resolution on most point and shoot cameras is a lot higher than the video resolution, and probably will be for the foreseeable future. Good software plus a $200 camera seems like a better solution for 99% of people than a $2000 camera which is still going to have problems of its own.

Thirdly, about 4 minutes into the video they start getting into the really interesting stuff about artifact removal (which is still going to be a problem no matter how nice a camera you have) and object removal, which is just, well, really cool.

Re:Of the two, I find the Microsoft one to be bett (2, Insightful)

imess (805488) | more than 6 years ago | (#24610767)

Wouldn't it be a far better cost/effort equation to just buy a better HD camera in the damn first place?

Hint: years-old amateur/family/etc. video meets modern high-res camera.

Re:Of the two, I find the Microsoft one to be bett (2, Insightful)

dword (735428) | more than 6 years ago | (#24611767)

if you're going to go to the effort of videoing a scene

"you..videoing..." isn't the only application. This could be used to enhance other videos. Let's say someone else made a great video (captured some really great scenes, focused on some details) and you want to publish it but even if they focused on cool details, they're not enough. You take a few pictures and enhance their video.
Also, this is just the start. They are currently enhancing static videos but I'm sure in the near future, if this is worked on enough, it could be used to enhance any kind of video scene. So you have something interesting happening - by some people's standards, two squirrels fighting over a nut is interesting enough - but the overall quality of your video is just awful. You won't be able to re-take the shot because the squirrels canceled their contract so you'll take some pictures, match them against the video and voila, high-quality video or a forest and two squirrels kicking each other in the nuts over a nut while filmed by a nut.
Do you by any chance remember those huge radios? I mean those REALLY huge radios weighting about 50kg? They weren't very practical and to the final consumer they were cool but they were heavy and incredibly expensive. Now I carry an MP3 player in my pocket that also has an FM radio integrated, just for the hell of it. === POOF === 20 years later === Do you by any chance remember those projects that they started, to enhance videos of static scenes using photographs? There was an article on a site named "Slashdot" which was taken down after it started WW3... I doubt that you'd remember that, but look where we are today: with a couple of high-quality pictures (100 gpixel ;) you can enhance any video.
This is what this whole project is about. Studying something cool and then enhancing it until it gains practical applications. Why the hell won't /. users stop bitching about "this isn't very useful" and "i don't see the point"? It's not useful now, but it will be, otherwise nobody would invest in it and I think people who pay tons of money for this kind of research know a bit more about what research is good for than you but unfortunately they're too busy making money and changing the world to spend their time on /. The fact that you don't see it's point means only that you don't see it, it doesn't mean there is no point. In stead of saying "this isn't useful" why don't you ask "what could this be used for?" Maybe that change in some people would help us progress faster because they will question the applications of certain research which causes debates which lead the faster progress (not at the time of the debates but a couple of years later people draw conclusions and they start to get along and pretend they never asked dumb questions). It would also encourage researches by showing them that if they give you applications for their work you might embrace it. But NOOOOOOOOOOOOOOOO you just say "i can't do anything with this" which basically insults and shuts the trap of anyone who might give you a couple of uses and it just starts a flame war which is basically a debate focused on insults and swear words in stead of what it should be - focused on pros and cons.

Re:Of the two, I find the Microsoft one to be bett (1)

spoco2 (322835) | more than 6 years ago | (#24638317)

Um... geeze... a little impassioned perhaps?

All I was saying was that the Microsoft tech, which got the small billing, looked to be the more interesting and useful now compared to the other tech which got prime billing.

You know, because Microsoft is 'evil', so doesn't get attention, but a University is wholly 'good' so gets top billing.

And also, you're telling me there isn't a heap of research time spent on ways of doing things that are overly complex and impractical just because the researchers want to do so, when there is, in fact, a far simpler approach?

Yes, some of those ridiculous ways of doing things are done for the learnings involved, but that doesn't mean we all have to ooh and ahh over them while they're in their research mode.

I don't think anyone looked at a gigantic radio back in the day and said 'Man, that's useless, that lets me listen to stuff from thin air'. They didn't say that because that was all there was.

I can look at something like this and feel that it's slightly less useful because its two prime applications are 'better res and exposure', which are indeed already available using a better camera, and 'removing unwanted things from the video', which is something I quite detest in the world of digital photography and the like today (let's not show the world as it is, warts and all; let's scrub it until it's the sterile version we hoped it would be). And while it has applications in film making, well, they wouldn't be using the consumer cameras this is all aimed at.

Just chill out, I don't fail to see the smarts in this, don't go ape over anyone who doesn't like a bit of tech you do.

And don't think that the views of anyone on Slashdot represent the 'norm' at all.

If they did, we'd all be using command line only interfaces on linux with no driver support for anything and having to code anything we wanted in a new software product ourselves, because, well, if we want it, 'why don't you fix it yourself then?'

A better use? (4, Interesting)

neokushan (932374) | more than 6 years ago | (#24607983)

All of these techniques are pretty awesome and will certainly be a boon to home video enthusiasts the world over (plus plenty of commercial places that are on a tight budget), but I've got another idea.

You see it on TV all the time, CCTV footage of robberies and the like; couldn't this technology be used to effectively map out a 3D image of the perpetrator?
I know it won't be perfect and most CCTV is probably too low quality to be used, but it would certainly be pretty cool (and useful) to have a vaguely accurate 3D model of the guy, giving you height, build, etc., and, with the help of supplementary images, a really easy way to adjust its appearance.

Re:A better use? (2, Informative)

sirkha (1015441) | more than 6 years ago | (#24608281)

You see it on TV all the time, CCTV footage of robberies and the like; couldn't this technology be used to effectively map out a 3D image of the perpetrator? I know it won't be perfect and most CCTV is probably too low quality to be used, but it would certainly be pretty cool (and useful) to have a vaguely accurate 3D model of the guy, giving you height, build, etc., and, with the help of supplementary images, a really easy way to adjust its appearance.

Yes, like, you could adjust the appearance to look exactly like someone else! Not saying that one would or should do this, but now that they can, they probably will.

Re:A better use? (3, Interesting)

Pingmaster (1049548) | more than 6 years ago | (#24608653)

I would say mount a high-res still camera in parallel with the CCTV camera and have it take, say, 1 picture every 10 seconds after the CCTV motion sensors are tripped, with quality comparable to a high-end consumer camera (i.e. 7-8 megapixels), then use that data to enhance the video to aid in identifying suspects.

That said, I don't think these 'enhanced' videos should be admissible as evidence, since the videos have been effectively tampered with. Given the possibility of altering a suspect's identifying features by superimposing a different picture on the video, this could either cause the wrong person to be jailed or the actual criminal to be set free.

Re:A better use? (1)

nametaken (610866) | more than 6 years ago | (#24611485)

No, but if it helps me find the guy with the QuickStop drop bag in his back seat stuffed with bills and a .38 snub-nose, that would help. :)

Re:A better use? (1)

MobyDisk (75490) | more than 6 years ago | (#24613701)

All the photos from cameras have been digitally enhanced. What the camera itself produces is not even viewable by a human. Software, either in the camera or in the PC software in the case of RAW image files, converts the matrix of RGB values into a photo.

While someone could certainly question the accuracy of the enhancement process, there is no good reason enhanced photos could not be admissible as evidence. It would not surprise me to find that it is very common to do simple enhancements anyway since CCTV cameras really tend to suck.

Re:A better use? (1)

jebrew (1101907) | more than 6 years ago | (#24608675)

How about a security camera that has a 10-megapixel still camera shooting at 1 frame every 2 seconds when there's motion?

Take the video at about QVGA resolution, map on the high-quality still and voila, you've got HD video on the cheap!

Re:A better use? (1)

Steveftoth (78419) | more than 6 years ago | (#24610549)

Then your problem is storage, 'cause that's a lot of pixels to keep track of. It's one reason CCTV cameras are so low-res: it keeps the bitrate down so they can cheaply store so much footage.

Re:A better use? (1)

neokushan (932374) | more than 6 years ago | (#24612345)

A few years ago that would be entirely valid, but now you can pick up a 1TB HDD for around $100; storage really isn't as big an issue these days.

Re:A better use? (1)

krayzkrok (889340) | more than 6 years ago | (#24608865)

Bear in mind that the key factor here is the photographs. Whatever you want to "enhance" must be present in those photos as well, so a robbery on CCTV can only be enhanced in this way if simultaneous photos that include the perpetrator are available... in which case why not simply use a high-resolution still security system in the first place (easy: it costs too much to store the massive amounts of data). You also cannot enhance, say, people moving through the scene or changing / unpredictable elements in the scene unless someone else was there simultaneously taking high-resolution photos. Clearly this tech is intended for static scenes, or static objects within those scenes. While this is great tech and certainly has its uses, it also relies on your ability to take well-exposed, clear photographs of the same scene. If your video technical skills aren't up to proper exposure settings, what makes them think your photographic skills are going to compensate for that? I can see this being a plug-in for video editing suites, although knowing what the better ones already cost I can't see it being cheap!

Re:A better use? (0)

Anonymous Coward | more than 6 years ago | (#24610783)

Keep in mind that it's no coincidence that their technique only works with *static* scenes. The camera can move, but not any of the objects. Introduce a human being who's not standing completely motionless (like their statue, like their photo portrait subject), and the whole system falls apart.

Re:A better use? (1)

neokushan (932374) | more than 6 years ago | (#24612337)

Except it doesn't. Check out the Microsoft link at the bottom of the summary (I know, it's a slightly different thing, but it's incredibly similar) and you'll see they do almost exactly what I describe - with moving objects.

Re:A better use? (0)

Anonymous Coward | more than 6 years ago | (#24614247)

I don't think current CCTV systems are compatible with the methods discussed in this article. The algorithm depends on being able to accurately reconstruct the depth of the scene, and they rely on camera movement in order to infer depths. Since CCTV cameras are stationary there are no "easy" depth cues. In order to achieve the same results you'd have to install stereoscopic cameras or something.

Re:A better use? (1)

ichigo 2.0 (900288) | more than 6 years ago | (#24614429)

How about the other way around? Hack the camera, and make yourself invisible to it.

Old News (0)

Anonymous Coward | more than 6 years ago | (#24608101)

This has been posted on Slashdot before.

It's going to be big... (2, Insightful)

bill_kress (99356) | more than 6 years ago | (#24608269)

When the ability to deconstruct a video into a 3-d model & skin (the opposite of what a video card does now) is placed into an open-source API, the possibilities are going to be HUGE (and a little frightening).

Anyone want to post a few ideas? I'll give you a few topics to kick things off:

Change detection (Finding lost objects in a room, seeing boxes left in a government office, where's my remote)
Change observation (plant growth, things that change too gradually for us to notice)
Creating 3-d models from humans (extracted from old films, walking down the street)
Weapon systems (Undetectable lasers blinding targets, Unmanned guns with perfect accuracy)
Home interaction (Make a sign with your hand, computer changes the channel, lighting, heat, ...)
Office monitoring (Exactly where each person is any time just by typing "Where's bill" into your PC)

All things that could be done by any hobbyist/hacker with the right API.

(I assume that to get real-time you could use the massively parallel abilities of a video card, making this stuff run on any hardware...)

Also, just storing models and skins is extremely efficient--you could film a room for years in extremely high resolution and use virtually no storage (almost none, except when something or someone new enters the room; then it's just one new high-def skin)

Other ideas?

Re:It's going to be big... (1)

rhyder128k (1051042) | more than 6 years ago | (#24609009)

Surely, Carmack's got to be interested in this? If he does take to it, I suppose the next gen of hardware renderers would have to be optimised for it. You're right, if it's as good as the demo video makes out, it could be the next big thing.

Re:It's going to be big... (1)

MobyDisk (75490) | more than 6 years ago | (#24613807)

I'm not sure a 3D model is going to help most of these things.

3D models are not necessary to do any of these tasks. Object recognition can be done in 2D, and it is very very hard. I will speculate that doing it in 3D is going to be even harder. Plus, using human brains as a model, we don't do it that way.

--

Change detection - Change detection can be done in 2D, and the person viewing the image can see where the object is anyway. No need to have the 3D model. As for object recognition, that is machine vision and we don't yet have the technology to do it. Maybe a 3D model would help, but I'm not so sure.

Change observation - This can also be done in 2D. Same thing.

Weapon systems - I don't think we need 3D models for this either. A laser distance finder and a camera can track an object and hit it dead on. Detecting eyes on a person is a problem that is already solved in 2D anyway, since that is how facial recognition works.

Home interaction - I think hand gestures can already be done in 2D. The real issue with this is that you would need cameras all over the place to do it since you might be obscured.

Office monitoring - This comes back to facial recognition, which is a 2D task. Besides, RFID tags can do this even better. "Computer, locate Commander Data." "Commander Data is not aboard the Enterprise. Oh wait... he was hiding inside a jeffries tube and I couldn't see him. Plus he now has a beard."

Re:It's going to be big... (1)

bill_kress (99356) | more than 6 years ago | (#24617763)

You are exactly right. 2D recognition is extremely difficult--if not impossible. Pretty much like rendering a 2d screen without a 3d model backing it is really difficult. All your points were that doing it in 2d was hard. You were right about every one, doing it in 2d IS hard.

The point of my post was: what if we had a library that took 2 cameras in the real world and changed them into a 3-d model with skins (which is how the article's result was done)? How much easier does it make all these problems? Well, they all become trivial. They go from "NASA" hard to "script kiddie" hard. (Any script kiddie can write a module for Quake--that's the difficulty level all these problems would attain.)

Reverse-3d has the advantage (over 2-d) that you are comparing 2 pictures; it's more data, but the problem itself of creating an internal 3d model isn't anywhere near as tough as interpreting a 2-d image. Very processor intensive, but from a programming point of view, not all that hard. (In computers, "hard" is mostly how much code you have to write for something; you let the computer take care of the computation unless it requires extreme hardware setups. I think reverse-3d would require some serious hardware, but I'm thinking it can be done on an existing 3d card, or one with some slight design modifications.)

To counter just one of your points, office monitoring. Since you only store a skin and models, you've got data for where a person is at every instant. They are never out of sight--and all you are storing are a wire-frame location (very little storage) and any change to the skin that can't be predicted (your shirt suddenly turns blue?). Someone showing up somewhere else wearing different clothes wouldn't be possible--you would watch them change, record the difference in the "skin" and keep following the model. Or if they went into the bathroom where there were no cameras, you have "skins" and models for everyone currently inside; you do a best match when they come out, differentiate, and figure out which one changed. When you find out which one changed, you back-flush your data to indicate who that person was.

Two people the exact same height and weight and skin tone and visible birthmarks/tattoos deliberately switching clothes in an unmonitored location would defeat it pretty quickly, obviously, but that wasn't really the point--plus computers are REALLY GOOD at noticing differences once they are stored. If you wanted to compare model movements, you could probably uniquely fingerprint the skins, movements and locations of everyone who entered a grocery store in a day and not fill up a 1TB hard drive (Just storing skins and frame models, remember) and be able to match them up by any parameter, including how they walk (probably as unique as a fingerprint).

Yes, it implies a lot of cameras and a lot of CPU power--but that's a hardware problem, not a "Hard" problem like trying to do all that in 2-d would be.

Re:It's going to be big... (1)

MobyDisk (75490) | more than 6 years ago | (#24621277)

I don't think you even read my reply. Furthermore, I don't think you know anything about what you are talking about.

All your points were that doing it in 2d was hard

I actually said the opposite. All of the things you listed can be done in 2D, except for object recognition. Which cannot be done in 2D or 3D. So having 3D data does not help.

The rest of your reply was just rambling about how more cameras = better. Doesn't address any of the fundamental issues. Meh, why am I bothering to reply.

Re:It's going to be big... (1)

bill_kress (99356) | more than 6 years ago | (#24622847)

Strange, I kinda felt the same way.

Object recognition can be done in 3d much more easily than 2d, THAT is the point. Your saying that it can't doesn't change the fact that that's what the subject of the article was doing (Assuming you read it).

With 3d, a generic identification of objects is very possible and reliable. There are no more questions about what part of the image is part of what object because you have distance information for every surface, the code that looks at the 2 pictures to draw the 3-d internal image should be simple and the process reliable.

The software wouldn't be hard to write, but AMAZINGLY hardware intensive. But then so is rendering a 2-d image from a 3-d model and camera location like we do now. It's the lack of ability to process this stuff in realtime that currently makes all those things impossible, but like 3-d video cards, a solution will present itself. This is one of those problems that lends itself to multi-core processing.

What leads you to believe that this process is as difficult as parsing objects off a 2d image (which is virtually impossible)? Have you been on a failed project that attempted to do 3-d image recognition or something?

Re:It's going to be big... (1)

MobyDisk (75490) | more than 6 years ago | (#24625793)

I think you are both wrong, and I think I now see what you are misunderstanding.

When we think of a 3D model, we imagine that we have a series of coordinates for a head, and then some joint that connects to a neck, and some coordinates for that. Then a torso, with coordinates, etc.

But if we extract 3D images from cameras, we have none of that. Firstly, we have a scene, not a model. We have only points and textures. So first, we don't know what is the person and what is the floor, or the wall, or the table. Assuming we use movement to figure that out, and assuming we can tell one moving object from another and make a separate model, we have new problems.

So we have a series of coordinates. Now, if the original model had 1000 points on the head, we might have 100 for the head in this scene. Or 100,000. It would vary based on the position of the person relative to the cameras. And we don't know it is a head. I don't even think anyone has ever made an algorithm to say "this part of this model is a sphere" given the points, never mind realizing it is a head.

Then you have joints. Given a 3D model of a person, and a scene of the person bent over sitting down, I don't think anyone has ever even TRIED to correlate those models. I don't see how that is easy.

I think people are imagining a 3D model with all the nice data points, and another 3D model in the exact same position and you just match-up points. But remember the number of points varies, the relative positions of the points varies, and we don't have all the meta-data that a normal 3D model has. We don't know the joints, we don't know which parts are even part of the model we are searching.

In doing a Google search, I see that it has been attempted, given a static model with no joints and all details previously known, to recognize that object in a photo. But that is only a fraction of what must be done to do any of the above things. I don't think the research even generalizes.

http://en.wikipedia.org/wiki/3D_single_object_recognition [wikipedia.org]

This problem is not a rendering issue. It is not an issue of generating a 3D model. It is an issue of matching multiple *different* 3D models against each other. I say different because from one frame to another the model will not be the same. It isn't as if I rendered the person into the scene with the model and can simply pull the model back out again.

Re:It's going to be big... (1)

bill_kress (99356) | more than 6 years ago | (#24646761)

I think you are missing one step. Your arguments are still simply those against a 2-d image. A 3-d image is a completely different problem.

With 2 images of the exact same scene taken from slightly different angles, you have much more information than you have with a single image.

With the two camera angles, you have the data available to calculate the x, y and z coords of each surface you see... You are not simply viewing a flat image as you are with a 2-d camera image.

Given the additional info, you can actually build a 3-d model (The way you described the head/neck, etc).

This process will be somewhat difficult, but with the ability to compare the two pictures of the exact same scene, no longer impossible. In fact, it will be "Generic" meaning that a single software solution will apply to every single pair of adjacent photos.

Therefore my point was that eventually this will be a library. The ability to take 2 images, create a 3-d model of edges for each object in those pictures, strip the skins off and hand the model and skin to an application.

If this can be done in real time, it will change the world (Hence my comment) because with a model and skin, computers can start to really understand the image, not just execute some pre-programmed single-purpose analysis on it like with a 2-d image.

Are you sure you've considered the difference between analyzing a 2-d (single) and 3-d (dual) image? Your description still sounds like you are thinking about breaking down a single image (which is EXACTLY as difficult as you state).

SpaceTime Fusion (0)

Anonymous Coward | more than 6 years ago | (#24608313)

The whole time they were saying SpaceTime Fusion, I kept having to restrain myself from shouting "CAPT'N, D'SPACE TYM' FOOSH'N RIF' IS GETT'N MOR' INTENS'!" in a Scottish accent.

I miss Scotty :(

Well there goes using video as evidence in court (2, Insightful)

Brynath (522699) | more than 6 years ago | (#24608319)

With most if not all video cameras storing the video digitally, and now with all these new techniques for editing video, why would any court allow for video evidence?

evidence fabrication? (2, Insightful)

Atilla (64444) | more than 6 years ago | (#24608323)

This software, if it actually works as described, could also be used to easily fabricate video "evidence". An average viewer would not be able to tell the difference.

Kinda scary...

Re:evidence fabrication? (1)

TedTschopp (244839) | more than 6 years ago | (#24608957)

I hear that's how they framed the Butcher of Bakersfield... Time to start RUNNING!!

Re:evidence fabrication? (1)

Fri13 (963421) | more than 6 years ago | (#24612535)

Funny, I just watched that movie (The Running Man) yesterday...

Actually the "Butcher of Bakersfield" part was not manipulated; they only cut and rejoined the voice parts. This technology would be used on the later part where Captain Freedom "killed" Ben Richards...

Re:evidence fabrication? (1)

MobyDisk (75490) | more than 6 years ago | (#24613839)

I think this would make fabricating evidence much *harder*.

Today, if you want to add a gun into the photo, you just have to make it look right under one set of lighting, in one photo, with limited resolution. If you had multiple cameras generating a 3D model, you must now Photoshop the evidence in so that it looks right from multiple angles, PLUS the software could tell if the gun was shaped differently from different viewpoints. So your pixels must produce a perfect 3D model of the object that is consistent across viewpoints. That's going to be a pain. Now, perhaps a "3D Photoshop" could be used to insert the object into the 3D model, then regenerate the images. But that's more work, and more potential for mistakes.
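
The consistency test being described here is basically a reprojection-error check. A hedged sketch (my own, with the camera matrices P1..P3 assumed known, e.g. from the same reconstruction): triangulate the feature from two views and reproject it into a third. A pasted-in object that only "looks right" in the first two views will tend to show a large error in the third:

```python
import cv2
import numpy as np

def reprojection_error(P1, P2, P3, x1, x2, x3):
    """P1..P3: 3x4 camera projection matrices; x1..x3: the same
    feature's pixel coordinates (x, y) in each of the three views."""
    # Triangulate the feature from views 1 and 2
    X = cv2.triangulatePoints(P1, P2,
                              np.float32(x1).reshape(2, 1),
                              np.float32(x2).reshape(2, 1))
    X = X / X[3]                      # homogeneous 3D point, w = 1
    # Reproject into view 3 and compare with where it was observed
    p = P3 @ X
    p = (p[:2] / p[2]).ravel()
    return float(np.linalg.norm(p - np.float32(x3)))
```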

4-Dimensional DCT (1)

NicknamesAreStupid (1040118) | more than 6 years ago | (#24608459)

I believe IBM did something like this in the 1990s. Obviously not as slick; they didn't have as many CPU cycles.

evidence (1)

drDugan (219551) | more than 6 years ago | (#24608499)

Automatic or not, that was a huge eye-opener if that technology is available at the grad-student level. Expect it in commercial/consumer products in 3-5 years.

So much for "video evidence". So much for reality. "Your honor, I have a video of what happened and I wasn't there! ... see?"

3d track, projection, basic compositing. (2, Informative)

shidarin'ou (762483) | more than 6 years ago | (#24609809)

This is a 3D track of the shot (which generates a point cloud of 3D points, which can then be used to generate an automatic 3D model of the scene). They then project the still photos onto the 3D model; projection is a method of texturing that paints a model from a point of projection (what happens when you stand in front of a projector: you get projected onto). That recreates all aspects of the texture and geometry, but instead of SD resolution, you now have gigapixel resolution built into the model.
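
The projection step is simple enough to sketch: put each 3D point through the photo's camera matrix and sample the photo there. This is a minimal stand-in of my own (known 3x4 camera matrix P assumed, no occlusion handling), not the production technique:

```python
import numpy as np

def texture_point_cloud(points3d, photo, P):
    """points3d: Nx3 array; photo: HxWx3 image; P: 3x4 projection
    matrix for the photo's camera. Returns Nx3 sampled colors
    (NaN where a point projects outside the photo)."""
    n = len(points3d)
    homog = np.hstack([points3d, np.ones((n, 1))])    # Nx4
    proj = homog @ P.T                                # Nx3
    uv = proj[:, :2] / proj[:, 2:3]                   # pixel coordinates
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    colors = np.full((n, 3), np.nan)
    ok = (u >= 0) & (u < photo.shape[1]) & (v >= 0) & (v < photo.shape[0])
    colors[ok] = photo[v[ok], u[ok]]
    return colors
```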

The reason it looks like a cheap video game is exactly that: they're trying to prove how sharp it is, so instead of anything being anti-aliased, etc., it's all crisp, which doesn't look like real life.

Solution: get a better video camera, learn how to expose your shots properly.

Oh? And the tree thing? Same thing, except instead of projecting the texture on, you just place the texture in the 3D scene where the tree is, and render. It's even easier.

Solution: Don't film a beat up tree. Don't film flowers with a giant sign in the middle of them.

This wasn't at SIGGRAPH this week, as a paper or as a poster (of which there are PLENTY of student posters).

The solution is NOT to fix it in post. The solution is to spend 5 minutes, think it through, and fix it while you're filming.

Re:3d track, projection, basic compositing. (1)

all204 (898409) | more than 6 years ago | (#24610101)

The solution is NOT to fix it in post. The solution is to spend 5 minutes, think it through, and fix it while you're filming.

Thank you. A little OT, but I do location sound for indie films, and that is the most horrible thing a director can say: "Fix it in post". 5 minutes on set, or 3 days in editing. To me that's an easy choice, but apparently not to everyone.

Re:3d track, projection, basic compositing. (1)

ResidntGeek (772730) | more than 6 years ago | (#24611891)

I don't work in film, so I have a question: how much do the actors and crew get paid per hour compared to an editor?

Re:3d track, projection, basic compositing. (1)

all204 (898409) | more than 6 years ago | (#24615315)

Well... I work for a film co-op as a volunteer, and we're all volunteers, so none of us get paid. :p So maybe it's different in the pro circle; I don't really know. But as far as sound goes, if you have bad location sound, you have to rebuild the scene in post, and that usually requires scheduling all the actors in that scene to come back and redub the sound. So even from a money standpoint, I think it's expensive. Also, in the volunteer circle, people move, and post can happen months after the actual shoot, so you may never be able to get the original actor back in because he/she moved to a different city or province. I've seen it happen. (I've heard of people's voices being redone by someone else because they could not schedule the original actor back, and you would have to do it for the entire movie. That usually upsets the original actor...)

Re:3d track, projection, basic compositing. (1)

shidarin'ou (762483) | more than 6 years ago | (#24619843)

It varies wildly. Very, very wildly. Sometimes editors will get paid more, sometimes actors will get paid way more. At the professional film level, they're both unionized positions, unlike effects workers (sad face).

Re:3d track, projection, basic compositing. (1)

blincoln (592401) | more than 6 years ago | (#24616961)

The solution is NOT to fix it in post. The solution is to spend 5 minutes, think it through, and fix it while you're filming.

Obviously everyone should shoot still/video with the intention of it being perfect without postproduction, but sometimes it's impractical or impossible to go back and reshoot something when you find out it's got a problem of some kind.

For example, I went on a drive down the west coast of the US a year ago and took a bunch of pictures. Halfway through the trip, I discovered that sometime around the third day some debris had gotten onto the sensor of my camera. I couldn't go back and retake the pictures, so the only option was to Photoshop it out of every single one that I wanted to use.

For another example, I was just watching the old Five Doctors episode of Doctor Who, and some of the bonus material included unedited footage where the microphone boom and/or the shadow of a crewmember had accidentally appeared in frame. The BBC can't actually go back to 1983, but if they wanted to use that footage they could theoretically use technology like this to fix it up.

Re:3d track, projection, basic compositing. (1)

shidarin'ou (762483) | more than 6 years ago | (#24619807)

Hi Lincoln,

I understand what you mean with unexpected problems that you encounter after the shoot day, but "Fix it in post" is a term most often used on set to avoid working on issues BEFORE or DURING a shoot. For instance, instead of doing prep work before the shoot for a monitor inlay, they'll spend 4 times the money and time to do it in post. Instead of finding a smart way to break glass, they dress some wacko up in green and have him stand between the glass and the camera to break it- green is apparently invisible to the camera, and the camera will just shoot through it.

As a professional at the end of the production chain, I spend an inordinate amount of time fixing laziness that piles up in the chain because someone said: "Fix it in post"

Perhaps I wasn't blatant enough in my original post, but what I was saying (while trying not to be antagonistic) was that none of this is new stuff. There are thousands of little post shops that do this stuff every day. If they've put it in a small package that is able to 3D-track standard-def footage quickly, accurately and automatically, and easily do the resolution adding, area replacement, etc., that's slightly new, but I see very little proof that they've done that.

I apologize if I didn't make that clear enough. To summarize: Is there anything new here? We do this stuff everyday, here's how.

And then I put in a little bitch on people saying "Fix it in post" ;)

Meh... (0)

Anonymous Coward | more than 6 years ago | (#24609865)

It's all still a bunch of digital voodoo-black-magic trickery at work.

You want to impress me? Try planning ahead and shooting the video right the first time.

Claudia, you're mine (1)

gacl (1078259) | more than 6 years ago | (#24609959)

1) Tape myself with cheap whore
2) Combine with picture of Claudia Schiffer
3) Become popular!

Dealing with copyright problems (2, Interesting)

DouglasR (890646) | more than 6 years ago | (#24610091)

This might be helpful to deal with copyright-protected material that gets into frame, for instance billboards, logos on T-shirts, posters and artwork on walls. Take a single frame into a photo editor, replace the unauthorised image with an authorised one, and this technique could potentially replace it throughout the sequence. It could equally be used to replace moving images, for instance on a TV screen, with a "blue screen" (or green) that normal video compositing software can then replace with a desired image.
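
If the billboard is planar, this is prototypable today with per-frame homography tracking: follow the billboard's corners through the shot and warp the authorised image into place each frame. A rough sketch of my own (assuming the corners stay trackable and the surface stays planar; real work would also handle occlusion and lighting):

```python
import cv2
import numpy as np

def replace_billboard(frames, corners0, new_art):
    """frames: list of BGR images; corners0: the billboard's 4 pixel
    corners in frames[0], ordered TL, TR, BR, BL; new_art: replacement."""
    h, w = new_art.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    art_mask = np.full((h, w), 255, np.uint8)
    corners = np.float32(corners0).reshape(-1, 1, 2)
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    out = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Track the four corners with sparse optical flow
        corners, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                      corners, None)
        H = cv2.getPerspectiveTransform(src, corners.reshape(4, 2))
        size = (frame.shape[1], frame.shape[0])
        warped = cv2.warpPerspective(new_art, H, size)
        mask = cv2.warpPerspective(art_mask, H, size)
        comp = frame.copy()
        comp[mask > 0] = warped[mask > 0]     # paste over the billboard
        out.append(comp)
        prev_gray = gray
    return out
```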

seen this before (1)

skoony (998136) | more than 6 years ago | (#24610221)

Photo hunt has been doing this for years. Let me know when something new comes up. Regards, Mike

This is like Melodyne for video (1)

jordan314 (1052648) | more than 6 years ago | (#24610803)

This blows my mind almost as much as Melodyne version 2 does: http://www.celemony.com/cms/index.php?id=dna [celemony.com] Only instead of 'direct note access' it's 'direct video object access'. Or something.

Shoot, been doing that for years (1)

leicaman (1260836) | more than 6 years ago | (#24611033)

I give our video people my photos all the time. Photographs (mine are 16.7 megapixel) have much higher resolution. They use FinalCut Pro which allows for seamless inclusion of photos in the workflow. This isn't a new idea. Good to see it becoming more common.

Getting close... (1)

harlows_monkeys (106428) | more than 6 years ago | (#24611699)

This is getting really close to being something that every straight male porn viewer (which means every straight male on Slashdot...) would pay a lot for.

Combine the ability to remove items from a video, like they showed with the lamppost and sign in the flower shop, with the ability to insert new things into the video, and you could turn a boring man on woman porn movie into a lesbian twin incest movie.

Image manipulation and Investigation (1)

mrboyd (1211932) | more than 6 years ago | (#24612115)

It's not exactly related, but Al Jazeera just had a piece about a pedophile who got arrested last year after Interpol "unwarped" some pictures he had put online.
Maybe this new tech might be used to produce that kind of useful result, and not only better mom-and-pop holiday pictures...
Old article: http://www.guardian.co.uk/uk/2007/oct/19/ukcrime.internationalcrime [guardian.co.uk]

More intresting technology than photosynth (1)

Fri13 (963421) | more than 6 years ago | (#24612491)

These two technologies (from the article) are IMHO more important than Photosynth.

http://www.youtube.com/watch?v=556FvXHLtAo [youtube.com]

Epipolar geometry -- nothing new under the sun. (0)

Anonymous Coward | more than 6 years ago | (#24613187)

I made something in the same category for my bachelor's thesis. This is the same as using a reference shape and creating a virtual studio, but it's a better use of matchmoving, because they use information from the scene (plus photos). Usage would be great. But I know there are loads of math and coding behind it, so it's no surprise it isn't freeware. An application like this is not a browser that you can code bit by bit; it's a full-time job that can't take long breaks. The same possibilities were described in "An Invitation to 3-D Vision" and the epipolar geometry literature. The idea is quite old, but implementations are very rare.

Could this be used for video compression? (1)

harlows_monkeys (106428) | more than 6 years ago | (#24617435)

Could this be used as a form of video compression? Shoot your video at high resolution. Extract a few high resolution stills from the video, and then convert the video to low resolution. Save the low res video and the stills. When you want to play the video, use their algorithm, with the stills taking the role of the photos, to enhance the low res video.
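
One crude way to prototype the playback half of that idea: upscale the low-res frame, align the nearest saved still to it with a homography, and graft in the still's high-frequency detail. This is a naive stand-in of my own, not the paper's method, with the usual caveats (static scene, enough overlap for features to match):

```python
import cv2
import numpy as np

def enhance_frame(lowres_frame, still, out_size):
    """out_size: (width, height) of the enhanced output."""
    base = cv2.resize(lowres_frame, out_size, interpolation=cv2.INTER_CUBIC)

    # Align the still to the upscaled frame (ORB + RANSAC homography)
    orb = cv2.ORB_create(4000)
    g1 = cv2.cvtColor(base, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(still, cv2.COLOR_BGR2GRAY)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    aligned = cv2.warpPerspective(still, H, out_size)

    # Swap the still's high frequencies onto the upscaled frame
    detail = aligned.astype(np.float32) - cv2.GaussianBlur(
        aligned.astype(np.float32), (0, 0), 3)
    return np.clip(base.astype(np.float32) + detail, 0, 255).astype(np.uint8)
```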

The Paper (0)

Anonymous Coward | more than 6 years ago | (#24619179)

The paper states that the method is restricted to static scenes; specifically, it can only process scenes where the only changes over time come from camera motion, from specular reflections and other camera-angle-dependent effects, or from scene lighting. Things like scene geometry changes (filming a moving human being) could possibly be handled in the future with a better algorithm for depth via motion, etc.

In these specific conditions, it seems like it would work pretty nicely.

I could see a simple laser or ultrasonic depth finder being integrated into cameras to be used as a "ball-park" estimate of depth for that specific frame, relative to where the pointer is located, of course, possibly increasing accuracy if the algorithm could utilize it.

Re:The Paper (1)

argent (18001) | more than 6 years ago | (#24620597)

The raindrops were not camera-angle dependent effects. I don't know how much time-dependent differences between scenes can be accommodated, but it's not as simple as angle-dependent changes.

On another note, I noticed some ghosting on the "repaired" tree and flowers.

Re:The Paper (0)

Anonymous Coward | more than 6 years ago | (#24621427)

The raindrops were not camera-angle dependent effects. I don't know how much time-dependent differences between scenes can be accommodated, but it's not as simple as angle-dependent changes.

One would think in the granularity of the picture (specifically in the depth estimation algorithm), the raindrops would act more as a "texture" element than a "geometric" element.

Stereoscopy: think synthetic aperture radar (0)

Anonymous Coward | more than 6 years ago | (#24620685)

While the techniques for building a 3D model (used at least internally) from a 2D video shot are undoubtedly impressive, they quite possibly parallel SAR techniques that have been around for decades. Are you listening, software patent examiners???
