Algorithms for Motion Tracking?

Cliff posted more than 12 years ago | from the complex-image-recognition dept.

Graphics 32

Keith Handy asks: "I seem to be unable to find algorithms and/or open source programs that will do accurate motion tracking, i.e. you mark a point on an object in frame 36, and the program can follow that point on that object through all the frames following it. This is useful not just for analyzing motion, but also for interpolating/extrapolating frames of video -- so if you had something at only 15 fps, you could generate in-between frames (which are not just crossfades between the frames) and actually smooth the effect of the motion. Not something so complicated as to get into actual physics -- just something that will indicate where (in 2D only) that part of the object has moved from one frame to the next, for any given point in the whole picture. And for that matter it doesn't have to be 100% accurate, just any means of generating a reasonable motion-flow map." This doesn't strike me as an easy algorithm to develop, but are there any papers, online or offline, that might describe an algorithm that can at least track objects in an image?

"In other words, I want something that does this, in order to write code that will do things like this and this. I already know how to write code to blur and warp images, so to be able to track motion would give me (and you) the same capabilities as these expensive plug-ins.

Anyone know any other resources, directions, or existing code I could look into to find out more about how this works, so I can incorporate it into my own programming instead of paying hundreds or thousands of dollars for limited, proprietary use of the technology?"

Book (5, Informative)

rm-r (115254) | more than 12 years ago | (#2954521)

Hmm, not sure about any resources on the net, but I did similar stuff at uni so I can recommend a book. Try "Image Processing, Analysis and Machine Vision" by Hlavac et al. It's a very good book with plenty of code-neutral algorithms. Good luck.

Re:Book (2)

JabberWokky (19442) | more than 12 years ago | (#2960087)

I'm just tossing in a second for that book - FWIW, I wrote exactly what you're looking for (the intelligent interpolation of frames) in a class that used this book (or the previous edition) as the core text. Good book, good class. If I had the code, I'd toss it to you. Incidentally, the frames I used were every other frame from a sequence of a Star Trek episode (IIRC, it was found via Archie on an MIT ftp site)... which allowed me to go back, compare my results with the actual frames that I had pulled out, and see where the errors were. As a result, I also wrote a few nice vdiff image programs - I'm sure a fm: search will now pull up a dozen versions, but the technique I just described is useful for refining your processing code.

Evan "Tired, and I'm not gonna rewrite that for easier parsing" E.

Video compression (3, Informative)

cperciva (102828) | more than 12 years ago | (#2954528)

I think the best use of this would be in video compression -- if you can recognize the movement of objects between frames, you can encode how much things have moved instead of re-encoding the entire image.

Which is exactly what MPEG does... very crudely. The MPEG solution seems to be to compare a block (8x8?) of pixels with every block in the previous frame.

The fact that MPEG doesn't use anything more sophisticated than this suggests to me that there probably aren't any algorithms which consistently work better.

Re:Video compression (2, Informative)

rm-r (115254) | more than 12 years ago | (#2954549)

Motion tracking is also useful in military software, i.e. to find that pesky moving Afghan and point a gun at him. Similar technology is also used in some CCTV systems, but apart from that it doesn't seem to have crossed over into the mainstream much. There's a lot of cool software and ideas out there, much of it in the public domain if not GPL'd exactly, so I'm sure it's just waiting for someone with a really good idea...

Re:Video compression (2)

rw2 (17419) | more than 12 years ago | (#2955348)

The fact that MPEG doesn't use anything more sophisticated than this suggests to me that there probably aren't any algorithms which consistently work better.

Not that I know, but it could also be that there aren't any algorithms that work better given the horsepower available in many devices. There could be many algorithms that work much better assuming dual Athlon 1900+s to execute them.

Re:Video compression (1)

NightHwk1 (172799) | more than 12 years ago | (#2957421)

i'm sure that we will see something like this in the future, but even right now with our 2 GHz machines, point tracking is pretty slow.

download a trial copy of Shake and try some tracking for yourself. just trying to follow 4 points takes about 1 second per frame; imagine how long it would take to process every pixel (or even 8x8 blocks) in a 30 minute video!

Re:Video compression (0)

Anonymous Coward | more than 12 years ago | (#2957743)

or maybe it's because MPEG is designed to run on extremely minimal embedded hardware, and an algorithm like the one he's describing wouldn't be easy to optimize to the point where it could be implemented on a $30 chip. There are considerations beyond 'is it possible' when implementing a standard. Namely, is it possible to do with a cost of development under $X and a cost of deployment under $Y.

Posting anonymously because slashdot broke my user account.

openCV (5, Informative)

bjpirt (251795) | more than 12 years ago | (#2954603)

have a look at openCV (stands for Open Source Computer Vision); it was originally developed by intel, but was later open sourced. it runs on both linux and windows and is mainly used for real-time motion tracking of live video sources. i'm sure there are some pretty nice algorithms in the source there somewhere. They have their stuff on sourceforge.

and a yahoo groups support forum thing here []

the original intel pages are here []

what about intercorrelation ? (3, Interesting)

dario_moreno (263767) | more than 12 years ago | (#2954612)

Compute the 2D FFT of each frame (in grayscale), then get the intercorrelation (cross-correlation) function of two neighbouring frames. The maxima are more or less where the objects have moved.

I only used this method on artificially generated frames, i.e. one frame with translation and noise added. Still, the intercorrelation peak degrades quite fast. On natural images, there must be a lot of fiddling to do.
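A minimal sketch of the FFT cross-correlation idea, in Python with NumPy (the function name `estimate_shift` is hypothetical, not from the thread). It assumes the whole frame moves by a single translation and treats the image as periodic (circular correlation), so it only reports the strongest peak; tracking several objects would mean looking at several local maxima, and normalizing the cross-power spectrum (phase correlation) is a common variant that sharpens the peak.

```python
import numpy as np

def estimate_shift(frame_a, frame_b):
    """Estimate the global translation (dy, dx) that carries frame_a onto
    frame_b, using FFT-based circular cross-correlation of greyscale frames."""
    a = frame_a.astype(np.float64) - frame_a.mean()  # remove DC so the mean doesn't dominate
    b = frame_b.astype(np.float64) - frame_b.mean()
    # correlation theorem: corr[k] = sum_n a[n] * b[n + k] = IFFT(conj(A) * B)
    corr = np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # indices past the halfway point correspond to negative shifts
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

For example, `estimate_shift(a, np.roll(a, (5, -7), axis=(0, 1)))` recovers `(5, -7)` for a textured frame `a`.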

Try (1)

cassidyc (167044) | more than 12 years ago | (#2954618)

using a marker of a known colour (e.g. yellow), then read the raw video stream (scan line at a time) looking for the largest instance of your colour (e.g. the longest instance of yellow on a given scan line), and note where you started from on that scan line and the length of the line. From this you can work out the middle of the marker. This should give you the X, Y coords w.r.t. the camera position.
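The scan-line approach above can be sketched roughly like this, assuming the frame is already decoded into an H x W x 3 RGB NumPy array (the function `find_marker` and its colour tolerance `tol` are hypothetical choices, not something from the original post). It returns the middle of the longest marker-coloured run as x and that run's scan line as y; with ties it keeps the first line, so a round marker works better than a square one.

```python
import numpy as np

def find_marker(frame, target, tol=40):
    """Scan-line search for a coloured marker.

    frame: H x W x 3 RGB array; target: (r, g, b) of the marker.
    Returns (x, y) of the middle of the longest run of marker-coloured
    pixels on any scan line, or None if no such pixel exists."""
    # True where every channel is within `tol` of the marker colour
    hits = np.all(np.abs(frame.astype(int) - np.array(target)) <= tol, axis=2)
    best = None  # (run length, scan line, run start)
    for y, row in enumerate(hits):
        start = None
        for x, hit in enumerate(np.append(row, False)):  # sentinel ends a final run
            if hit and start is None:
                start = x
            elif not hit and start is not None:
                if best is None or x - start > best[0]:
                    best = (x - start, y, start)
                start = None
    if best is None:
        return None
    length, y, start = best
    return (start + (length - 1) / 2, y)
```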


KLT Feature Tracker (5, Informative)

The Whinger (255233) | more than 12 years ago | (#2954624)

Have a look at the KLT tracker - that will probably do what you want.

An implementation can be found here:

Identify Features and Label (1, Interesting)

morbid (4258) | more than 12 years ago | (#2954684)

One way might be to identify "features" in the image e.g. by colour, brightness or changes and build an association tree.

Basically, identify all "peaks" (whatever feature you're interested in) and sort them. Start with the most outstanding feature and associate its nearest neighbours with it. Repeat many times. You will end up with a data structure of references which will produce a map of islands and isthmuses depending on how far down you look.

Attach a "label" (unique ID) to each significant feature in the frame.

Repeat for the next frame.
Compare significant features. Using some sort of threshold, you can attach a confidence level that you're looking at the same feature as in the previous frame.

That's a simplistic overview, but I did it many years ago for looking at the output of stellar formation simulations.
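A simplified sketch of the label-and-compare step, in Python with NumPy. It is an assumption-laden stand-in rather than the poster's method: peaks are plain local maxima above a threshold (no association tree), each previous-frame peak's label is just its index in strength order, and matching is greedy nearest-neighbour with a confidence that falls linearly to zero at a hypothetical `max_dist`.

```python
import numpy as np

def find_peaks(img, threshold):
    """Return (y, x) positions of local maxima above `threshold`, strongest first."""
    h, w = img.shape
    peaks = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = img[y, x]
            # a peak is at least as large as all 8 neighbours
            if v > threshold and v >= img[y - 1:y + 2, x - 1:x + 2].max():
                peaks.append((y, x))
    peaks.sort(key=lambda p: img[p], reverse=True)
    return peaks

def match_features(prev_peaks, curr_peaks, max_dist=5.0):
    """Label each previous peak by its index and greedily pair it with the
    nearest unused peak of the next frame within max_dist.
    Returns {label: ((y, x), confidence)}."""
    matches, used = {}, set()
    for label, (py, px) in enumerate(prev_peaks):
        best_j, best_d = None, max_dist
        for j, (cy, cx) in enumerate(curr_peaks):
            d = float(np.hypot(cy - py, cx - px))
            if j not in used and d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            matches[label] = (curr_peaks[best_j], 1.0 - best_d / max_dist)
    return matches
```

Processing the strongest peaks first follows the "start with the most outstanding feature" advice, so weak, ambiguous peaks cannot steal a strong feature's match.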

tracking motion (3, Interesting)

Anonymous Coward | more than 12 years ago | (#2954994)

i don't know if this would suit your needs, but a package called "motion" has been available for quite some time, and it is in fact oriented toward tracking frame differences from a video source: []

there are some examples and a sample video which demonstrate tracking "motion."

Re:tracking motion (1)

198348726583297634 (14535) | more than 12 years ago | (#2960607)

Motion.. a .CX site..


related GPL software (1)

cmoss (14324) | more than 12 years ago | (#2955368)

"Motion uses a video4linux device and detects changes in the image. If a change is detected a snapshot will be taken. "

MPEG (2)

markj02 (544487) | more than 12 years ago | (#2955417)

If your goal is merely frame interpolation, I suspect that using a decent MPEG2 encoder and interpolating based on the motion compensation would be good enough.

For other applications (e.g., colorization), you need somewhat better segmentation. Doing this well in the general case is still a research topic; but that's good: you can get lots of research software from around the net that does this sort of thing. Look for keywords like "computer vision", "motion", "segmentation", and "tracking" on Google.
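A rough sketch of interpolating from block motion vectors, in Python with NumPy. Everything here is an illustrative assumption: `interpolate_midframe` is a hypothetical name, the vectors are assumed to come from some block matcher as integer `(dy, dx)` per 8x8 block, each block of the new frame simply reuses the vector of the co-located block (reasonable only when motion is smooth), and pixels beyond the last full block are left at zero.

```python
import numpy as np

def interpolate_midframe(prev, curr, vectors, block=8):
    """Approximate the frame halfway between `prev` and `curr`.

    vectors[by, bx] = (dy, dx) means the block of `curr` at block index
    (by, bx) is assumed to have come from offset (dy, dx) in `prev`."""
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    h, w = curr.shape
    mid = np.zeros((h, w))
    for by in range(h // block):
        for bx in range(w // block):
            dy, dx = (int(v) for v in vectors[by, bx])
            hy, hx = dy // 2, dx // 2  # roughly half of the motion, in whole pixels
            y, x = by * block, bx * block
            # where this content sat in prev, and where it ends up in curr,
            # clamped so the source block stays inside the frame
            py = min(max(y + dy - hy, 0), h - block)
            px = min(max(x + dx - hx, 0), w - block)
            cy = min(max(y - hy, 0), h - block)
            cx = min(max(x - hx, 0), w - block)
            mid[y:y + block, x:x + block] = 0.5 * (
                prev[py:py + block, px:px + block] + curr[cy:cy + block, cx:cx + block])
    return mid
```

Averaging the two motion-compensated samples (rather than crossfading the raw frames) is what moves a square that shifted 4 pixels between frames to the 2-pixel point in the new frame.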

some brainstormed ideas... (2, Interesting)

eizan (138350) | more than 12 years ago | (#2955704)

why don't you try this far-fetched possibility:

break up the image into N x N submatrices, and do a fourier transform on each subsection of the image. then do this for the next frame, calculate the phase differences between the frames, and use linear/cubic/etc interpolation to generate the frames in between. not too difficult, and I think there is even a 2-D FFT library out there somewhere. it might introduce a couple of artifacts, however, but if you're doing high framerate video, they shouldn't be too noticeable.

or even more far-fetched:
assuming that the translation of the objects in the image plane between frames is small and uniform enough, you might also be able to pull this off with a properly trained neural network on subsections of the image (so each individual feature fits approximately in each subsection). neural networks can do non-linear regression, and their outputs are continuous, so I figure if you train it right, it'll give you what you want.

good luck :-)

just use the MPEG algorithm (4, Informative)

dutky (20510) | more than 12 years ago | (#2956145)

Other folk have mentioned the MPEG motion compensation algorithm (though I think they got it a bit wrong). The algorithm chops up the current frame into 8x8 pixel blocks on even block-sized boundaries (first block at (0,0):(7,7), second block at (0,8):(7,15), and so on). These blocks are then compared against all possible 8x8 pixel blocks, local to the original block, in the adjacent frame (we compare against the blocks shifted by 1, 2, 3, 4, 5, 6, 7, and 8 pixels in either direction along both x and y). Essentially, each block is compared against every possible sub-block of a 24x24 pixel block centered on the original block's position. The comparison succeeds if the difference between the two blocks is small enough (this is a threshold that you set).

Once you have done this for every block in the original frame, you have a set of motion vectors from which you can construct an intermediate frame.
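The exhaustive search described above can be sketched in Python with NumPy as follows (the name `block_match` and the sum-of-absolute-differences cost are illustrative choices). Note that MPEG-1 and MPEG-2 actually perform motion compensation on 16x16 macroblocks, and the standards leave the search strategy to the encoder, so `block` and `search` are just parameters here. The poster's acceptance threshold is omitted; you could reject a vector whose best SAD is still above some limit.

```python
import numpy as np

def block_match(prev, curr, block=8, search=8):
    """Exhaustive block matching.

    For each block x block tile of `curr`, try every offset (dy, dx) with
    |dy|, |dx| <= search and keep the one whose tile in `prev` has the
    smallest sum of absolute differences (SAD). Returns an array of shape
    (rows, cols, 2) holding (dy, dx) per tile."""
    prev = prev.astype(np.int64)
    curr = curr.astype(np.int64)
    h, w = curr.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            target = curr[y:y + block, x:x + block]
            # start from "no motion" so flat areas keep a zero vector on ties
            best = (0, 0)
            best_sad = np.abs(prev[y:y + block, x:x + block] - target).sum()
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = y + dy, x + dx
                    if py < 0 or px < 0 or py + block > h or px + block > w:
                        continue  # candidate tile would fall outside the frame
                    sad = np.abs(prev[py:py + block, px:px + block] - target).sum()
                    if sad < best_sad:
                        best, best_sad = (dy, dx), sad
            vectors[by, bx] = best
    return vectors
```

The cost is (2 * search + 1)^2 tile comparisons per block, which is why real encoders usually use faster, non-exhaustive search patterns.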

voice from academia (1)

fraggles (556665) | more than 12 years ago | (#2956317)

Check this: []

I think that different MPEG compression schemes track motion differently, some using a brute force method. This method treats your image like a linear function so that it can search for the region of interest in the next image by using a "Newton's method"-like scheme, which is much more efficient than brute-force pixel comparison. I could be wrong though; I wasn't really paying attention in class.
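The "Newton's method"-like search described here resembles the gradient-based Lucas-Kanade approach that the KLT tracker mentioned earlier in the thread is built on. Below is a minimal single-point sketch in Python with NumPy (hypothetical names `bilinear` and `track_point`), assuming greyscale frames, small motion, and a point far enough from the border that the window and its bilinear samples stay inside the image. The real KLT also selects points where the 2x2 matrix `G` is well conditioned and works coarse-to-fine on an image pyramid to handle larger motion; neither is done here.

```python
import numpy as np

def bilinear(img, ys, xs):
    """Sample img at fractional (ys, xs); assumes every point lies at least
    one pixel inside the image."""
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    fy, fx = ys - y0, xs - x0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bottom = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def track_point(I, J, y, x, radius=7, iters=20):
    """Follow the point (y, x) of frame I into frame J with iterative
    Lucas-Kanade on a (2*radius+1)^2 window; returns the new (y, x)."""
    I = I.astype(np.float64)
    J = J.astype(np.float64)
    gy, gx = np.gradient(I)  # image derivatives along y and x
    ys, xs = np.mgrid[y - radius:y + radius + 1, x - radius:x + radius + 1]
    Iw, wy, wx = I[ys, xs], gy[ys, xs], gx[ys, xs]
    # 2x2 structure matrix of the window
    G = np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                  [np.sum(wx * wy), np.sum(wy * wy)]])
    d = np.zeros(2)  # displacement estimate (dx, dy)
    for _ in range(iters):
        err = Iw - bilinear(J, ys + d[1], xs + d[0])
        b = np.array([np.sum(err * wx), np.sum(err * wy)])
        step = np.linalg.solve(G, b)  # Gauss-Newton update
        d += step
        if np.hypot(step[0], step[1]) < 1e-3:
            break
    return y + d[1], x + d[0]
```

Each iteration linearizes the image around the current guess and solves a 2x2 system, instead of testing every candidate offset as the brute-force block search does.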

Decent resources (0)

Anonymous Coward | more than 12 years ago | (#2957681)

I'd suggest searching for articles on the subject in the ACM's digital library. They have a huge quantity of very technical papers, which are likely to at the very least lead you in the right direction. I tried a few searches, but as somebody who has never researched that problem, it wasn't obvious to me which of the papers were relevant.

Posting anonymously, due to broken slash code.

Possible Hardcopy Resource (1)

Jester998 (156179) | more than 12 years ago | (#2957737)

I picked up a book a while ago titled "Image Analysis and Processing", and it's part of the "Lecture Notes in Computer Science" series... it contains tons of information on image segmentation and has a few sections dealing specifically with object recognition and motion prediction. You could probably adapt many of the processes in there to suit your needs.

I picked up this book (and many other computer and math books) at my local Coles bookstore for CDN$2-5 each... I guess they were trying to get rid of them. I don't know if you'll be able to find a copy, but here's the info anyway:

Lecture Notes in Computer Science
Volume 1310
Image Analysis and Processing
Alberto Del Bimbo (Editor)
Published by Springer

ISSN: 0302-9743
ISBN: 3-540-63507-6

The editor's email address is listed on the cover page, so you might be able to contact him to see where you could find a copy... Good luck!

Holy Grail... (1)

dsnt02518 (532064) | more than 12 years ago | (#2957775)

... of the CGI industry. Many have tried and many have failed. I've known a couple of people who have been involved in development of systems to do this, and I've seen a lot of companies come and go who have promised such systems at industry shows (quite a few have claimed to be defence funded, which is a little scary). None that I know of have borrowed too heavily from public algorithms to do this (although one did claim their system generated some splendid comical morphing effects when misapplied!). There are quite a few commercial systems out there (have a look at's lists etc), but I guess this isn't what you're after. As I understand the state of the art, simply following a feature on an image frame (using fairly simple algorithms) is not a difficult problem in itself. The tricksy part comes when you need to follow a feature (e.g. an edge or point) which is itself changing during the sequence, or, in the worst case, being eclipsed by another feature (a person walking in front of the camera). I know it doesn't help, and I hate to be negative, but I don't think you're gonna see a SourceForge motion-tracking project any time soon.

interesting aside (2)

isorox (205688) | more than 12 years ago | (#2959071)

I found the audio commentary on the SG-1 DVD for Small Victories fascinating, especially how they used lasers as points (they later brushed some out) so they could sync the CGI bugs with the moving camera.

Also, the BBC have something camera-based in the works.

Actually... (1)

Beowulf_Boy (239340) | more than 12 years ago | (#2959437)

I was considering using my webcam as a motion sensor. I do not know if it will work or not, but I was considering having it sound the alert if the .jpg changes by more than so many bytes.
That way, if the structure of the picture changes, with more or fewer pixels of the same color, the .jpg size will change, meaning something has happened.
Will this work?

Re:Actually... (1)

damiam (409504) | more than 12 years ago | (#2959914)

Yes, it can be done. I have a free-beer Windows program to do that (I don't remember the name, I almost never use it). I dunno if there's anything like it for Linux, but it's certainly possible.

no. Re:Actually... (2)

leuk_he (194174) | more than 12 years ago | (#2960678)

checking the size of a jpg file would not work. maybe comparing the uncompressed image data would. A very small change in the original picture could lead to a large change in the resulting bytes of the jpg file.

but try searching for webcam motion detector on google and you will find some useful stuff.
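Comparing decoded pixels instead of JPEG file sizes can be as simple as the following Python/NumPy sketch. It assumes two 8-bit greyscale frames of the same size that have already been decoded; `motion_detected` and both thresholds are hypothetical values you would tune for your camera's noise level.

```python
import numpy as np

def motion_detected(prev, curr, pixel_thresh=25, min_fraction=0.01):
    """Flag motion when at least `min_fraction` of pixels change by more than
    `pixel_thresh` grey levels between two decoded 8-bit greyscale frames."""
    # widen to int16 so the uint8 subtraction cannot wrap around
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    changed = np.count_nonzero(diff > pixel_thresh)
    return bool(changed >= min_fraction * diff.size)
```

The per-pixel threshold ignores sensor noise, and the fraction threshold ignores a handful of flickering pixels, two things a file-size check cannot distinguish from real movement.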

Re:Actually... (1)

4n0nym0u53 C0w4rd (463592) | more than 12 years ago | (#2964203)

You wouldn't want to do something as crude as looking at the file size; instead, there are some pretty simple techniques for measuring image similarity. One of the simplest and most common methods for a computer to assess image quality is to calculate the Peak Signal-to-Noise Ratio (PSNR) between the current and previous frame. PSNR is based on the Mean Squared Error (MSE) of the luminance differences between corresponding pixels of the old (f) and new (f') images:

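$$\mathrm{MSE} = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left[f(i,j) - f'(i,j)\right]^2$$

where the images are N x N pixels (so N^2 is the number of pixels). PSNR in decibels (dB) for 8-bit images is calculated as

$$\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}} = 20\log_{10}\frac{255}{\sqrt{\mathrm{MSE}}}$$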

A higher PSNR between two images indicates a greater degree of similarity; the PSNR of identical images is infinite, since their MSE is zero. Although this calculation is a useful way of determining overall similarity between images, it does not necessarily correlate with human judgments particularly well. One crucial limitation of PSNR is that it is a global measure that treats all deviations the same. Therefore, a slight, barely perceptible, uniform degradation over the entire image may result in a PSNR that is identical to that of an image with an obvious, severe degradation in a small, prominent location of the picture.
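The MSE and PSNR formulas translate directly into a few lines of Python with NumPy. This is a minimal sketch assuming two same-sized 8-bit greyscale (luminance) frames; `psnr` is a hypothetical helper name, and `peak` is 255 for 8-bit data.

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized frames."""
    # cast to float first so uint8 subtraction cannot wrap around
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

Two frames that differ by one grey level everywhere have MSE = 1, giving 10 log10(255^2), roughly 48 dB.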

Alternatively, you could look into DCTune, a proprietary algorithm developed at NASA. It is based on some principles of the human visual perceptual system, and gives a score that's better correlated with human judgments.

Motion Estimation (0)

Anonymous Coward | more than 12 years ago | (#2961212)

This is a well-known image processing problem; a solution is successfully implemented in, among others, the Philips 100 Hz Natural Motion TV. This TV converts the traditional 50 Hz or 60 Hz to 100 Hz by estimating the true motion in the video sequence and interpolating the missing fields (= half of an interlaced frame) accordingly. This works extremely well, making TV (especially action movies) a lot smoother.

The best technique I know of is called 3DRS, which is patented by Philips (so it's NOT open source). One book which explains it really well (together with lots of other image processing algorithms) is "Video processing for multimedia systems" by G. de Haan.

The only reason I know about all this is because I recently started doing research into motion estimation ;)

Hope this helps.

VideoOrbits will do this (1)

Eyetapper (204789) | more than 12 years ago | (#2972834)

Our lab is doing very similar work. We've interpolated frames of video from an 8 fps image sequence (taken with a wearable computer) into a smooth 30 fps video sequence, using VideoOrbits. There's a short video example available somewhere on my homepage. Perhaps this would be of interest to you. VideoOrbits is freely available at [] .

VideoOrbits runs at over 11 fps on a 700 MHz dual processor machine. It's also a featureless tracking algorithm, so no point correspondences need to be identified.