Which Open Source Video Apps Use SMP Effectively?

kdawson posted about 6 years ago | from the on-the-one-core-on-the-other-core dept.

Software 262

ydrol writes "After building my new Core 2 Quad Q6600 PC, I was ready to unleash video conversion activity the likes of which I had not seen before. However, I was disappointed to discover that a lot of the conversion tools either don't use SMP at all, or don't balance the workload evenly across processors, or require ugly hacks to use SMP (e.g. invoking distributed encoding options). I get the impression that open source projects are a bit slow on the uptake here? Which open source video conversion apps take full native advantage of SMP? (And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)"

ffmpeg (5, Informative)

bconway (63464) | about 6 years ago | (#24310803)

Use the -threads switch.
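
As a rough illustration of the switch mentioned above, here is a minimal Python sketch that builds an ffmpeg command line with -threads set. The helper name build_ffmpeg_command and the file names are hypothetical; only the -i and -threads options come from ffmpeg itself, and whether a given encoder actually uses the extra threads depends on the codec and the build.

```python
def build_ffmpeg_command(source, target, threads):
    """Return an argument list that asks ffmpeg to use `threads` threads.

    Hypothetical helper for illustration; it only builds the list and
    does not run ffmpeg.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["ffmpeg", "-i", source, "-threads", str(threads), target]


if __name__ == "__main__":
    # File names are placeholders.
    print(" ".join(build_ffmpeg_command("input.avi", "output.mpg", 4)))
```

The list form can be handed to subprocess.run as-is, which avoids shell quoting problems with file names that contain spaces.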

Re:ffmpeg (1)

pak9rabid (1011935) | about 6 years ago | (#24310855)

Agreed. ffmpeg worked quite nicely for me during my DVD-ripping heyday. Although, it seems that it would rip audio and video in separate threads. While an improvement over the traditional, linear way of doing things, I would still see 1 CPU maxed out (video encoding), while the CPU encoding audio was only at about 1/3 capacity.

Re:ffmpeg (5, Informative)

morgan_greywolf (835522) | about 6 years ago | (#24310873)

Similarly, mencoder supports threads=# where # is something between 1 and 8.

Re:ffmpeg (1, Insightful)

Z00L00K (682162) | about 6 years ago | (#24311005)

And it may or may not be useful to actually run more than one thread per kernel. It depends on the encoder and application how many threads you should run, so the best approach is to test with 1, 2 and 4 threads per kernel.

Re:ffmpeg (3, Informative)

sp332 (781207) | about 6 years ago | (#24311169)

And it may or may not be useful to actually run more than one thread per kernel. It depends on the encoder and application how many threads you should run, so the best approach is to test with 1, 2 and 4 threads per kernel.

Isn't that per-core, not per-kernel?

Re:ffmpeg (1)

Z00L00K (682162) | about 6 years ago | (#24311493)

Of course... Not my best day today! Maybe I shall think more of that pillow...

Re:ffmpeg (1)

tzot (834456) | about 6 years ago | (#24311567)

He could mean "threads per kernel task", but I wouldn't fathom how one controls that. In any case, I believe you are right.

Re:ffmpeg (4, Insightful)

Albanach (527650) | about 6 years ago | (#24310949)

Or just convert 2 videos at once, or 4 for a quad core etc. They did suggest they have lots to convert, and it's a pretty easy way to get all available cores working hard.

Re:ffmpeg (1)

sexconker (1179573) | about 6 years ago | (#24311627)

And potentially kill I/O in the process.

Re:ffmpeg (2, Informative)

i.r.id10t (595143) | about 6 years ago | (#24311639)

Yup, with separate disks to work on to remove (mostly) the disk i/o contention, just let each process run happily away.

Re:ffmpeg (1)

init100 (915886) | about 6 years ago | (#24311671)

That's exactly what I do. I also wrote a scheduler in Python that starts new jobs when the previous ones are completed. It keeps the number of running encoding processes equal to the number of processors/cores.

To get the optimal scheduling order, it figures out the length of each input file (using midentify from the mplayer/mencoder distribution), and then sorts the jobs so that the longest jobs will be processed first (it assumes that processing time is roughly proportional to input file length (in seconds, not bytes)). This minimizes the time when one or more cores will be kept idle by remaining jobs. After all jobs are finished, it optionally powers the system down, which is nice when you're running jobs at night.
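
The ordering idea described above can be sketched in a few lines of Python. This is not the poster's scheduler: it is a static simulation of the longest-job-first heuristic, and the function name, the job names, and the durations are all made up for illustration. A real scheduler would start the next job whenever a process exits rather than planning everything up front.

```python
import heapq


def schedule_longest_first(durations, workers):
    """Assign jobs to workers greedily, longest job first.

    durations: dict mapping a job name to its estimated length in seconds.
    Returns (assignments, makespan): one list of job names per worker, and
    the total time assigned to the busiest worker.
    """
    # Each heap entry is (seconds already assigned, worker index).
    heap = [(0.0, w) for w in range(workers)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(workers)]
    ordered = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, length in ordered:
        busy, w = heapq.heappop(heap)  # the least-loaded worker so far
        assignments[w].append(name)
        heapq.heappush(heap, (busy + length, w))
    makespan = max(busy for busy, _ in heap)
    return assignments, makespan


if __name__ == "__main__":
    jobs = {"a.avi": 7200, "b.avi": 3600, "c.avi": 3600,
            "d.avi": 1800, "e.avi": 1800}
    print(schedule_longest_first(jobs, workers=2))
```

Putting the long jobs first means the short ones are left to fill the gaps at the end, which is what keeps cores from sitting idle while one straggler finishes.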

Re:ffmpeg (5, Interesting)

Tanktalus (794810) | about 6 years ago | (#24311977)

That sounds like a lot of work... I just used make:

# Convert each .avi into a DVD-ready .mpg; the recipe line must start with a tab.
%.mpg: %.avi
	tovid -ntsc -dvd -noask -ffmpeg -in "$<" -out "$(basename $@)"

all: $(subst .avi,.mpg,$(wildcard */*.avi))

Then I just ran "make -j4". All four processors working like mad, with a minimum of effort.

(You may need to change the wildcard for your own scenario.)

Re:ffmpeg (-1, Troll)

neumayr (819083) | about 6 years ago | (#24312039)

Not posting a link doesn't make your post any less of a shameless plug, just a more pointless one.

Re:ffmpeg (-1, Flamebait)

Anonymous Coward | about 6 years ago | (#24311057)

FCP's Compressor uses all my cores effectively it looks like. Oh, wait, you said PC

Re:ffmpeg (5, Informative)

mweather (1089505) | about 6 years ago | (#24311211)

Apple computers ARE PCs. They coined the damn term.

Re:ffmpeg (2, Interesting)

civilizedINTENSITY (45686) | about 6 years ago | (#24311929)

strange that quoting history correctly and in context gets you modded flamebait...

Re:ffmpeg (0, Troll)

sexconker (1179573) | about 6 years ago | (#24311667)

Too bad Final Cut Pro is trash.

Any of the various free and extensible encoders / tools are infinitely better than FCP for video conversion.

Editing video with lame effects and such is another story, since the free (open) shit tends to not have GUIs worth a damn. But what do you expect? It's all geared at converting commercial stuff for piracy.

Re:ffmpeg (1, Interesting)

fm6 (162816) | about 6 years ago | (#24311127)

So why is threading off by default? In a CPU-intensive application like this, multithreading always makes sense, even on a single-core system.

Re:ffmpeg (4, Informative)

m0rph3us0 (549631) | about 6 years ago | (#24311515)

No, it doesn't. The only time you want to use multi-threading in a single-CPU environment is when asynchronous I/O methods are unavailable or the code would be too difficult to re-architect to use asynchronous I/O. If the application is seriously I/O-bound, threads can even make the situation worse by causing random I/O patterns.

Ideally, the number of threads a program uses should be no more than the number of processors available. Otherwise, you are wasting time context switching instead of processing.

Re:ffmpeg (0)

kesuki (321456) | about 6 years ago | (#24311649)

then the most logical way to do things is to count the number of cores, and do threads -1 from that total.

always leave 1 core free... by default. with quad cores out and 8 cores promised, and no sign of things changing, it's time to rethink defaults.

oh hey, any open source program that supports multiple instances doesn't even need to thread, just run x copies, if it's a batch encode/transcode process... but threading is useful for n-pass encoding.
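
A minimal sketch of the "leave one core free" default, assuming the core count comes from Python's os.cpu_count(). The function name and the reserve parameter are hypothetical and not taken from any encoder.

```python
import os


def default_thread_count(cores=None, reserve=1):
    """Pick a worker-thread count that leaves `reserve` cores idle.

    Never returns less than 1, so the job can always make progress.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores - reserve)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "cores ->", default_thread_count(n), "threads")
```

Note that with two cores this rule yields a single thread, so a fixed reserve of one core costs half the machine there and only an eighth on an eight-core system.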

Re:ffmpeg (4, Insightful)

m0rph3us0 (549631) | about 6 years ago | (#24311757)

On a two processor system this would result in multi-threading being off.

Re:ffmpeg (1)

obstalesgone (1231810) | about 6 years ago | (#24311525)

Threading has an overhead, making it a waste of resources if you don't have multiple cores. Multi-threading on a single core is always slower than single-threading. There is a common misconception about this because multi-threaded applications can feel more responsive, but in fact, they take longer to accomplish the same unit of work.

Re:ffmpeg (1)

Macman408 (1308925) | about 6 years ago | (#24312005)

In a CPU-intensive application like this, multithreading always makes sense, even on a single-core system.

No, it doesn't. Performing a task in multiple threads always has some amount of communication overhead. Depending on the type of task being performed and the algorithm being used, that overhead can vary quite a bit. In any case, a multithreaded app will do at least a little bit more work, and in some of the worse cases, it might have a lot of conflicts over shared data, causing a significant slowdown. I'd expect to see anywhere from a couple percent performance hit all the way up to 50% less than an ideal speedup for a particularly bad application. The benefit of multithreading comes when you can be running multiple threads at once - so a 2-threaded app on one CPU might run at 0.95 times the speed of the same app when single-threaded... But, run it on two cores, and it runs at 1.9 times the speed of the single-threaded version. (waves hands, ignoring lots of variables)

Of course, there are a lot of variables - like is the single-threaded version actually doing the same thing? It might skip all the communication steps or mutex locking that a multithreaded application would do, or it might be doing them anyway. And the OS's scheduler and cache have an effect, too. If you have multiple threads fighting over one CPU (and its cache), they can slow each other down if the cache isn't large enough to hold the working set of both threads at the same time. The TLB can also suffer in the same way.

Moral of the story: it's an extremely rare application that will get a speedup on a single-core CPU (without simultaneous multithreading - hyperthreading is a lot like adding a second core).
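
The 0.95x and 1.9x figures above follow from a very simple model, sketched here in Python. The model is an assumption made for illustration: every thread that can run at once contributes a fixed efficiency (0.95 here), and cache, TLB and scheduler effects are ignored entirely.

```python
def relative_speed(threads, cores, efficiency=0.95):
    """Speed relative to a single-threaded run, under a toy model.

    A single thread pays no overhead; otherwise each thread that can run
    simultaneously delivers `efficiency` of a full core.
    """
    if threads == 1:
        return 1.0
    running_at_once = min(threads, cores)
    return running_at_once * efficiency


if __name__ == "__main__":
    print(relative_speed(2, 1))  # two threads sharing one core
    print(relative_speed(2, 2))  # two threads on two cores
```

Under this model extra threads on a single core can only lose, because the overhead is paid without any additional hardware to run on.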

Re:ffmpeg (4, Informative)

ydrol (626558) | about 6 years ago | (#24311931)

Darn, I forgot a minor detail in my question. I was really asking about the various front-end apps (dvd::rip, k9copy, acidrip etc), I got the impression that none seem to notice they are running on an SMP platform and pass the necessary switches by default to the backend.

Some may argue this is a good thing, but for the time being SMP is the way forward for faster processing, as MHz has maxed out in consumer PCs. So when people start buying octo-core CPUs, they don't expect them to run at 1/8th speed by default.

I was also being a bit lazy. I could have checked up on each app in turn, but I asked /. instead.

transcode, of course! (5, Informative)

morgan_greywolf (835522) | about 6 years ago | (#24310813)

transcode uses separate processes for everything.

Simple... (0)

Anonymous Coward | about 6 years ago | (#24310859)

You have to design it as SMP from the ground up; you cannot just hack it in later. Not to forget that multi-threaded programming is hard. Give it a few years and there will be more OSS solutions. Multi-core processors have not been mainstream for that long.

Re:Simple... (1, Insightful)

Cyrano de Maniac (60961) | about 6 years ago | (#24311035)

I'm still not sure where this idea that "multi-threaded programming is hard" comes from. It's not. It seems that most people are just afraid of it because they're not familiar with it.

Or perhaps I just overestimate the mental capacity of most programmers? Having looked at a lot of code, there may be merit to that theory.

Re:Simple... (5, Informative)

j00r0m4nc3r (959816) | about 6 years ago | (#24311199)

Running multiple instances of the same code concurrently in multiple threads is simple. Even running mutually exclusive parts of the same code concurrently in separate threads is easy. Converting complex serial algorithms to effectively utilize multiple cores is generally not simple. And writing code that can scale and balance across n number of cores/threads is extremely hard. There are all sorts of synchronization issues to deal with, scheduling issues, data transport issues, etc.. and it becomes increasingly hard to debug code the more cores/threads you throw in. I think the stigma is justified.

Re:Simple... (0, Insightful)

Anonymous Coward | about 6 years ago | (#24311359)

And writing code that can scale and balance across n number of cores/threads is extremely hard.

You're overgeneralizing. Sometimes it's hard, and sometimes it's dirt simple easy.

Re:Simple... (2, Informative)

sexconker (1179573) | about 6 years ago | (#24311737)

How the hell is this modded interesting (as opposed to informative)?

Do people really not know this stuff (thus making it interesting to them)?

For the gp and the others who still don't get it.

Multi-threaded programming (getting your shit to run in separate threads) is easy, now.
Multi-threaded / distributed algorithms (getting your shit to do some coherent, useful shit while scaling well) are not easy at all.

Re:Simple... (1, Insightful)

everphilski (877346) | about 6 years ago | (#24311233)


If you truly understand the problem domain you are operating in, parallelism becomes readily apparent. Implementing it isn't difficult even on old code, again, if you truly understand where the parallelism exists.

Re:Simple... (1)

Bert64 (520050) | about 6 years ago | (#24311517)

Multi core no....
But unix apps have been running on multi processor systems for years, and geeks have had access to such systems for years too. I did video encoding in 2000 on a quad cpu alphaserver and a dual cpu sparc, but i just did as someone else suggested and ran multiple encodes simultaneously.

Re:Simple... (1)

brokenin2 (103006) | about 6 years ago | (#24312103)

Yep.. you understood your problem domain, and easily recognized where parallelism existed. Then you stated your solution like a practical intelligent person, not like some moron trying to claim that everything is always simple because he is so damn smart that he transcodes all his videos using a neural interface to his own brain while he sleeps. It was simple you know, because brains are massively parallel, and can kick the shit out of your PC when it comes to overall processing power.

x264 (3, Insightful)

Anonymous Coward | about 6 years ago | (#24310869)

x264 uses slices and scales pretty well across multiple cores. I use it on Windows via meGUI, but you could easily use it in Linux as well. You could use mencoder to pipe out raw video to a fifo and use x264 to do the actual conversion, for instance.

Beat me to it! (4, Informative)

BLKMGK (34057) | about 6 years ago | (#24310985)

x264 via meGUI from Doom9 is what I use to compress HD-DVD and BD movies - also on a quad core. I have some tutorials posted out and about on how I'm doing it. Near as I can tell you cannot dupe the process on Linux due to the crypto - Slysoft's AnyDVD-HD is needed.

Playback - I use XBMC for Linux. It is also SMP enabled using the ffmpeg cabac patch. The developers of this project have been VERY aggressive at taking cutting edge improvements to the likes of ffmpeg and incorporating them into the code. Since Linux has no video acceleration of H.264, SMP really helps on high bitrate video!

Re:x264 (0, Offtopic)

kesuki (321456) | about 6 years ago | (#24311885)

now if only h264 didn't use atrocious, buggy, awful non-burned in subtitles that don't render correctly if you're missing the 'fonts' (especially the foreign language fonts!) the encoder assumed everyone in the world has!

there is a reason the non-burned in subtitles in DVDs are so atrocious looking it's so that EVERY DVD player could do subtitles right.

I've even had non-burned in subtitles CRASH VLC media player!!! WTF do they only think of windows media player version 20.9.1029.3! or whatever it is they use?

if a file is available in avi and MKV format i always go with AVI because the subtitles look so much better! they don't crowd each other, they don't 'grow larger than the screen border' when you maximize the screen, they don't 'stay so long you can't read the next piece of text'

ugh i hate mkvs you know they could just burn in the subtitles, but because mkv is a container they don't bother.. it's like slacking on the subtitle quality control, they don't even need to preview how it works, cause it's not like people are going to say 'make a version 2 so i can see all the subtitles right!'

Re:x264 (1)

TheDreadedGMan (1122791) | about 6 years ago | (#24312007)

what does this have to do with SMP video apps... plus, complain, but also you should report bugs with subtitles...

Burned in Subtitles are great if you want to re-encode each language of the movie separately.

If the movie player worked properly then it would look fine...
Also, fonts should be selectable in the movie player, not locked to the subtitle file...
WMP is v11 not v20... and AFAIK doesn't do subtitles out of the box, anyone?

Simple question.. (0)

Anonymous Coward | about 6 years ago | (#24310871)

simple answer:

VisualHub... (3, Informative)

e4g4 (533831) | about 6 years ago | (#24310881)

...makes excellent use of multiple cores. It is however Mac-only. Interestingly, what it does is split a file into chunks and spawns multiple ffmpeg processes to do the conversion. Which is to say, perhaps you can do some (relatively simple) scripting with ffmpeg that will do the job.

Which part of Open Source didn't you get? (-1)

dreamchaser (49529) | about 6 years ago | (#24310993)

OP is asking for open source tools. You cited a commercial one that doesn't provide source.

Re:Which part of Open Source didn't you get? (0)

Anonymous Coward | about 6 years ago | (#24311095)

Beggars can't always be choosers, but you can always be a prick.

Re:Which part of Open Source didn't you get? (2, Informative)

phuul (997836) | about 6 years ago | (#24311131)

So is ffmpeg not open source? It uses the LGPL license and from their license FAQ:

"FFmpeg is licensed under the GNU Lesser General Public License (LGPL). However, FFmpeg incorporates several optional modules that are covered by the GNU General Public License (GPL), notably libpostproc and libswscale. If those parts get used the GPL applies to all of FFmpeg. Read the license texts to learn how this affects programs built on top of FFmpeg or reusing FFmpeg. You may also wish to have a look at the GPL FAQ. "

Since his suggestion was to do some scripting that does essentially what VisualHub does using ffmpeg I'm not sure I see how he missed the Open Source requirement.

Re:Which part of Open Source didn't you get? (4, Informative)

pushing-robot (1037830) | about 6 years ago | (#24311193)

OP is asking for open source tools. You cited a commercial one that doesn't provide source.

VisualHub (the front-end app) may be closed, but ffmpeg is LGPL.

And the GP was suggesting using ffmpeg, not VisualHub.

Re:Which part of Open Source didn't you get? (1)

mweather (1089505) | about 6 years ago | (#24311239)

And told him how it uses an open source program in an easily-replicatable way.

Re:VisualHub... (0)

Anonymous Coward | about 6 years ago | (#24311203)

If you are going to use platform specific you could just use Compressor in FCP. It will use your CPU power from other computers too after installing Qmaster (comes with FCP) on them. I often turn on my MacBook Pro and include it in the job and it shaves off ~40% of the time transcoding my video by using two computers (6 cores total). The CPU monitor bars on both machines shows all cores busy.

Re:VisualHub... (0)

Anonymous Coward | about 6 years ago | (#24311907)

VisualHub does NOT do this. It spawns a single ffmpeg process but with many threads; it doesn't split a file into chunks. I think what you're thinking of is when you do an H.264 conversion - what it does here is make good use of x264 and all of its parallelisation.

P.S. I currently alpha test for Techspanion

x264 and avisynth (2, Informative)

PhrostyMcByte (589271) | about 6 years ago | (#24310913)

x264 and avisynth can make pretty decent use of threads. check out meGUI.

Re:x264 and avisynth (1)

figleaf (672550) | about 6 years ago | (#24311097)

Yeah x264 is great. There is a slight quality degradation (albeit you have to look really hard to visually determine the difference) if you use multiple threads.
I once used a batch file to encode several gigs of my family vacation MJPEG videos to H.264 using x264 in a single background thread over a period of 10 days.
With some heavy-duty post processing (for noise removal etc) it encoded about a 1 GB source/day. There was no perf. degradation with my other apps (games, email etc.) on account of the video encode.

Re:x264 and avisynth (1)

Elbart (1233584) | about 6 years ago | (#24311107)

LOL meGUI. An encoder-GUI which needs admin-rights on Vista. No comment.

Re:x264 and avisynth (1)

figleaf (672550) | about 6 years ago | (#24311171)

That's not correct. The admin rights are only needed to update MeGUI. Video encoding works fine without admin permissions.
You can install MeGUI in a non-standard location like c:\tools\megui and not require admin permissions to update.

Re:x264 and avisynth (1)

Henriok (6762) | about 6 years ago | (#24311115)

ffmpegX for OSX uses x264 and it's transcoding like mad on my eight core Mac Pro. A 2h Video_TS film conversion to iPhone-ready double pass h264/MPEG4.. in less than 20 minutes. Using 720-760% CPU, i.e. just the right amount for me, since I use the machine for other tasks as well.

Re:x264 and avisynth (0)

Anonymous Coward | about 6 years ago | (#24311611)

ffmpegx is not opensource but shareware

Re:x264 and avisynth (1)

Henriok (6762) | about 6 years ago | (#24312181)

It's only the GUI that's shareware. What I just told everyone was that the open source codec x264 is threaded and performs very well on SMP systems.

Re:x264 and avisynth (1)

Parag2k3 (1136791) | about 6 years ago | (#24311153)

AviSynth is single threaded, so complicated avs's won't effectively use all possible threads.

Re:x264 and avisynth (1)

PhrostyMcByte (589271) | about 6 years ago | (#24311541)

The newer version supports SetMTMode, which works quite well in many cases.

Load balancing: Why? (4, Insightful)

DigitAl56K (805623) | about 6 years ago | (#24311043)

don't balance the workload evenly across processors

Why is balancing the load evenly important, as long as one thread is not bottlenecking the others? Loading a particular core or set of cores might even be beneficial depending on the cache implementation, especially when other applications are also contending for CPU time.

Sure, a nice even load distribution might be an indicator for good design, but it doesn't have to apply in every case. I don't think software should be designed so you can be pleased with the aesthetics of the charts in task manager.

Re:Load balancing: Why? (2, Insightful)

Scottie-Z (30248) | about 6 years ago | (#24311145)

Because, ideally, all four cores should be running at 100% -- the idea is to make maximal use of your available resources, right?

Re:Load balancing: Why? (4, Insightful)

DigitAl56K (805623) | about 6 years ago | (#24311347)

It's still possible to load all cores 100%.

A video decoder that I'm working with, for example, currently uses only as many threads as necessary for real-time playback. So for example if one core can do the job only one core is used. If the decoder looks like it might start falling behind more threads are given work to do. Ultimately, if your system is failing to keep up all cores will be fully leveraged.

However, so long as only some cores are required the others are 100% available to other processes, including their cache (if it's independent). I'm not sure how power management is implemented but perhaps it's even possible for the unused cores to do power saving, leading to longer battery life for laptops/notebooks, etc.

the idea is to make maximal use of your available resources, right?

No, the idea is to make the best use of your resources. I'm not trying to say that load balancing is wrong. I'm just saying that processes that don't appear to be balanced are not necessarily poorly designed or operating incorrectly.

Re:Load balancing: Why? (0)

Anonymous Coward | about 6 years ago | (#24311383)

Yes, the idea is to maximize the usage of your resources. But as the parent just said

Loading a particular core or set of cores might even be beneficial depending on the cache implementation, especially when other applications are also contending for CPU time.

He wrote a whole sentence just to tell you why you wouldn't need to post what you just posted, but you didn't read it.

Re:Load balancing: Why? (0)

Anonymous Coward | about 6 years ago | (#24311475)

What an absolutely ridiculous question. Balancing the load isn't an exercise in aesthetics. It's about minimizing the run-time. If you give three of your CPUs 10% of the work each and the other one gets 70%, then your task is going to take 70% of the time it took when running serially. That's four spanking new CPUs that don't even do the job twice as fast as the original.

If, on the other hand, you balance everything nicely, your task will take 25% of the time. That 4x speed up is guaranteed to make you feel a lot better about the moolah you just parted with. Of course, when you realize that there's more to life than ripping video and that you could do it all in batch at night anyway, you'll no doubt suffer a little buyer's remorse, but that's another story.
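
The arithmetic in this comment can be checked with a short Python sketch. It assumes a deliberately simple model in which the cores run fully in parallel with no overhead, so the job finishes when the most heavily loaded core finishes. The function names are hypothetical.

```python
def time_fraction(shares):
    """Fraction of the serial run-time needed when each core gets one share.

    shares: amount of work given to each core, in any consistent unit.
    The job is done when the core with the largest share is done.
    """
    total = sum(shares)
    return max(shares) / total


def speedup(shares):
    """Speed-up over running all the work serially on one core."""
    return 1.0 / time_fraction(shares)


if __name__ == "__main__":
    print(time_fraction([10, 10, 10, 70]), speedup([10, 10, 10, 70]))
    print(time_fraction([25, 25, 25, 25]), speedup([25, 25, 25, 25]))
```

With the 10/10/10/70 split the speed-up is 1/0.7, or roughly 1.43x, which is why four cores "don't even do the job twice as fast" in that case.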

Re:Load balancing: Why? (0)

Anonymous Coward | about 6 years ago | (#24312017)

Why? Heat! Balancing load is balancing heat.

I know this is wrong to say (0, Flamebait)

Anonymous Coward | about 6 years ago | (#24311125)

But I actually really hope the person who "asked Slashdot" this dies in a fire. Honestly. Is Google THAT broken these days?

Yes, this is a troll. Mark it as such and feel the peace. I don't mean to "troll" as such, I don't care for replies. I just reiterate my first sentence. Fire, die in one. Use your God/creation given brain next time.

Handbrake (5, Informative)

vfs (220730) | about 6 years ago | (#24311163)

Handbrake has always used both of the cores on my system for transcoding.

Re:Handbrake (1, Informative)

Anonymous Coward | about 6 years ago | (#24311391)

Handbrake has always used both of the cores on my system for transcoding.

... and is only good for transcoding DVDs. Sure it's nice and simple for that one thing, but I assume the submitter wants more than that.

Re:Handbrake (4, Informative)

catmistake (814204) | about 6 years ago | (#24311647)

that's because Handbrake uses ffmpeg

Use Mac OS X... (-1, Offtopic)

Anonymous Coward | about 6 years ago | (#24311191)

Yeah, I realize this is somewhat of a troll comment since the poster already said he bought PC hardware...

But Mac users have been living with SMP since 2001 (in the early 2000s Apple began shipping most of their professional desktops with multiple processors). As such, almost all Mac multimedia and conversion tools are multi-threaded.

I'd assume Linux would have pretty good SMP support too due to the wide range of hardware it targets, although most consumer x86 machines up until a few years ago were single core. Us Mac users view Windows as sort of a toy in regards to SMP. It's kind of funny to see them just adding SMP support to a lot of software within the next few years, whereas on the Mac it's been essential to code for SMP up until the G5 because the G4 processors were slower by themselves.

Re:Use Mac OS X... (0)

Anonymous Coward | about 6 years ago | (#24311623)

And surprisingly, this is not something that we can blame on BillG. XP has supported multiple processors from the beginning. Multi processor motherboards were just too expensive for the average consumer. It wasn't until the P4 Xtreme that multi core became a reasonably priced option. For once, Microsoft was actually ahead of the curve in providing support for a technology BEFORE the market really needed it.

F(next) = F(current) + Delta(F(current:next)) (5, Insightful)

Lumenary7204 (706407) | about 6 years ago | (#24311195)

The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.

Multi-threading is most efficient when it is applied to discrete data sets that have little or no dependency on each other.

For example, suppose I have a table with four columns -- three holding input values (A, B, and C) and one holding an output value (X). If the data in a given row of the table has nothing to do with the data in any other row, multi-threading works efficiently, because none of the threads are waiting for data from any of the other threads. If I want to process multiple rows at once, I simply spawn additional threads.
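
A minimal Python sketch of the independent-rows case, using a thread pool from the standard library. The formula that turns A, B and C into X is invented for illustration, and threads are used only to keep the example short: in CPython, CPU-bound work of this kind would normally need a process pool to occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_x(row):
    """Compute the output X from one row's inputs (A, B, C).

    The formula is a placeholder; the point is that it reads nothing
    from any other row.
    """
    a, b, c = row
    return a * b + c


def process_rows(rows, workers=4):
    """Compute X for every row, several rows at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input rows.
        return list(pool.map(compute_x, rows))


if __name__ == "__main__":
    print(process_rows([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))
```

Because no row waits on another, the pool can hand out rows in any order and the workers never block on each other.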

On the other hand, for data such as MPEG video, the composition of the next frame is equal to the composition of the current frame, plus some delta transformation - the changed pixels.

This introduces a dependency which precludes efficient multi-threaded processing, because each succeeding frame depends on the output of the calculations used to generate the prior frame. Even if more than one core is dedicated to processing the video stream, one core would wind up waiting on another, because the output from the first core would be used as the input to the second.

Re:F(next) = F(current) + Delta(F(current:next)) (2, Informative)

Lumenary7204 (706407) | about 6 years ago | (#24311245)

Note that the above example is about the video component only of a single MPEG audio/video stream.

There is no reason that an encoder/decoder can't process audio in one thread and video in another, thereby using more than one core (which has already been discussed in other posts relating to this article).

keyframes (5, Informative)

Anonymous Coward | about 6 years ago | (#24311297)

Actually, the MPEG stream resets itself every n frames or so (n is often a number like 8, but can vary depending on the video content). These are called keyframes (K) and the delta frames (called P and I frames) are generated against them. Because of this, it is really easy to apply parallel processing to video encoding.

Re:keyframes (0)

Anonymous Coward | about 6 years ago | (#24311467)

Your HDV camera's MPEG output will be a long GOP with the key frame every 15. And if you are going to be doing fancy editing effects, it's often best just to get out of the GOP format and edit with a keyframe codec such as ProRes 422.

Re:keyframes (4, Informative)

DigitAl56K (805623) | about 6 years ago | (#24311839)

Actually, the MPEG stream resets itself every n frames or so (n is often a number like 8, but can vary depending on the video content).

That is not true for MPEG-4 unless you have specifically constrained the I/IDR interval to an extremely short interval, and doing so severely impacts the efficiency of the encoder because I-frames are extremely expensive compared to other types.

Keyframes are usually inserted when temporal prediction fails for some percentage of blocks, or using some RD evaluation based on the cost of encoding the frame. Therefore unless the encoder has reached the maximum key interval the I frame position requires that motion estimation is performed, and thus you can't know in advance where to start a new GOP.

In H.264 due to multiple references you would certainly have issues to contend with since long references might cross I-frame boundaries, which is why there is the distinction of "IDR" frames, and this would certainly not be possible threading at keyframe level.

Granted, for MPEG1&2 encoders threading at keyframes is a possibility, although still not one I'd personally favor.

Re:keyframes (1)

statemachine (840641) | about 6 years ago | (#24312121)

How did you get a +5 Informative when you're wrong?

First off, which MPEG spec has a K-frame? An I-frame is not a delta frame, it's more like your "keyframe." P and B are the delta frames.

Secondly, there's very little to parallelize if you're working with open Groups of Pictures (GOP), that is to say every GOP references into the next GOP. If you have closed GOPs, then you can do this a little better by putting the next GOP on another core/CPU.

But will you gain a significant speedup? The problem is not just chugging away on code. It's all the data that needs to fly around. Your core will be IO bound while your data cache and bus get hammered.

You'll find more benefits from encoding shortcuts than you will by simply flinging another core at it.

Re:F(next) = F(current) + Delta(F(current:next)) (4, Insightful)

Omega996 (106762) | about 6 years ago | (#24311367)

Theoretically, couldn't an encoder scan the data stream for keyframes, chunk the data from keyframe to the next keyframe, and then queue up the keyframe+delta information for multiple cores? That way, each core has something to do that isn't dependent upon the completion of something else.
I'd think that n-1 cores/threads/whatever to process the chunked data, and the last core/thread/whatever to handle overhead and i/o scheduling would run pretty nicely on a multi-core machine.
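
A rough Python sketch of the chunking idea in the comment above. It assumes closed GOPs (no references across keyframe boundaries), and the names `split_at_keyframes`, `process_chunk` and `process_in_parallel` are made up for illustration. `process_chunk` is only a stand-in for real encoding work, and a real encoder would use native threads or separate processes rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_keyframes(frame_types):
    """Group frame indices into chunks that each start at an I-frame.

    frame_types is a sequence such as ['I', 'P', 'B', ...]. Frames before
    the first I-frame (if any) end up together in the first chunk.
    """
    chunks = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "I" or not chunks:
            chunks.append([])
        chunks[-1].append(index)
    return chunks


def process_chunk(chunk):
    # Hypothetical stand-in for per-chunk work: report the index of the
    # first frame and the number of frames in the chunk.
    return (chunk[0], len(chunk))


def process_in_parallel(frame_types, workers=3):
    """Hand each keyframe-delimited chunk to a pool of workers."""
    chunks = split_at_keyframes(frame_types)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in chunk order, so stitching is trivial.
        return list(pool.map(process_chunk, chunks))
```

With n cores, `workers=n-1` would match the suggestion of leaving one core for overhead and I/O scheduling.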

Re:F(next) = F(current) + Delta(F(current:next)) (0)

Anonymous Coward | about 6 years ago | (#24311871)

I would think that each GOP could be worked on by separate cores. Additionally, macroblocks in a frame can also be I, P, or B. I'd think an encoder could have cores work on their own macroblocks as well.

Re:F(next) = F(current) + Delta(F(current:next)) (0)

Anonymous Coward | about 6 years ago | (#24311373)

Learn about I-, P- and B-Frames before you write long-winded bologna about stuff you clearly don't understand.

Re:F(next) = F(current) + Delta(F(current:next)) (1)

ZachPruckowski (918562) | about 6 years ago | (#24311387)

MPEG uses keyframes, right? So you'll still have a full frame in there every few frames. When I play back an MP4 I encoded, I wind up with something like a full frame every second or two (with the intermediate frames being the transformations you mentioned). So you can split at those frames. That's not infinitely parallel, but if we split it up by minute-sized segments, we'd have 90-150 segments (based on movie length), which is plenty for any prosumer computer for the foreseeable future, and even plenty for smaller clusters (that's 30 quad-cores or so).

Re:F(next) = F(current) + Delta(F(current:next)) (2, Informative)

Zygfryd (856098) | about 6 years ago | (#24311413)

You can encode GOPs independently. I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.

Re:F(next) = F(current) + Delta(F(current:next)) (2, Insightful)

init100 (915886) | about 6 years ago | (#24311827)

I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.

Is this even needed if you use multi-pass encoding? At least for XviD, IIRC the first pass is used to accumulate statistics used to allocate the proper bit budget to each frame. Then the individual processes should be able to use the statistics file from the first pass to get the bit allocation for their current GOP in the second pass.
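
The two bit-allocation schemes discussed above can be sketched in a few lines of Python. This is only an illustration: the per-GOP "complexity" numbers stand in for whatever a first pass would record, and nothing here models XviD's actual statistics file. Both function names are invented.

```python
def equal_share(total_bits, gop_count):
    """Give every GOP the same slice of the budget (integer bits).

    Any remainder is handed out one bit at a time to the first GOPs so
    the shares always add up to total_bits.
    """
    base, remainder = divmod(total_bits, gop_count)
    return [base + (1 if i < remainder else 0) for i in range(gop_count)]


def proportional_share(total_bits, first_pass_complexity):
    """Split the budget in proportion to a per-GOP complexity score.

    first_pass_complexity is a hypothetical list with one positive number
    per GOP, as a first pass might produce.
    """
    total = sum(first_pass_complexity)
    return [total_bits * c / total for c in first_pass_complexity]
```

Either way, each worker knows its own budget before it starts, so the GOP processes do not need to talk to each other during the second pass.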

Re:F(next) = F(current) + Delta(F(current:next)) (4, Insightful)

John Betonschaar (178617) | about 6 years ago | (#24311419)

You could of course split each frame in slices, and process these in parallel. Or skip the video N frames between each core, with N being the number of frames between MPEG keyframes. Or have core 1 do the luma and core 2 and 3 the chroma channels. Or pipeline the whole thing and have core 1 do the DCT, core 2 the dequant etc. and have core 3 reconstruct the output reference frame while core 1 already starts the next frame.

Plenty of ways to parallelize decoding, and even more for encoding...
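
As a hedged illustration of the first option above (slices), here is a small Python sketch. The frame is modelled as a plain list of pixel rows, and `process_slice` merely inverts 8-bit values as a stand-in for real per-slice work; all names are invented for the example, and Python threads are used only to keep it short.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(rows, slice_count):
    """Cut a frame (list of pixel rows) into horizontal slices."""
    if not rows:
        return []
    size = -(-len(rows) // slice_count)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def process_slice(rows):
    # Hypothetical per-slice work: invert 8-bit sample values.
    return [[255 - pixel for pixel in row] for row in rows]


def process_frame(rows, slice_count=4):
    """Process every slice on its own worker, then glue the rows back."""
    slices = split_into_slices(rows, slice_count)
    with ThreadPoolExecutor(max_workers=slice_count) as pool:
        processed = list(pool.map(process_slice, slices))
    return [row for piece in processed for row in piece]
```

This only works this cleanly because the stand-in operation touches each pixel independently; real slices still share reference frames and rate control.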

Re:F(next) = F(current) + Delta(F(current:next)) (0)

Anonymous Coward | about 6 years ago | (#24311463)

Doesn't MPEG have key frames? Surely each core could grind out work units of N delta frames, starting from different key frames?

Re:F(next) = F(current) + Delta(F(current:next)) (1)

tjugo (453378) | about 6 years ago | (#24311565)

Your explanation is not accurate.

Most video compression techniques, including MPEG, set a maximum number of frames between base frames. A base frame can be decoded without any information about previous or future frames.

All the motion vectors or deltas are calculated against the closest previous base frame. Theoretically you can parallelize the decoding into as many tasks as your video stream has base frames. If you are decoding a 60 minute video encoded using a base frame every 1s you can split the job into 3600 independent tasks.

Video decoding is by nature well suited to multi-threaded systems.
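
The arithmetic behind the 3600 figure, as a tiny Python helper. The function name is made up, and it assumes base frames at a fixed interval, which real encoders do not guarantee.

```python
import math


def independent_tasks(duration_seconds, base_frame_interval_seconds):
    """Number of base-frame-delimited segments in a stream.

    Assumes one base frame at the start and then one every fixed interval.
    """
    return math.ceil(duration_seconds / base_frame_interval_seconds)


# 60 minutes with a base frame every second:
tasks = independent_tasks(60 * 60, 1)
```

Here `tasks` comes out as 3600, matching the number quoted above.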

Re:F(next) = F(current) + Delta(F(current:next)) (1)

ubergeek65536 (862868) | about 6 years ago | (#24311597)

You're just plain wrong. There are lots of ways to fully use SMP when either decoding or encoding MPEG streams. Not only are frames grouped into something called a GOP, starting with a JPEG-like keyframe, but each frame is also constructed of blocks, usually 16 pixels square, each of which can be processed on a different thread. All the audio streams and video streams can be processed by multiple threads too.

Re:F(next) = F(current) + Delta(F(current:next)) (0)

Anonymous Coward | about 6 years ago | (#24311661)

you totally missed what he wanted to do. He wanted video conversion. This means generally 1-N decode and 1-N encode processes. Multithreading is well suited to that, since you can very easily split the two.

Most apps however do not do it this way, as the author pointed out, because instead of having a managed buffer that is shared it's far easier to just have 1 function that basically does a decodeSrc(); encodeDst(); and loop. It ends up being inexperience or laziness that results in this.
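
For comparison with the decodeSrc(); encodeDst(); loop, here is a minimal Python sketch of the shared-buffer arrangement: one decoder thread and one encoder thread joined by a bounded queue. The `decode` and `encode` callables are placeholders supplied by the caller, not any real codec API. In CPython, pure-Python work will not actually run on two cores at once because of the interpreter lock, so treat this as a picture of the structure only.

```python
import queue
import threading


def transcode(source_frames, decode, encode, buffer_size=8):
    """Decode and encode concurrently through a bounded shared buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream
    encoded = []

    def decoder():
        try:
            for raw in source_frames:
                buffer.put(decode(raw))  # blocks while the buffer is full
        finally:
            buffer.put(done)  # always release the encoder thread

    def encoder():
        while True:
            frame = buffer.get()  # blocks while the buffer is empty
            if frame is done:
                break
            encoded.append(encode(frame))

    threads = [threading.Thread(target=decoder),
               threading.Thread(target=encoder)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return encoded
```

The bounded queue is the "managed buffer": it lets the decoder run ahead by a few frames without letting memory use grow without limit.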

Re:F(next) = F(current) + Delta(F(current:next)) (1)

liusu119 (1086655) | about 6 years ago | (#24311685)

The dependency only holds within one segment between keyframes during decoding. For encoding, there is no such thing as F(current); the encoder should know all of F. Even though current MPEG encoder implementations may not work like this, for static transcoding the deltas of all frames could be computed with no dependency on each other. So I don't think it's a limitation of MPEG itself. It's a problem of how frames are served and how the encoder takes frames.

Re:F(next) = F(current) + Delta(F(current:next)) (1)

semiotec (948062) | about 6 years ago | (#24311707)

The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.

Not quite true. Someone above already explained some of this re VisualHub.

The video data/frame at 0:00 is very likely completely unrelated to the data/frame at 5:00, thus you can simply chop up the raw file into a number of segments and process them in parallel.

Some clever stitching is probably required to put the whole thing back together in the end.

Multi-threading is most efficient when it is applied to discrete data sets that have little or no dependency on each other.

Exactly, so you chop up the raw input into segments and they become discrete data sets.

Re:F(next) = F(current) + Delta(F(current:next)) (0)

Anonymous Coward | about 6 years ago | (#24311893)

Your explanation is overly simplistic.

If you chunk by all the frames between the I frames, each one of those chunks can be encoded on a different core. Depending on the stream format, the GOP (group of pictures) is mandated. So it is fairly straightforward to parallelize encoding.

Max CPU? (1)

HaeMaker (221642) | about 6 years ago | (#24311207)

Huh? I am using AGK and my CPU never does anything. It is always waiting for I/O. I must be doing something wrong...

Re:Max CPU? (1)

reikoshea (1160155) | about 6 years ago | (#24311275)

I am of the same mind. On my E6400 Disk IO is my biggest bottleneck when running conversions. The 10k Drive boosted conversion speed from 3x to about 3.7x for a 700MB 1hr TV show. Not a great result, but not bad either.

What about playing? (1)

Godji (957148) | about 6 years ago | (#24311309)

Is there anything out there that can play a high-bitrate obese .mkv Blu-ray backup rip efficiently on 2 or 4 cores?

MPEG Algorithm (1)

c0d3r (156687) | about 6 years ago | (#24311381)

The core MPEG transform is the DCT (discrete cosine transform). If this is parallelizable, then MPEG encoding/decoding should be, although there is no way a general-purpose processor can beat an ASIC in silicon.

Often it's there, but hidden because it's a hindrance (0)

Anonymous Coward | about 6 years ago | (#24311409)

Part of the reason you find a lack of SMP is because it actually negatively impacts the quality of the encode (though not greatly). A lot of the time it's there, but hidden. The encoder looks at the frames around the current one being encoded and changes its output based on what is found, to make things run smoother on future frames. When you start adding additional threads you have to somehow break up the file into sections, or have each thread do sequential frames. The result is that the encoder can't use its wizardry to its full effect.

Windows? VirtualDub 1.8.x + ffdshow-tryouts (3, Informative)

tdelaney (458893) | about 6 years ago | (#24311445)

You don't say if you're running on Windows or Linux or something else. If you are running on Windows, the latest versions of VirtualDub have made big improvements to SMT/SMP encoding.

VirtualDub home
VirtualDub 1.8.1 announcement
VirtualDub downloads

Make sure you grab 1.8.3 - 1.8.1 was pretty good, but had a few teething problems. 1.8.2 has a major regression which is fixed in 1.8.3. The comments in the 1.8.1 announcement contain a few important tips for using the new features (some of which I posted BTW).

The two major new features that would be of interest to you are:

1. You can run all VirtualDub processing in one thread, and the codec in another. This works very well in conjunction with a multi-threaded codec - this one change improved my CPU utilisation from approx 75% to 95% on my dual-core machines - with an equivalent increase in encoding performance.

2. VD now has simple support for distributed encoding. You can use a shared queue across either multiple instances of VD on a single machine, or across multiple machines (must use UNC paths for multiple machines). Each instance of VD will pick the next job in the queue when it finishes its current job. Instances can be started in slave mode (in which case they will automatically start processing the queue).

I use 3 machines for encoding (all dual-core). With VD 1.8.x I start VD on two of the machines in slave mode, and one in master mode. I add jobs to the queue on the master instance, and the other two instances immediately pick up the new jobs and start encoding. When I've added all the jobs, I then start the master instance working on the job queue.

To achieve a similar effect on your quad-core, start two instances of VD on the same machine - one slave, the other master.

It's not perfect (if you've only got one job, you won't use your maximum capacity) but it has greatly simplified my transcoding tasks, and reduced the time to transcode large numbers of files.
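
The shared-queue behaviour described above can be pictured with a short Python sketch. This is not how VirtualDub implements it; it is just a generic pull-from-one-queue pattern, with `run_job` standing in for an actual encode and the instance names chosen for the example.

```python
import queue
import threading


def run_shared_queue(jobs, instance_names, run_job):
    """Let several instances drain one job queue.

    Each instance takes the next pending job as soon as it finishes its
    current one. Returns a dict mapping instance name to the jobs it ran.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)
    completed = {name: [] for name in instance_names}

    def instance(name):
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this instance is finished
            run_job(job)
            completed[name].append(job)

    threads = [threading.Thread(target=instance, args=(name,))
               for name in instance_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed
```

It also shows the limitation mentioned above: with a single job in the queue, only one instance ever gets any work.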

Re:Windows? VirtualDub 1.8.x + ffdshow-tryouts (0)

Anonymous Coward | about 6 years ago | (#24312171)

Ugh.. VDub essentially uses vfw 16-bit technology and stores files in an avi container. It only supports more recent codecs (xvid/divx) through hacks in the avi format (e.g. storing frames out of order). H.264 is only partially supported (e.g. no b-frame pyramid), and hence vfw support for it was dropped in x264 r581. You shouldn't use VDub for anything other than lossless (intra-frame) encoding; for everything else, move on to CLI + a sensible container (mp4/mkv) for your encoding work.

avidemux (5, Informative)

Unit3 (10444) | about 6 years ago | (#24311489)

I've noticed a lot of talk about commandline options, but not the nice guis that use them. Avidemux is open source, cross-platform, gives you a decent interface, and uses multithreaded libraries like ffmpeg and x264 on the backend to do the encoding, so it generally makes optimal use of your multicore system.

Do more jobs rather than one job more quickly (1, Informative)

myz24 (256948) | about 6 years ago | (#24311537)

As posted elsewhere, it is difficult to divide a project up that is really pretty linear. Instead, you should try to do more jobs at once. Encode four videos at once.
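
One way to picture "encode four videos at once" is a small Python wrapper that launches one ffmpeg process per file. Treat it as a sketch under assumptions: the file names are hypothetical, ffmpeg is assumed to be on the PATH, the command uses only `-i` and the `-threads` switch mentioned earlier in the thread, and exit codes are not checked.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(source, target):
    # One encoder thread per process, so four processes fill four cores.
    return ["ffmpeg", "-i", source, "-threads", "1", target]


def encode_all(pairs, jobs=4, run=subprocess.run):
    """Run up to `jobs` encodes at the same time.

    pairs is a list of (source, target) file names. `run` is whatever
    launches a command; it defaults to subprocess.run and can be swapped
    out for a dry run.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda pair: run(build_command(*pair)), pairs))
```

The Python threads here only wait on the child processes, so the real work happens in the separate ffmpeg processes, one per core.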

Also consider this. (2, Interesting)

SignOfZeta (907092) | about 6 years ago | (#24311919)

If you do a lot of H.264 conversion, look into picking up a hardware encoder. There's the Turbo.264; it's Mac-only, but I'm fairly sure it's a rebranded PC device. Plug into a USB port, and it speeds up H.264 encoding -- even on single-core systems. Imagine that with your quad-core. It's not a free solution, but if you find yourself doing a *lot* of encodes, it may be worth your money.

Re:Also consider this. (0)

Anonymous Coward | about 6 years ago | (#24312189)

One question: is the Wikipedia article right that this device is Mac-only and that 800x600 is the maximum resolution? The alternatives look way better.

Lazy much? (0, Flamebait)

SleepyHappyDoc (813919) | about 6 years ago | (#24312203)

(And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)

Tell you what...why don't you FedEx your videos to a developer and ask him to do it for you? I'm sure they'd be happy to help a nice, polite, motivated person like yourself.
