
Revisiting Amdahl's Law

Soulskill posted about 10 months ago | from the looking-for-new-loopholes dept.

Programming 54

An anonymous reader writes "A German computer scientist is taking a fresh look at the 46-year-old Amdahl's law, which was the first serious attempt to quantify the limits of parallel computing relative to serial computing. The fresh look considers software development models as a way to work around those limits. 'DEEP keeps the code parts of a simulation that can only be parallelized up to a concurrency of p = L on a Cluster Computer equipped with fast general-purpose processors. The highly parallelizable parts of the simulation are run on a massively parallel Booster system with a concurrency of p = H, H >> L. The Booster is equipped with many-core Xeon Phi processors and connected by a 3D-torus network of sub-microsecond latency based on EXTOLL technology. The DEEP system software dynamically distributes the tasks to the most appropriate parts of the hardware in order to achieve the highest computational efficiency.' Amdahl's law has been revisited many times, most notably by John Gustafson."
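For reference, the law itself fits in a few lines. A minimal sketch of the classic fixed-workload formulation (the variable names are ours, not DEEP's), in Python:

def amdahl_speedup(serial_fraction, n_processors):
    # Classic fixed-workload form: the parallel part scales with n_processors,
    # the serial part does not.
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

print(amdahl_speedup(0.05, 64))     # ~15.4x on 64 processors
print(amdahl_speedup(0.05, 10**9))  # ~20x: a 5% serial part caps speedup at 1/0.05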

54 comments

Buzzword-heavy (4, Insightful)

Animats (122034) | about 10 months ago | (#44046925)

The article makes little sense. The site of the DEEP project [deep-project.eu] is more useful. It has the look of an EU publicly funded boondoggle. Those have a long history; see Plan Calcul [wikipedia.org], the 1966 plan to create a major European computing industry. That didn't do too well.

The trouble with supercomputers is that only governments buy them. When they do, they tend not to use them very effectively. The US has pork programs like the Alabama Supercomputer Center [asc.edu]. One of their main activities is providing the censorware for Alabama schools [asc.edu].

There's something to be said for trying to come up with better ways of making sequential computation more parallel. But the track record of failures is discouraging. The game industry beat its head against the wall for five years trying to get the Cell processor in the PS3 to do useful work. Sony has given up; the PS4 is an ordinary shared-memory multiprocessor. So are all the Xbox machines.

It's encouraging to see how much useful work people are getting out of GPUs, though.

Re:Buzzword-heavy (0)

Anonymous Coward | about 10 months ago | (#44046975)

Thank you for not forcing me to read that retarded string of characters that Slashdot apparently likes to call "news".

Re:Buzzword-heavy (3, Interesting)

cold fjord (826450) | about 10 months ago | (#44047029)

The article makes sense, but I don't think the work appears to be especially innovative even if it could be very useful.

It is more than governments that buy supercomputers. They are also used in industry for things like oil and gas exploration, economic modeling, and weather forecasting. Universities and research organizations also use them for a variety of purposes. Time on an actual supercomputer tends to be highly valuable and sought after. You may disagree with a particular use, but that is a separate question from whether the machines are being used effectively.

The Secret Lives of Supercomputers, Part 1 [technewsworld.com]

"It is probably the biggest trend in supercomputers -- the movement away from ivory-tower research and government-sponsored research to commerce and business," Michael Corrado, an IBM spokesperson, told TechNewsWorld. In 1997, there were 161 supersystems deployed in business and industry, but that figure grew to 287 by June 2008, he noted. "More than half the list reside in commercial enterprises. That's a huge shift, and it's been under way for years."

Uses for supercomputers [zdnet.com]

Re:Buzzword-heavy (-1)

Anonymous Coward | about 10 months ago | (#44047361)

GPUs are not multiprocessors ... they are CPUs which can do very fast operations on images... one at a time.
But they still are not multiprocessors.
THIS is why it's easy to get performance out of them.

Re:Buzzword-heavy (3, Interesting)

rioki (1328185) | about 10 months ago | (#44047749)

You might want to read / view these slides: An Introduction to Modern GPU Architecture [nvidia.com]. Especially slide 42.

Modern GPUs are massively parallel in their execution. Yes, they work "only" on one image, but when rendering one scene the shaders work in parallel. For example, a fragment (aka per-pixel) shader will be run in parallel for each pixel, limited by the number of available shader units (aka cores). THIS is why you get the awesome performance: small, self-contained programs running in parallel.

Re:Buzzword-heavy (2)

Bengie (1121981) | about 10 months ago | (#44062259)

GPU cores are broken into groups. Each group must be executing the exact same instruction at the exact same time. Branches are horrible for performance, as they will force some cores to stop computing altogether while waiting for the branch to finish.

There are many concurrent algorithms that don't keep the execution path in perfect sync. This is where a many-core CPU will take out a GPU in performance. GPUs also have horrible random access and very small caches. Actually, the per-core cache of GPUs has been going down over the years for both nVidia and AMD.

GPUs are excellent for what they're good at, and horrible for everything else. If I could remember the link, there was a 100GFlop Intel CPU kicking the crap out of a 2Tflop nVidia GPU on transcoding, even though both code-paths were highly optimized for each architecture. It just so happened that the algorithm used did not play well with GPUs.
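A back-of-the-envelope way to see the divergence penalty described above. This is a toy lock-step model with invented cycle counts, not any real GPU's scheduler:

# Toy SIMT model: all lanes in a warp execute in lock step, so a divergent
# branch pays for both paths instead of only the one each lane needs.
def warp_cycles(lane_takes_branch, cost_if, cost_else):
    if all(lane_takes_branch):
        return cost_if
    if not any(lane_takes_branch):
        return cost_else
    return cost_if + cost_else  # both sides run; masked-off lanes just wait

print(warp_cycles([True] * 32, 40, 10))                      # uniform warp: 40
print(warp_cycles([i % 2 == 0 for i in range(32)], 40, 10))  # divergent warp: 50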

Re:Buzzword-heavy (0)

Anonymous Coward | about 10 months ago | (#44047567)

As someone who actually gets paid by one of those EU publicly funded boondoggles, I'd have to say they do quite well (at paying me).

I'd even go so far as to say that there are some genuinely interesting results that come out of these projects.
There is just a lot of additional work that goes into massaging your results into something that fits the EU's mission statements and goals.

And of course the overall goals of developing some platform for EU industry or whatever is total BS.
Research projects like this will never accomplish something like that.
That mission is basically to develop solutions to problems that no one has.

Posting anonymous for obvious reasons.

Re:Buzzword-heavy (2)

smallfries (601545) | about 10 months ago | (#44047855)

How dare you criticise the author - he is a physicist and he has stooped to coming and telling us computer science types how to do it properly!

There is a deeply appropriate xkcd, but I cannot be bothered to find it. Decoding the garbage in the PCWorld story tells us that he is going to break Amdahl's Law by dynamically partitioning the workload between a fast single-threaded processor and many slower parallel processors. I would guess that by failing to make a fair comparison they can claim that the portion running under the boosted clock somehow beats the bounds predicted by Amdahl's law. Sadly it does not, as the law is worded in terms of the proportion of the code that can be executed on the parallel architecture.

It is quite possible that much of the hyperbole was added as sales pitch, which is a little unfortunate as the dynamic partitioning and the toolchain support are far more interesting anyway.

Re:Buzzword-heavy (1)

mysidia (191772) | about 10 months ago | (#44048129)

claim that the portion running under the boosted clock somehow beats the bounds predicted by Amdahl's law.

Right... their system cannot 'break' Amdahl's law. They bypass it by allowing the sequential portion of the workload to run on faster hardware, and the parallel portion of the workload to run on the massively parallel (but slower) architecture.

Designing an approach that allows better parallel computing despite Amdahl's law does not necessarily imply breaking the law.

It's more like: working cleverly within the constraints of Amdahl's law, to evade the issue.

For it to work, they have to have that super-fast sequential platform though.

Unless there continue to be linear speed performance improvements in the fast sequential execution architecture (and not just more cores on the commodity PCs), they will still run into Amdahl's law.
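A rough model of that "bypass": the hard-to-parallelize part runs on a faster clock, the rest fans out over many slow cores. The numbers below are invented purely for illustration:

def runtime(t_serial, t_parallel, serial_boost, booster_cores):
    # Serial-ish part on a node with a serial_boost-times faster clock,
    # parallel part spread over booster_cores slower cores.
    return t_serial / serial_boost + t_parallel / booster_cores

baseline = 10.0 + 90.0                   # 100 s on one reference core
split = runtime(10.0, 90.0, serial_boost=2.0, booster_cores=60)
print(baseline / split)                  # ~15.4x, and still capped near 20x
                                         # (serial_boost / serial fraction) no matter
                                         # how many booster cores you add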

Re:Buzzword-heavy (1)

postbigbang (761081) | about 10 months ago | (#44049923)

Go back further, to von Neumann, and you'll see that this is a hybrid model, where the state machine is respected, with management processes acting as controller daemons to child processes. It's not really a bypass, just a hybrid representation, as the distributed portions still respect Amdahl's precepts.

Re:Buzzword-heavy (1)

smallfries (601545) | about 10 months ago | (#44058731)

Your phrasing is kind of hard to parse - I actually can't tell if you are agreeing with what I wrote or arguing in a passive-aggressive way. That probably means I have had too many arguments with passive-aggressive people recently and need to learn to read things more neutrally again. But yes, that is what I was pointing out: tweaking the frequency in the fast sequential part is still covered by Amdahl's law, contrary to their wild hyperbole.

Re:Buzzword-heavy (2)

rgbatduke (1231380) | about 10 months ago | (#44048601)

Hey, don't disrespect physicists in parallel computing. Some of us actually understand how to do it properly and agree with what you state. Superlinear speedup is not precisely unknown, but it is rare and depends on architectural "tricks" that typically preserve Amdahl's law at a low level but apparently violate it at a higher level. In the naivest, stupidest example, if we counted processors instead of cores, even embarrassingly parallel code would exhibit superlinear speedup on a single-processor, multi-core system. Replace core count with internal ALUs, pipelines, SIMD/MIMD in the architecture, onboard vector units, etc., and one can get the same sort of thing per core for just the right code.

I am deeply skeptical of any sort of toolset that purports to be able to either statically or dynamically partition a given set of upper-level code to get superlinear speedup. I won't say it is impossible to build a set that "works" for some fraction of the parallelizable code in the Universe, but given the complex tradeoffs between computation and communication in different communication topologies and task partitionings, this is not a problem that has a simple universal solution. I suspect that in lots of cases an experienced parallel programmer could spend half a day analyzing the code and architecture and beat (or tie, as the usual rule is going to be no meaningful superlinear speedup for coarse-grained parallel or embarrassingly parallel code) the output from an automated tool.

An interesting example of a tool that does this sort of tuning (semi-empirically!) and works is ATLAS, the Automatically Tuned Linear Algebra Software. Basically it does a search of the space of partitionings and algorithms to determine the best combination of the two for performing the Basic Linear Algebra Subprograms (BLAS) and then implements it in a transparent library. It is semi-empirical because it is nearly impossible to predict the overall effect of every combination of SSE support, clock speed, bus speed, and core architecture -- it is a lot easier to just go and find out. But the problem ATLAS solves is comparatively simple relative to even static task partitioning on heterogeneous computational resources with variable costs for core-to-core communication, especially in today's multicore world where one has different speeds between cores on a processor, between processors in a system, between systems, and between general-purpose processors and special-purpose (e.g. GPU/vector) processors, and where the communication topology itself can have a major impact on the kind of parallel speedup any given task sees.
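For flavor, the kind of empirical search described here can be sketched in a few lines. This is a toy autotuner under the stated assumption that you already have several candidate kernels to choose from; ATLAS itself searches a far richer space of blockings, unrollings and kernels:

import time

def benchmark(kernel, *args, repeats=5):
    # Time a candidate empirically instead of trying to predict its cost.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(*args)
        best = min(best, time.perf_counter() - start)
    return best

def tune(candidates, *args):
    # candidates: {name: callable}; returns the fastest variant on this machine.
    timings = {name: benchmark(k, *args) for name, k in candidates.items()}
    return min(timings, key=timings.get), timings

# Hypothetical use with several blocked matrix-multiply variants:
#   best, timings = tune({"block16": mm16, "block32": mm32, "block64": mm64}, A, B)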

This, then, is the interesting part, as you note, and who knows, maybe they've built a sufficiently intelligent system to get nominally superlinear speedup (or hell, who cares, just getting close to optimal speedup, sublinear or not) from a meaningful fraction of the space of possible parallelizable code. But God couldn't get superlinear speedup on fine-grained synchronous parallel code with long-range coupling out of any multi-node scalable parallel architecture available in the real world today, no matter how fancy the partitioning tool.

rgb

Re:Buzzword-heavy (0)

Anonymous Coward | about 10 months ago | (#44052551)

How dare you criticise the author - he is a physicist and he has stooped to coming and telling us computer science types how to do it properly!

He's a computational physicist, moron. Did you even read the guy's credentials?

Re:Buzzword-heavy (1)

RabidReindeer (2625839) | about 10 months ago | (#44047945)

The trouble with supercomputers is that only governments buy them.

Actually, not so. For about 15 minutes, I once owned a supercomputer myself, believe it or not.

It wasn't a major supercomputer, but it was classified as a true supercomputer, and I was acting as an intermediary for an oil industry company that had offended the seller, so the seller wouldn't sell directly to them.

Governments are definitely big consumers of supercomputers, but universities also do a lot of computationally-intensive work, not all of which is necessarily government-funded. I've already mentioned oil companies. I'm sure there are other cases.

Re:Buzzword-heavy (2)

rgbatduke (1231380) | about 10 months ago | (#44048801)

Double ditto. I've written magazine articles on Beowulf-style supercomputers I've built at home (I used to write a column for "Cluster World" magazine in the brief time that it existed, but I also wrote an article or two for more mainstream computer mags). I have also set up clusters for companies I've founded and helped others set up corporate clusters. Some of what are arguably the world's largest parallel supercomputers -- Google's cluster, for example -- are not government funded. Many others aren't government funded either; they are built by companies that sell products to many entities, among them (perhaps) the government. Aerospace engineering companies all need supercomputers to do computational fluid dynamics on hull designs, for example. Ordinary engineering companies use them to do finite element analysis. Gaming clusters are, by any sensible definition, highly parallelized, dynamically partitioning supercomputers.

Ever since the invention of PVM and open-source versions of MPI, anybody with a small pile of computational resources and a network has been able to implement a Beowulf-style supercomputer built from them, an architecture so successful that nearly all of the supercomputers built in the world today are basically "Beowulfs". I've helped a few dozen individuals (one at a time, not via my book or magazine articles) build Beowulf clusters at home just to dink around with for fun, or to learn a new job skill, or to set up a learning cluster at a small community college or university. No government funding, often out-of-pocket funding or repurposing old computers that were lying around. Not all of these clusters could beat Moore's Law, which inexorably eats Amdahl's lunch after a few years (that is, by the time they were built it was often the case that a single-processor, over-the-counter computer at the high end of clock speed would beat the small cluster made of older systems), but there is no doubt that they were supercomputer architectures with substantial (but sublinear) speedup compared to single-threaded execution times for a suitably parallelized chore.

Besides, it is useful to remember that your cell phone would have been considered a munition a bit over a decade ago. A better thing to state is that everybody buys supercomputers, because almost every processor-based system, from navigation systems in cars to cell phones to tablets to personal computers, is these days a supercomputer. My i7 laptop has four cores and eight contexts, and it exhibits linear speedup on in-cache embarrassingly parallel code out to eight simultaneous tasks, because Intel has done a pretty amazing job of internally parallelizing the execution subsystems for the contexts. It beats the hell out of almost all of the small clusters I've ever built, including clusters with many, many more nodes. Build even a small stack of i7 systems on a gigabit or better network -- two, for example -- and you've got a sixteen-core supercomputer with a complex communication topology (variable speeds and nonlinear thresholds, as the i7 does stop giving you the purely linear 8-way speedup for large enough tasks and drops down to a bit over four -- again an instance of "superlinear speedup" of parallel code, even WITHOUT using a fancy tool, if you simply count cores instead of contexts and ignore the internal parallelism that lets a single core manage memory I/O for one task while executing another).

rgb

Re:Buzzword-heavy (1)

grizdog (1224414) | about 10 months ago | (#44048319)

I agree. The article is next to worthless. In particular, it appears (and that is the problem - the article is just too vague) that they are not counting the GPU time against Amdahl's law. That's splitting hairs, at best.

There might be some "there there" if they tried to refine Amdahl's law to include different kinds of processors, and the kinds of physical restrictions they talk about. All the article does is say such a thing might be possible - I think we already knew that.

Re:Buzzword-heavy (1)

delt0r (999393) | about 10 months ago | (#44048327)

I use supercomputers all the time for my work. I am at a university, so this is perhaps a government one. But we are not idiots, and we use it quite effectively, thank you very much. Most of the supercomputers in the EU, at least, are for universities. They are mostly used quite well - at least all the ones I have used, which is quite a few of them.

Re:Buzzword-heavy (0)

Anonymous Coward | about 10 months ago | (#44050397)

Please don't use Alabama as an example of how effective government is, or what governments use supercomputers for. Not all states are Alabama--this is why I don't live there. Some states use their supercomputers for other things, like biochemistry, aerodynamics, climate modeling, or you name it. Things that private industry would never invest in because they're too selfish to see the benefit.

I'm tired of the government-bashing. Sure, there's pork, but there's way more pork in private industry. And services like NOAA are critical to everyone.

For every Alabama censoring project, there's private firms using supercomputers to cheat people out of money in the financial industry, or to make a buck by destroying ecosystems.

To be clear, I don't hate private industry. I'm just tiring of the unfair comparisons that get made, like somehow private industry is a lily-white utopia without any problems of its own. Both have their advantages and disadvantages.

FWIW, I agree with the general sentiment that this project seems like sort of a hand-waving exercise, which is common in research everywhere.

Xeon dream on (-1)

Anonymous Coward | about 10 months ago | (#44047033)

Xeon Phi = unavailable vaporware, doled out in single unit quantities in order to discourage folks from porting big science applications to CUDA.

This neverending vaporware is certainly not news for nerds or stuff that matters.

Re:Xeon dream on (3, Informative)

godrik (1287354) | about 10 months ago | (#44047167)

"Xeon Phi = unavailable vaporware"

You know, I wrote a paper on SpMV for Xeon Phi and I got quite a lot of people from all over the world asking me for clarification and for code. So it seems to be quite widespread. You can actually buy some online, Google points to several vendors.

"in order to discourage folks from porting big science applications to CUDA"

There are two things wrong with this statement. First of all, I do not think scientists are discouraged from giving CUDA a shot. Just check any scientific conference and you'll see GPUs and CUDA everywhere. Actually, we see so much GPU programming that it is getting boring.
Also, porting to CUDA is difficult and alien for most people. If we can get similar performance using a programming model people are used to, how is that not a good thing? What is so good about CUDA? It is pretty much just the only way to get good performance out of NVIDIA GPUs.

The tradeoff between performance, hardware cost and developer cost is a difficult one. I say let's throw them all in the arena and see what's left standing.

Disclaimer: my research is supported by both Intel and NVIDIA.

Re:Xeon dream on (2)

ImprovOmega (744717) | about 10 months ago | (#44051555)

Optimizing CUDA is almost, but not quite, as arcane as optimizing assembly code by hand. It requires a deep knowledge of the underlying architecture: the addressing, the memory read patterns, the role of each tier of memory and the cost of moving between tiers, the size restrictions on each buffer, and how to coalesce the whole mess into a coherent answer. I once got a 30% performance increase by offsetting the addressing on my memory buffers so that they didn't all start on 16-byte boundaries. That allowed the data to be read in parallel and avoided collisions from the different processes trying to access the same block at the same time. The problem is most programmers aren't particularly hardware oriented, so CUDA comes with a steep learning curve if you want to do it well.

Re:Xeon dream on (0)

Anonymous Coward | about 10 months ago | (#44054635)

"You can actually buy some online, Google points to several vendors"

Provide a link. The only thing that turns up are sucker traps that don't actually have phi for sale.

Re:Xeon dream on (0)

Anonymous Coward | about 10 months ago | (#44049481)

Actually, you can buy Xeon Phis. We have a pair in one of our machines. Also, I don't see why using a proprietary NVidia system is any better than using a proprietary Intel system. If you care about interoperability, you should be using OpenCL.

SMBC (2)

klapaucjusz (1167407) | about 10 months ago | (#44047095)

SMBC [smbc-comics.com]

Re:SMBC (1)

godrik (1287354) | about 10 months ago | (#44047185)

Yeah, there is nothing wrong with Amdahl's law. People who need to care about it clearly understand what it means. That is to say, when you increase parallelism, sequential parts become bottlenecks, and you need to re-engineer the problem/algorithm/architecture around that new bottleneck.

Re:SMBC (1)

Anonymous Coward | about 10 months ago | (#44047465)

"Sequential parts" usually mean "We won't know how to proceed further on until previous step is done". However, if you have really massive, 2^data_word_length or higher scale parallelism, then you can actually try guessing, and executing next step an all possible outcomes of previous step, then throwing away every result but one as previous step completes. Even if your parallelism is of lower scale, statistically it may still yield some speedup, whenever you happen to have a lucky guess. Sure beats letting all those processors just idling on helplessly.

Re:SMBC (2)

mysidia (191772) | about 10 months ago | (#44048155)

then you can actually try guessing, and executing next step an all possible outcomes of previous step, then throwing away every result but one as previous step completes.

However... this requires power consumption, and it still takes time and ties up your infrastructure working on the 'guess'. Meanwhile the previous step completes while your CPUs are all still busy working through the guesses, and you need additional sequential overhead to initiate and terminate the guessing process.

You show 'CPU usage' on your 'idle cores', but it's 99% waste heat.

Re:SMBC (1)

TheLink (130905) | about 10 months ago | (#44050965)

I've long wondered if you can set up a quantum computer to process "all possible paths" and then "collapse" onto the most probable right answer.

After all, you can have light beams (and other quantum-state-friendly stuff) that are superpositions of all possible states and perform functions on them.

Poor summary (5, Informative)

Anonymous Coward | about 10 months ago | (#44047315)

Amdahl's Law still stands. TFA is about changing the assumptions that Amdahl's Law is based on; instead of homogeneous parallel processing, you stick a few big grunty processors in for the serial components of your task, and a huge pile of basic processors for the embarrassingly parallel components. You're still limited by the fastest processing of non-parallel tasks, but by using a heterogeneous mix of processors you're not wasting CPU time (and thus power and money) leaving processors idle.

Repeat after me: (4, Insightful)

Mashdar (876825) | about 10 months ago | (#44048667)

Amdahl's Law only applies to individual algorithms. Amdahl's Law only applies to individual algorithms. Amdahl's Law only applies to individual algorithms.

Besides which, Amdahl's law is an obvious truth unless you can make a process take negative time. All attempts to make Amdahl's Law sound fancy or complicated are a disservice. All attempts to pigeonhole Amdahl's Law into only applying to parallel design are a disservice. Any attempts to "revisit" it are either fallacious or focus on algorithm changes, which Amdahl made no attempt to address.

Amdahl's law in a nutshell: if you spend 10% of your time on X and 90% of your time on Y, you will never get more than a 1/0.9 (roughly 1.11x) speedup by optimizing X, even if you manage to make X instantaneous. Another way to put it is that if Y takes 9 seconds, you are never going to get the process under 9 seconds by modifying X.

Re:Poor summary (1)

Impy the Impiuos Imp (442658) | about 10 months ago | (#44048669)

Most of the cool stuff is pure parallel anyway, like the brain, or simulations of bodies made of atoms or cells. Plenty of room to grow regardless of some un-de-serializable algorithms.

Re:Poor summary (0)

Anonymous Coward | about 10 months ago | (#44055889)

So most of the cool stuff is pure nothing, right?

Re:Poor summary (0)

Anonymous Coward | about 10 months ago | (#44055929)

The smallest unit within any task is sequential. You're never "pure parallel".

please don't censor (-1)

Anonymous Coward | about 10 months ago | (#44047419)

Nuke nuke, submarine, allah, osama, obama, AES, encryption, DES, cracking, hacking crypto cryptanalysis hijack international diplomatic relations afghanistan iraq oil resources gas mining EVISCERATOR FASCINATOR HAVEQUICK SUITE A SUITE B MEDLEY, SHILLELAGH, BATON, SAVILLE, WALBURN, JOSEKI-1, LOFLAC, DRS, DNFQBT, ELLIPTIC CURVE P=NP bilderberg illuminati tavistock

Now that I've got your attention, NSA analyst who is reading this flagged message:

Do you think your illegal and invalid behavior will endure against universal karmic justice? Resign. Simply quit and get out with your soul still intact. YOUR choice to do what is right will cost you nothing and gain you everything.

All others who agree with this message need to repost. You have been asking what YOU can do? THIS. :yeah:

Re:please don't censor (-1, Offtopic)

umghhh (965931) | about 10 months ago | (#44047753)

Normally I just get pissed off and do something else after reading /. for too long - thanks to this AC post I laughed for a while. Give him some points!

Re:please don't censor (-1)

Anonymous Coward | about 10 months ago | (#44048087)

A repost wouldn't work; they filter those out. Just add a few random nefarious, but natural-sounding, sentences at the beginning of each post.

Not again (0)

Anonymous Coward | about 10 months ago | (#44047611)

Every time I hear someone start on about how Amdahl's law is wrong, it means one of two things:
1. They want your attention and their topic isn't interesting enough without resorting to controversial statements.
2. They don't understand Amdahl's law.

Also, unless you're presenting a summary of the history of computing, you really shouldn't have a figure of Moore's law.
Some people handle this well. When they get to that point in their presentation, they just say:
"And this is the mandatory picture of Moore's law."
And skip to the next slide.

Shi's Law, revisited (1)

G3ckoG33k (647276) | about 10 months ago | (#44047661)

In 2006 I submitted this (http://slashdot.org/comments.pl?sid=183461&cid=15153431):

"Researchers in the parallel processing community have been using Amdahl's Law and Gustafson's Law to obtain estimated speedups as measures of parallel program potential. In 1967, Amdahl's Law was used as an argument against massively parallel processing. Since 1988 Gustafson's Law has been used to justify massively parallel processing (MPP). Interestingly, a careful analysis reveals that these two laws are in fact identical. The well publicized arguments were resulted from misunderstandings of the nature of both laws.

This paper establishes the mathematical equivalence between Amdahl's Law and Gustafson's Law. We also focus on an often neglected prerequisite to applying the Amdahl's Law: the serial and parallel programs must compute the same total number of steps for the same input. There is a class of commonly used algorithms for which this prerequisite is hard to satisfy. For these algorithms, the law can be abused. A simple rule is provided to identify these algorithms.

We conclude that the use of the "serial percentage" concept in parallel performance evaluation is misleading. It has caused nearly three decades of confusion in the parallel processing community. This confusion disappears when processing times are used in the formulations. Therefore, we suggest that time-based formulations would be the most appropriate for parallel performance evaluation."

Maybe it will be helpful again.
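The equivalence claimed in that abstract is easy to check numerically once both laws are written in terms of processing times rather than a "serial percentage". A small sanity check (not the paper's derivation):

def speedups(t_serial, t_parallel_on_p, p):
    # One run, two bookkeepings: Amdahl's fraction is taken relative to the
    # 1-processor time, Gustafson's relative to the p-processor time.
    f = t_serial / (t_serial + p * t_parallel_on_p)   # Amdahl's serial fraction
    s = t_serial / (t_serial + t_parallel_on_p)       # Gustafson's serial fraction
    amdahl = 1.0 / (f + (1.0 - f) / p)
    gustafson = s + p * (1.0 - s)
    return amdahl, gustafson

print(speedups(t_serial=2.0, t_parallel_on_p=3.0, p=8))  # (5.2, 5.2) -- identical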

Re:Shi's Law, revisited (0)

Anonymous Coward | about 10 months ago | (#44052351)

Thank you for this excerpt; the paper link seems to be down, though!
A working link is: http://spartan.cis.temple.edu/shi/public_html/docs/amdahl/amdahl.html
(seems the Star Wars guy got replaced by a Hellenophile, as the server was called yoda before)

Not breaking Amdahls law (2)

sjames (1099) | about 10 months ago | (#44048687)

This most certainly does NOT break Amdahl's law. It simply partitions the problem to use the cheap gear for the embarrassingly parallel portion of the workload and the expensive gear for the harder to parallelize workload.

It necessarily cannot make a non-parallelizable portion (the serial part) run in parallel.

Note that what part of the problem is serial depends on the hardware. The lower the latency and the higher the bandwidth of the interconnect, the more of the problem you can get to run effectively in parallel. However, there comes a point where the problem cannot be decomposed further. The atoms that remain after that may all be run at once, but the individual atom will run serially. No matter what you do, 5*(2+3) can go no faster than serially adding and then multiplying (yes, you could do two multiplications in parallel and then add, but you gain nothing for it).

Re:Not breaking Amdahls law (0)

Anonymous Coward | about 10 months ago | (#44050051)

5*(2+3) decomposes to 4 * (2 + 3) + (2 + 3), which can be boosted by using the hardware address resolver (which is a 3-way native adder with bitshift).

Re:Not breaking Amdahls law (0)

Anonymous Coward | about 10 months ago | (#44050231)

Amdahl's law is about the bandwidth and the time to compute something.

For 5*(2+3) you have a fixed amount of time.
Using your special instruction, 4*(2+3) + (2+3) changes where the data goes (and thus the time to send/recv). You can use Amdahl's law to compare each one and see if you have decreased the time to send/recv the results.

You can also decompose it into 2+3+2+3+2+3+2+3+2+3, which is wildly parallel. But the cost to send it out and get the data back may be more expensive than just doing the multiply. You have to try it on your hardware to get an idea of what it would do, as decomposing it this way makes it highly dependent on your interconnects.

What Amdahl was getting at was: do not ignore your interconnects when working out the time to do something. Which *many* of our CS big-O analyses ignore.
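A toy illustration of that interconnect point, with invented costs, just to show how the send/receive term can swamp the arithmetic:

def farmed_out(n_chunks, work_per_chunk, latency_per_message):
    # Chunks run concurrently, but each costs a send and a receive on the wire.
    return work_per_chunk + 2 * n_chunks * latency_per_message

serial = 6e-9                               # one add, one multiply: nanoseconds
parallel = farmed_out(n_chunks=5, work_per_chunk=3e-9, latency_per_message=1e-6)
print(serial, parallel)                     # ~6e-09 vs ~1e-05: the interconnect wins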

It's revisited every day (0)

Anonymous Coward | about 10 months ago | (#44048755)

Amdahl came up with this law to voice his scepticism about parallel computing, in favor of better uniprocessors.

The law is revisited every day by anyone who needs to calculate the cost of software licenses.

Nowadays the cost of HW is minuscule compared to the cost of, for example, off-the-shelf Oracle databases that are licensed based on how many cores a system has.

Smart money would hoard dual-core Xeon processors and Oracle licenses while they are still available for these processors, as it can mean saving hundreds of thousands of dollars in licensing costs over the next few years.

There's only ONE law (0)

Anonymous Coward | about 10 months ago | (#44049155)

Brannigan's Law.

Most misunderstood 'law' (-1)

Anonymous Coward | about 10 months ago | (#44051345)

Cretins love to quote Amdahl's Law as the reason multi-core computing solutions are a waste of time. According to these cretins, no building site should ever employ more than 4 workers, because the overhead of using more people outweighs the extra work they can do.

Amdahl was a prize idiot working on the moronic idea of automatically converting ordinary single-thread algorithms into a parallel form that could execute faster on multiple computing cores. This was the kind of garbage promoted in the early days of so-called supercomputers. It allowed certain kinds of PhD employees to appear useful to the DoD.

What Amdahl was actually excited about was DEPENDENCIES in co-dependent computing threads. Amdahl came to the conclusion, already known to every toddler in the real world, that no task can usefully be split into many multiple parts if those parts need to keep referencing each other while they are being executed.

Let me make this simpler for those of you thick enough to think that knowing things like Amdahl's Law actually represents a 'skill' in computer science. Today the Xbox 360 and PS3 are around 8+ years old. However, the games they run today are infinitely more sophisticated than when the hardware first appeared. Why? Because each platform actually had multiple computing units, both on the CPU and GPU side. As game programmers learnt to use each of these units to their maximum processing capability, games became more sophisticated.

So how did they 'avoid' Amdahl's Law? Well, as I said at the top, Amdahl's Law is utter tosh in most circumstances clods think it applies in. Modern games use what is known as 'work unit' programming - a method EXACTLY analogous to the building site example I gave at the top. Essentially you break the process of producing each game frame down into as many self-contained computing units as possible. These units are designed to have limited and obvious dependencies on one another, just like different builders working on that site. These units are assigned to hardware computing cores as they become available.

The new consoles, the Xbox One and PS4, have EIGHT CPU cores and many hundreds of GPU processing elements. According to the dribblers that endlessly quote Amdahl's Law, this is a complete waste of time. After all, Amdahl's Law supposedly states that scaling beyond around 3.5 computing units makes extra computing units pointless. Now, this IS entirely true with non-core-aware computing on the Windows PC (which is the reason Intel has been so reluctant to go beyond 4 cores).

The Windows PC is actually a great place to explore Amdahl's Law, but in a different way. Windows makes no attempt to run a single thread on multiple cores at the same time, as clods like Amdahl tried to do. However, even so, the law still applies when your Windows session creates ever larger pools of threads from multiple applications that Windows distributes across all available x86 cores. Two x86 cores just about double one. The third core acts like 0.5 extra cores. The fourth core adds really very little. Cores beyond this are a waste of time for the most part (AMD's major problem with its 8-core Bulldozer/Piledriver parts).

Of course, core-aware software CAN, if it has the right kinds of algorithms, scale perfectly to more than 4 cores. Such software is vanishingly rare, though.

Anyway, please stop quoting Amdahl's Law - it really never had anything useful to say even when it was first coined, and today it is used purely to troll forums where people are developing for modern multi-core computing systems.
