
Ask Slashdot: How Reproducible Is Arithmetic In the Cloud?

timothy posted about 9 months ago | from the irreproducible-results dept.

Math 226

goodminton writes "I'm research the long-term consistency and reproducibility of math results in the cloud and have questions about floating point calculations. For example, say I create a virtual OS instance on a cloud provider (doesn't matter which one) and install Mathematica to run a precise calculation. Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time. In the cloud, hardware, firmware and hypervisors are invisible to the users but could still impact the implementation/operation of floating point math. Say I archive the virtual instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same? What can be done to adjust for this? Currently, I know people who 'archive' hardware just for the purpose of ensuring reproducibility and I'm wondering how this translates to the world of cloud and virtualization across multiple hardware types."


Fixed-point arithmetic (5, Informative)

mkremer (66885) | about 8 months ago | (#45486365)

Use Fixed-point arithmetic.
In Mathematica make sure to specify your precision.
Look at 'Arbitrary-Precision Numbers' and 'Machine-Precision Numbers' for more information on how Mathematica does this.
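For readers without Mathematica, the fixed-point idea can be sketched in a few lines of Python (the names PLACES, to_fixed and fmul are invented for this illustration); Python's arbitrary-precision integers make the arithmetic exact and bit-for-bit identical on any hardware or hypervisor:

```python
# Fixed-point sketch: values are plain integers scaled by 10**PLACES.
# Python ints are arbitrary precision, so every operation below is exact
# and identical on any hardware, hypervisor, or cloud.
PLACES = 6
SCALE = 10 ** PLACES

def to_fixed(s):
    # Parse a non-negative decimal string exactly (no float involved).
    whole, _, frac = s.partition(".")
    return int(whole) * SCALE + int(frac.ljust(PLACES, "0")[:PLACES])

def fmul(a, b):
    # Multiply two fixed-point values; one SCALE factor must be divided out.
    return (a * b) // SCALE

def to_str(x):
    return f"{x // SCALE}.{x % SCALE:0{PLACES}d}"

# 0.1 + 0.2 == 0.3 holds exactly here, unlike in binary floating point.
assert to_fixed("0.1") + to_fixed("0.2") == to_fixed("0.3")
print(to_str(fmul(to_fixed("1.5"), to_fixed("2.5"))))  # 3.750000
```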

Re:Fixed-point arithmetic (-1)

Anonymous Coward | about 8 months ago | (#45486449)

Another solution is to make sure cloud providers are running on servers using this chip [wikipedia.org] . You may not get the answer you want, but it will be consistent ;-)

Re:Fixed-point arithmetic (5, Insightful)

Anonymous Coward | about 8 months ago | (#45486515)

Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"

Re:Fixed-point arithmetic (1)

Anonymous Coward | about 8 months ago | (#45486685)

I'm not so sure of that. Different apps address hardware and FPUs differently. If the app doesn't have a way to test its platform, then there are ways to check an instance of an OS to see what it does, in terms of mantissa handling and basic vector math, record that as metadata, and then compensate for a future host's differences. When the world went from 32-bit to 64-bit CPUs, lots changed. Intel has an ugly history with FPUs. Where precision is important, it's always nice to have done your own quick check.

Re:Fixed-point arithmetic (4, Informative)

Joce640k (829181) | about 8 months ago | (#45487205)

Submitter is entirely ignorant of floating point issues in general. Other than the buzzword "cloud" this is no different from any other clueless question about numerical issues in computing. "Help me, I don't know anything about the problem, but I just realized it exists!"

Wrong.

In IEEE floating point math, "(a+b)+c" might not be the same as "a+(b+c)".

The exact results of a calculation can depend on how a compiler optimized the code. Change the compiler and all bets are off. Different versions of the same software can produce different results.

If you want the exact same results across all compilers you need to write your own math routines which guarantee the order of evaluation of expressions.

OTOH, operating system, hardware, firmware and hypervisors shouldn't make any difference if they're running the same code. IEEE math *is* deterministic.
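The non-associativity is easy to check for yourself; a quick Python sketch (any IEEE-754 double environment shows the same thing):

```python
import math

# IEEE-754 addition is deterministic but NOT associative: each operation
# rounds to 53 bits, so the grouping changes which bits get rounded away.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # -> 0.6000000000000001
right = a + (b + c)   # -> 0.6
print(left == right)  # False: the two groupings differ in the last bit

# math.fsum computes a correctly rounded sum regardless of ordering,
# one way to pin down a reproducible result at machine precision.
print(math.fsum([a, b, c]) == math.fsum([c, a, b]))  # True
```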

Re:Fixed-point arithmetic (5, Insightful)

immaterial (1520413) | about 8 months ago | (#45487471)

For a guy who started off a reply with an emphatic "Wrong" you sure do seem to agree with the guy you quoted.

Re:Fixed-point arithmetic (2, Insightful)

Anonymous Coward | about 8 months ago | (#45487483)

Urgh. Getting math to be deterministic is a major pain in the neck. Most folks are completely ignorant of the fact that sum/avg/stddev in just about all distributed databases will return a *slightly* different result every time you run the code (and the results are almost always different between vendors/platforms, etc.).

Re:Fixed-point arithmetic (4, Insightful)

gl4ss (559668) | about 8 months ago | (#45487501)

the question was not about compilers or indeed about software, but about fpu's, about firing up the same instance, with the same compilers and indeed with the same original binary.

it sounds like just fishing for reasons to have a budget to keep old power hw around.

I would think that if the results change so much as to matter depending on the fpu, then the whole calculation method is suspect to begin with and exploits some feature/bug to get a tuned result (but assuming that the cpu/vm adheres to the standard, the results would be the same - if the old one doesn't and the new one does then I think an honest scientist would want to know that too).

Re:Fixed-point arithmetic (1, Interesting)

Anonymous Coward | about 8 months ago | (#45486529)

Or simply don't use the broken "cloud computing" model. If you have some calculations to do, and care at all about the results, how about buying a computer that does those calculations for you?

Re:Fixed-point arithmetic (4, Funny)

NEDHead (1651195) | about 8 months ago | (#45487213)

I have a mechanical calculator that is extremely reliable, so long as you oil it.

Re:Fixed-point arithmetic (4, Informative)

Giant Electronic Bra (1229876) | about 8 months ago | (#45486691)

Yes, you can do this, but it's not feasible for all calculations. Things like trig functions are implemented on FP numbers, and once you start using FP it's better to just keep using it; converting back and forth is just bad and defeats the whole purpose anyway. So in reality you end up with applications that DO use FP (believe me, as an old FORTH programmer I can attest to the benefits of scaled integer arithmetic!). It's one of those things: we're stuck with FP, and once we assume that, then given small differences in the results of machine-level instructions, or minor differences in libraries on different platforms, you will probably find that arbitrary VMs won't produce exactly identical results when you run on different platforms (AWS, KVM, VMWare, some new thing).

Is it a huge problem though? The results produced should be similar, and the parameters being varied were never controlled for anyway. It's rare for the rounding errors of two FPUs to be identical. Neither the new nor the old results should be considered 'better', and they should generally be about the same if the result is robust. A climate sim, for example, run on two different systems for an ensemble of runs with similar inputs should produce statistically indistinguishable results. If they don't, then you should find out what the differences are by comparison. In reality I doubt very many experiments will be in doubt based on this.

Re:Fixed-point arithmetic (4, Insightful)

Jane Q. Public (1010737) | about 8 months ago | (#45486823)

"Is it a huge problem though?"

If tools like Mathematica are dependent on the floating-point precision of a given processor, They're Doing It Wrong.

Re:Fixed-point arithmetic (5, Insightful)

Giant Electronic Bra (1229876) | about 8 months ago | (#45487357)

I think the problem is that people PERCEIVE it to be a problem. Nothing is any more problematic than it was before; good numerical simulations will be stable over some range of inputs. It shouldn't MATTER if you get slightly different results for one given input. If that's all you tested, well, you did it wrong indeed. Mathematica is fine, people need to A) understand scientific computing and B) understand how to run and interpret models. I think most scientists that are doing a lot of modelling these days DO know these things. It's the occasional users that get it wrong, I suspect.

Re:Fixed-point arithmetic (2, Insightful)

Anonymous Coward | about 8 months ago | (#45486893)

protip: When discussing the difference between Fixed Point and Floating Point, the abbreviation "FP" is useless.

Re:Fixed-point arithmetic (0)

EdIII (1114411) | about 8 months ago | (#45486921)

Wow. You Smart.

Re:Fixed-point arithmetic (4, Interesting)

raddan (519638) | about 8 months ago | (#45487017)

Experiments can vary wildly with even small differences in floating-point precision. I recently had a bug in a machine learning algorithm that produced completely different results because I was off by one trillionth! I was being foolish, of course, because I hadn't used an epsilon for FP comparisons, but you get the idea.

But it turns out that even if you're a good engineer and you are careful with your floating point numbers, the fact is: floating point is approximate computation. And for many kinds of mathematical problems, like dynamical systems, this approximation changes the result. One of the founders of chaos theory, Edward Lorenz [wikipedia.org], of Lorenz attractor [wikipedia.org] fame, discovered the problem by truncating the precision of FP numbers from a printout when he was re-entering them into a simulation. The simulation behaved completely differently despite the difference in precision being in the thousandths. That was a weather simulation. See where I'm going with this?
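Lorenz's effect is easy to reproduce with any chaotic map; a Python sketch using the logistic map (the parameter 3.9 and the 50-step horizon are arbitrary choices for illustration):

```python
# A chaotic iteration amplifies any difference in the input, so truncating
# the state to a few decimals (as Lorenz did with his printout) soon
# produces a completely different trajectory.
def logistic(x, steps):
    for _ in range(steps):
        x = 3.9 * x * (1.0 - x)   # chaotic regime for r = 3.9
    return x

full = logistic(0.123456789, 50)
truncated = logistic(0.123, 50)   # same value to three decimal places
print(full, truncated)            # nowhere near each other after 50 steps
```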

Re:Fixed-point arithmetic (5, Informative)

Giant Electronic Bra (1229876) | about 8 months ago | (#45487417)

Trust me, it's a subject I've studied. The problem here is that your system is unstable: tiny differences in inputs generate huge differences in output. You cannot simply take one set of inputs that produces what you think is the 'right answer' from that system and ignore all the rest! You have to explore the ensemble behavior of many different sets of inputs, and the overall set of responses of the system is your output, not any one specific run with specific inputs that would produce a totally different result if one was off by a tiny bit.

Of course Lorenz realized this. Simple experiments with an LDE will show you this kind of result. You simply cannot treat these systems the way you would ones which exhibit function-like behavior (at least within some bounds). Lorenz of course also realized THAT, but sadly not everyone has got the memo yet! lol.

yeah, don't be lazy (2)

rewindustry (3401253) | about 8 months ago | (#45486741)

floats are the soft option, and only get us all in trouble.

remember

we are pentium of borg, division is futile

bend reality (5, Funny)

goombah99 (560566) | about 8 months ago | (#45486383)

The result is always the same, but the definition of reality is changing. The result of every single calculation is in fact 42 in some units. The hard part is figuring out the units.

Re:bend reality (0)

sgbett (739519) | about 8 months ago | (#45486401)

froty-second post!!!!42

Re:bend reality (1)

weilawei (897823) | about 8 months ago | (#45486557)

This should be +5 Insightful, as it is, in fact, true.

Re:bend reality (2)

cheater512 (783349) | about 8 months ago | (#45486891)

Once you define the unit of truth that is. :P

Re:bend reality (1)

Anonymous Coward | about 8 months ago | (#45487107)

Otherwise it's between 41.999 and either 42.999, 42.499, 42.599, or 42.999 depending on how you round your fp number.

Re:bend reality (2)

yoink! (196362) | about 8 months ago | (#45487221)

We most certainly need Slashdot VirtualCrypto to gild comments like these. Karma alone is not enough and this comment is too damned funny.

WTF? (0)

Anonymous Coward | about 8 months ago | (#45486399)

Floating point and integer operations are well defined.
Unless someone fucks up implementing the floating point unit, the result should be exactly the same.

Re:WTF? (2)

wiredlogic (135348) | about 8 months ago | (#45486517)

They may be well defined but nobody implements fully standards compliant FP units and they have subtle differences in output. Even with identical hardware, configurable settings like rounding modes may also differ between instances.

Re:WTF? (3, Informative)

larry bagina (561269) | about 8 months ago | (#45486553)

Let's say you're using C on an x86. float (32-bit) and double (64-bit) are well defined. However, the x86 FPU internally uses long double (80-bit).

So if you do some math on a float or a double, the results can vary depending on if it was done as 80-bit or if the intermediaries were spilled and truncated back to 64/32 bit.

Re:WTF? (2)

gnasher719 (869701) | about 8 months ago | (#45487245)

So if you do some math on a float or a double, the results can vary depending on if it was done as 80-bit or if the intermediaries were spilled and truncated back to 64/32 bit.

Google for FP_CONTRACT. Quote from the C Standard:

A floating expression may be contracted, that is, evaluated as though it were a single operation, thereby omitting rounding errors implied by the source code and the expression evaluation method. The FP_CONTRACT pragma in <math.h> provides a way to disallow contracted expressions. Otherwise, whether and how expressions are contracted is implementation-defined.

Re:WTF? (1)

elwinc (663074) | about 8 months ago | (#45486591)

Intel x87 scalar FP instructions use an 80 bit internal format for higher precision. Intel SSE2 vector FP instructions use 64 bits. You will see last bit variations depending on which instructions the compiler chooses. In fact, I've heard of cases where a JIT compiler vectorized a calculation sometimes (directing code to SSE2 hardware), and left it scalar other times (directing it to 80 bit x87 hardware). Might only make a difference in the last bit, but last bit variations can add up over a few billion calculations.

I can recall a physics simulation I was involved in years ago that got differences of 10% depending on what hardware we ran it on. Turned out the Sun and SGI workstations used 64-bit FP, while the IBM box used 128-bit or something like that. Took a while to track that one down...
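Accumulating last-bit differences are exactly what compensated summation is designed to tame; a sketch of the classic Kahan algorithm in Python (illustrative, not anything from the simulation described above):

```python
def kahan_sum(values):
    # Kahan (compensated) summation: recover the low-order bits lost by
    # each addition and feed them back in, so the total's error stays a
    # few ulps instead of growing with the number of terms.
    total = 0.0
    comp = 0.0                  # running compensation term
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y  # the bits of y that (total + y) dropped
        total = t
    return total

vals = [0.1] * 1_000_000        # exact answer is 100000
naive = sum(vals)
better = kahan_sum(vals)
print(abs(naive - 100000.0), abs(better - 100000.0))  # Kahan is far closer
```

Note that an optimizing compiler can silently break this trick in C (the compensation algebraically cancels to zero); CPython executes it as written.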

Re:WTF? (1)

Guy Harris (3803) | about 8 months ago | (#45487075)

Intel x87 scalar FP instructions use an 80 bit internal format for higher precision. Intel SSE2 vector FP instructions use 64 bits. You will see last bit variations depending on which instructions the compiler chooses.

And the compiler may choose differently depending on whether it's compiling for 32-bit or 64-bit x86 [github.com] .

Re:WTF? (0)

Anonymous Coward | about 8 months ago | (#45486689)

No.

The IEEE floating point standard specifies how to encode and calculate FPU operations, yes; but the problem is that FPU results change based on how many calculations are performed, due to rounding errors. As a result, any compile-time optimization can add or remove roundings from your result depending on your optimizer settings. If that wasn't bad enough, consider also the fact that some processors provide fused multiply-add operations, which compute a multiply followed by an add in both less time and with fewer roundings than separate operations. On certain architectures FMA is mandatory for FPU operations and separate ops aren't provided.

Arbitrary precision (-1)

Anonymous Coward | about 8 months ago | (#45486405)

Responsible programmers store every value as the ratio of two integers. The flags to extend any integer into a longer number of binary bits are built into every microprocessor. You can take 128-bit integers, and know when they overflow, easily on modern x64 chips.

Of course, the negligent idiots who make a living programming don't know this.
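Attitude aside, the ratio-of-two-integers idea is in the Python standard library; a short sketch (as later replies note, irrationals like pi still require an approximation):

```python
from fractions import Fraction

# Exact rational arithmetic: each value is a ratio of two arbitrary-
# precision integers, so results are identical on any platform, forever.
a = Fraction(1, 3)
b = Fraction(1, 6)
print(a + b)                     # 1/2, exactly; no rounding anywhere

# The limitation: irrational values have no exact ratio, so something
# like pi can only be stored as an approximation of chosen accuracy.
approx_pi = Fraction(355, 113)   # classical approximation, ~3e-7 off
print(float(approx_pi))
```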

Re:Arbitrary precision (1)

norpy (1277318) | about 8 months ago | (#45486493)

Most of the time arbitrary precision is not necessary and it's easier (and faster) to just use a float. There are times when it matters, but for the most part people aren't doing things where it matters.

The submitter should know better about using integer operations for things that require precision though.

Re:Arbitrary precision (3, Funny)

weilawei (897823) | about 8 months ago | (#45486579)

So I'm supposed to do all my calculations without any Pi? How can you have any Pi if you don't eat your machine?

Re:Arbitrary precision (1)

fractoid (1076465) | about 8 months ago | (#45486743)

Responsible programmers store each value in the manner most suitable for that value. The reality is that very few applications actually care about the exact to-the-bit result of floating point ops, and floating point arithmetic should always be regarded as inexact.

Forget it (0)

Anonymous Coward | about 8 months ago | (#45486409)

Really, forget it. In 5 or 10 years cloud providers will not offer any compatibility to your current form of backup.

(And I really hope people miraculously realize that going to the cloud does not solve any problems but creates more, and more complex, ones, and abandon the cloud in herds)

Re:Forget it (0)

Anonymous Coward | about 8 months ago | (#45486431)

Not to mention, nuclear simulations should be staying on LANL's hardware, not being foisted into the cloud.

Good luck (2, Insightful)

timeOday (582209) | about 8 months ago | (#45486435)

This problem is far broader than arithmetic. Any distributed system based on elements out of your control is bound to be somewhat unstable. For example, an app that uses google maps, or a utility to check your bank account. The tradeoff for having more capability than you could manage yourself, is that you don't get to manage it yourself.

Re:Good luck (0, Offtopic)

Anonymous Coward | about 8 months ago | (#45486497)

Any distributed system based on elements out of your control is bound to be somewhat unstable.

You've just explained many of the problems with government in one concise sentence.

Re:Good luck (0)

Anonymous Coward | about 8 months ago | (#45486947)

This problem is far broader than arithmetic. Any distributed system based on elements out of your control is bound to be somewhat unstable.

In this case, it's not out of your control. Floating point is by definition NOT precise. Multiple floating point operations can easily compound the error until the precision of the result falls below your needed level, and when you don't control the hardware this can often happen without your realization. (The same is actually true when you DO control the hardware, just for the record).

The fact of the matter is that if you're running a highly critical application where you absolutely MUST have precision, you need to use fixed point math, not floating point. Especially if you're going to be doing sequences of operations.

Or put it another way, No True Physicist uses anything other than fractions.

I'm research the long-term consistency and ... (1)

oDDmON oUT (231200) | about 8 months ago | (#45486455)

First sentence seems stilted at best.

Easiest solution (3, Funny)

ShaunC (203807) | about 8 months ago | (#45486465)

Just scroll down a couple of posts [slashdot.org] . "Quite soon the Wolfram Language is going to start showing up in lots of places, notably on the web and in the cloud."

Problem solved!

Numerical instability (5, Insightful)

Anonymous Coward | about 8 months ago | (#45486469)

If the value you're computing is so dependent on the details of the floating point implementation that you're worried about it, you probably have an issue of numerical stability and the results you are computing are likely useless, so this is really a mute point.

Re:Numerical instability (1)

Anonymous Coward | about 8 months ago | (#45486629)

this is really a mute point.

Pitty your knot.

Re:Numerical instability (0)

Anonymous Coward | about 8 months ago | (#45486681)

This.

If the difference between results on different machines is big enough to matter, then fix the code.
Else, you're fine (by definition the difference is small enough to not matter).

Nowhere did we need to "adjust" for anything.

Re:Numerical instability (3, Funny)

brantondaveperson (1023687) | about 8 months ago | (#45487327)

This is the only answer so far that makes sense, which is a pity because

A) It's an AC
and
B) The point is moot, not mute.

But we all knew that, didn't we.

Use infinite precision software packages (4, Informative)

shutdown -p now (807394) | about 8 months ago | (#45486479)

What the title says - e.g. bignum for Python etc. It will be significantly slower, but the result is going to be stable at least for a given library version, and that is far easier to archive.
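As an example with Python's stdlib decimal module (one arbitrary-precision option among several; gmpy2 and mpmath are third-party alternatives), the result depends only on the library and the declared context, not on the FPU:

```python
from decimal import Decimal, getcontext

# Arbitrary-precision decimal arithmetic: the answer is a pure function
# of the library version and the context settings, not of the hardware.
getcontext().prec = 50
root = Decimal(2).sqrt()
print(root)   # 50 significant digits, the same on every platform

# Archive the precision and rounding mode along with the code, and the
# result is reproducible for as long as the library behaves the same.
print(getcontext().rounding)
```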

one small problem (0)

Anonymous Coward | about 8 months ago | (#45486667)

$ make
ERROR on line 7: Unable to reduce M_PI to a rational number; ran out of hard disk space trying to save the result.

I'm sorta kidding, but I'm also pointing out a serious flaw in this proposal.

If you really want it done right, use interval arithmetic and iterate each calculation until the error is within acceptable tolerance. This can also require insane amounts of storage space, but at least it will allow you to stop after a finite number of digits of e or pi based on what your computation requires in order to give an accurate answer.
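A toy version of interval arithmetic can be sketched in Python (illustrative only: a real implementation would use directed rounding modes; here math.nextafter widens each bound by one ulp as a stand-in):

```python
import math

class Interval:
    # Toy interval arithmetic: a value is a [lo, hi] pair guaranteed to
    # bracket the true result. After each operation, widen each bound by
    # one ulp in place of proper outward (directed) rounding.
    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def width(self):
        return self.hi - self.lo

x = Interval(0.1) + Interval(0.2) + Interval(0.3)
print(x.lo, x.hi)            # a tight bracket around the true sum
print(x.lo <= 0.6 <= x.hi)   # True: the answer is certified, not guessed
```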

Re:one small problem (1)

shutdown -p now (807394) | about 8 months ago | (#45486739)

Well, the original question was about hardware floating point arithmetic, which has the same problem.

Re:one small problem (0)

Anonymous Coward | about 8 months ago | (#45487055)

If you really want it done right, use interval arithmetic and iterate each calculation until the error is within acceptable tolerance. This can also require insane amounts of storage space, but at least it will allow you to stop after a finite number of digits of e or pi based on what your computation requires in order to give an accurate answer.

PI is a ratio. Just like 1/3, neither can be precisely represented using decimals. You can prevent filling up your disk if you cut the umbilical cord and stop using imprecise numerical representations. Yes, it is completely possible, but only if you understand how math works.

Pi is not a ratio. It is irrational. (1)

Anonymous Coward | about 8 months ago | (#45487381)

It is not like 1/3.

You need to go back to math class.

Re:one small problem (1)

istartedi (132515) | about 8 months ago | (#45487453)

PI is irrational, 1/3rd isn't. 1/3 could be represented perfectly if the implementation had a "repeating" bit. AFAIK, there isn't any commonly used FP hardware that has such a bit, so yeah; 1/3 is not perfectly represented.

This reminds me of the arguments you get from people when you try to explain that 0.9 repeating is exactly equal to 1.0.

Their minds really get blown when you explain that 0.9 repeating is just 0.3 repeating + 0.3 repeating + 0.3 repeating. All those 3s add up to 9, all the way out into infinity. It's the same as 3*(1/3), so plainly it equals 1.0; but their minds still have a hard time dealing with 0.9 repeating equaling 1.0.

A more succinct way to get over it? Repeating decimals are just alternative representations of numbers. The symbol known as 0.9 repeating just happens to map to the same number as 1.

Re:one small problem (0)

Anonymous Coward | about 8 months ago | (#45487591)

and pi could be represented perfectly if it had a PI bit.

Re:one small problem (1)

petermgreen (876956) | about 8 months ago | (#45487623)

PI is irrational, 1/3rd isn't. 1/3 could be represented perfectly if the implementation had a "repeating" bit. AFAIK,

You'd need more than one extra bit to represent recurring binary fractions, because you need to store the point at which the pattern repeats. And you would still only be able to store a subset of rational numbers exactly, because you would still have a limited number of bits.

Your chances are pretty darned good (5, Informative)

Red Jesus (962106) | about 8 months ago | (#45486489)

Mathematica in particular uses adaptive precision; if you ask it to compute some quantity to fifty decimal places, it will do so.

In general, if you want bit-for-bit reproducible calculations to arbitrary precision, the MPFR [mpfr.org] library may be right for you. It computes correctly-rounded special functions to arbitrary accuracy. If you write a program that calls MPFR routines, then even if your own approximations are not correctly-rounded, they will at least be reproducible.

If you want to do your calculations to machine precision, you can probably rely on C to behave reproducibly if you do two things: use a compiler flag like -mpc64 on GCC to force the elementary floating point operations (addition, subtraction, multiplication, division, and square root) to behave predictably, and use a correctly-rounded floating point library like crlibm [ens-lyon.fr] (Sun also released a version of this at one point) to make the transcendental functions behave predictably.

Re:Your chances are pretty darned good (0)

Anonymous Coward | about 8 months ago | (#45486627)

I'm not a Reddit user, but this is an example of the perfect use case for their interface: your answer should be on the top of the page. Thread closed. ;)

iEEE 754 (3, Insightful)

Jah-Wren Ryel (80510) | about 8 months ago | (#45486503)

Different results on different hardware were a major problem up until CPU designers started to implement the IEEE754 standard for floating point arithmetic. [wikipedia.org] IEEE754-conforming implementations should all return identical results for identical calculations.

However, x86 systems have an 80-bit extended precision format and if the software uses 80-bit floats on x86 hardware and then you run the same code on an architecture that does not support the x86 80-bit format (say, ARM or Sparc or PowerPC) then you are likely to get different answers.

I think newer revisions of IEEE754 have support for extended precision formats up to 16-bytes, but you need to know your hardware (and how your software uses it) to make sure that you are doing equal work on systems with equal capabilities. You may have to sacrifice precision for portability.

Re:iEEE 754 (0)

Anonymous Coward | about 8 months ago | (#45487255)

IEEE754 doesn't define a type with 80-bits of precision. The IEEE754 type on x86 hardware is usually a double, 64-bits wide with 53-bits of precision (i.e. binary64 under IEEE754). Using Intel's 80-bit extended precision float is precisely what you don't want to do if you want to remain portable--across systems and across time.

different isn't always wrong (0)

Anonymous Coward | about 8 months ago | (#45486513)

another important question is what makes you think the numbers you are getting now
are "correct", rather than just what the computer is telling you they are...
if they are "mostly" correct, who cares if the definition of mostly changes in the future.

You need to know some numerical analysis (5, Insightful)

daniel_mcl (77919) | about 8 months ago | (#45486521)

If your calculations are processor-dependent, that's a bad sign for your code. If your results really depend on things that can be altered by the specific floating-point implementation, you need to write code that's robust to changes in the way floating-point arithmetic is done, generally by tracking the uncertainty associated with each number in your calculation. (Obviously you don't need real-time performance, since you're using cloud computing in the first place.) I'm not an expert on Mathematica, but it probably has such things built in if you go through the documentation, since Mathematica notebooks are supposed to exhibit reproducible behavior on different machines. (Which is not to say that no matter what you write it's automatically going to be reproducible.)

Archiving hardware to get consistent results is mainly used when there are legal issues and some lawyer can jump in and say, "A-ha! This bit here is different, and therefore there's some kind of fraud going on!"

Re:You need to know some numerical analysis (0)

Anonymous Coward | about 8 months ago | (#45486785)

Maybe he's doing a chaotic simulation. For example, if you simulate galaxies colliding, then the uncertainty of individual star positions increases exponentially. You don't care about bounding it, you just want to simulate a possible timeline. Now if you notice something interesting in one particular simulation and you'd like to run it again to zoom on it, you really need reproducible arithmetics. Keeping the uncertainty in check by running with a billion digits of precision would take too long.

Re:You need to know some numerical analysis (2)

brantondaveperson (1023687) | about 8 months ago | (#45487341)

notice something interesting in one particular simulation and you'd like to run it again to zoom on it,

If the thing you're zooming in on is dependent on the behaviour of floating point numbers, then it's not interesting from any point of view other than that. It certainly won't represent anything physically meaningful, which, since we're talking about galaxy simulations, I assume is the point.

Re:You need to know some numerical analysis (5, Insightful)

rockmuelle (575982) | about 8 months ago | (#45486803)

This.

Reproducibility (what we strive for in science) is not the same as repeatability (what the poster is actually trying to achieve). Results that are not robust on different platforms aren't really scientific results.

I wish more scientists understood this.

-Chris

Re:You need to know some numerical analysis (4, Interesting)

Red Jesus (962106) | about 8 months ago | (#45487183)

While that's true in many cases, there are some situations in which we really do need repeatability. Read Shewchuk's excellent paper [berkeley.edu] on the subject.

When disaster strikes and a real RAM-correct algorithm implemented in floating-point arithmetic fails to produce a meaningful result, it is often because the algorithm has performed tests whose results are mutually contradictory.

The easiest way to think about it is with a made-up problem about sorting. Let's say that you have a list of mathematical expressions like sin(pi*e^2), sqrt(14*pi*ln(8)), tan(10/13), etc and you want to sort them, but some numbers in the list are so close to each other that they might compare differently on different computers that round differently, (e.g. one computer says that sin(-10) is greater than ln(100)-ln(58) and the other says it's less).

Imagine now that this list has billions of elements and you're trying to sort the items using some sort of distributed algorithm. For the sorting to work properly, you *need* to be sure that a < b implies that b > a. There are situations (often in computational geometry) where it's OK if you get the wrong answer for borderline cases (e.g. it doesn't matter whether you can tell whether sin(-10) is bigger than ln(100)-ln(58) because they're close enough for graphics purposes) as long as you get the wrong answer consistently, so the next algorithm in the pipeline (sorting in my example, or triangulation in Shewchuk's) doesn't get stuck in infinite loops.

Unfortunately... (0)

Anonymous Coward | about 8 months ago | (#45487393)

Actual mileage varies greatly. On x86 hardware C long double is 12 bytes, but on other hardware it's 16. If you're running an iterative process you are almost certainly not going to converge to the same result. Arbitrary precision arithmetic will mitigate the problem, but there are lots of opportunities to get in trouble. This is why regression tests are so important for mathematical codes.

You really need an analytic case you can check against. That can be really hard to come up with in many circumstances. It is hard to write code that will produce the same results with different revisions of a compiler on a single processor. It gets a lot harder when the processor changes. Numerical analysts worry a lot about such things. There have been papers which showed wildly divergent results for well written code on different processors. Compiler optimization can do things that have very unintuitive results.

Library (2)

blueg3 (192743) | about 8 months ago | (#45486545)

It depends on what you mean by "cloud", which is sort of a catch-all term. As you've pointed out, on SaaS clouds you're going to have no guarantee of consistency, even if no time passes -- you don't know that the cloud environment is homogeneous. For (P/I)aaS clouds, you can hopefully hold constant what software is running. For example, if you have your Ubuntu 12.04 VM that runs your software, when you fire up that VM five years from now, its software hasn't changed one bit. You of course have to worry about whether the form you have the VM in is even usable in five years. You would hope that, even with inevitable hardware changes, if none of the software stack changes, then you'll get the same results. I'd guess that if it's all running on hardware that really, correctly implements IEEE floating-point, then you will in fact get consistent results. But I wouldn't bet on it.

What you really need, unfortunately, is a library that abstracts away and quantifies the uncertainty induced by hardware limitations. There are a variety of options for these, since they're popular in scientific computing, but the overall point is that using such techniques, you can get consistent results within the stated accuracy of the library.

Depends... (1)

whiplashx (837931) | about 8 months ago | (#45486551)

If you are depending on serious precision, floating point was not the way to go in the first place. Floating point implementations are not guaranteed to be exactly the same, nor exactly correct.

Re:Depends... (1)

cdrudge (68377) | about 8 months ago | (#45486985)

Floating point implementations are not guaranteed to be exactly the same, nor exactly correct.

If only there was some type of standard adopted that would make it so this wasn't the case...

You may have bigger issues (1)

guruevi (827432) | about 8 months ago | (#45486563)

If you're worried about your program generating different results on different arch, you have some serious coding issues.

The math should be the same on all systems. If you're worried, try 2 different systems against a known or manually calculated result, that's how the Pentium-type bugs were discovered (if you remember).

Typically, major issues in your processing units will be discovered quickly because of their ubiquity in the market. Unless you're using a custom-built or compromised chip (e.g. one backdoored around prime generation), you shouldn't worry, and even if it were compromised (think Chinese ARM chips or NSA-controlled crypto accelerators) you'd still get a valid result, just a less secure one.

Solved problem (1)

jrumney (197329) | about 8 months ago | (#45486571)

The problem of inconsistent floating point calculations between machines has been solved since 1985 [wikipedia.org] . I'm sure moving your app into the cloud doesn't suddenly undo 28 years of computing history.

Re:Solved problem (1)

gnasher719 (869701) | about 8 months ago | (#45487209)

The problem of inconsistent floating point calculations between machines has been solved since 1985. I'm sure moving your app into the cloud doesn't suddenly undo 28 years of computing history.

Except it hasn't. On a PowerPC or Haswell processor, a simple calculation like a*b + c*d can legally give three different results because of the use of fused multiply-add. In the 90's to early 2000's, you would get different results because of inconsistent use of extended precision.
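The fused-multiply-add effect can be reproduced without FMA hardware by emulating the single rounding with exact rationals; a Python sketch (the operands are contrived to force the difference, and fma_emulated is my own helper, not a library call):

```python
from fractions import Fraction

def fma_emulated(a, b, c):
    # A fused multiply-add rounds once: the product a*b is kept exact
    # (emulated here with rationals) before the final addition is rounded.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2**-52          # exactly representable neighbours of 1.0
b = 1.0 - 2**-52
c = -1.0

separate = a * b + c      # a*b rounds to 1.0 first, so the result is 0.0
fused = fma_emulated(a, b, c)   # keeps the exact product, yielding -2**-104

assert separate == 0.0
assert fused == -(2.0**-104)
```

Same expression, same inputs, two legal answers, which is exactly why "IEEE 754 everywhere" doesn't imply bit-identical results.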

Re:Solved problem (2)

Red Jesus (962106) | about 8 months ago | (#45487219)

Not quite. IEEE 754 mandates correct rounding for addition, subtraction, multiplication, division, and square roots, but it only requires "faithful rounding" for transcendental functions like sin, cos, and exp. That means, for example, that even if the floating-point number nearest arcsin(1) is above it (i.e. correct rounding in this case requires rounding up), a math library that rounds arcsin(1) *down* is still compliant. The only requirement is that it round to one of the two nearest floating-point numbers.

Re:Solved problem (0)

Anonymous Coward | about 8 months ago | (#45487359)

That doesn't actually solve the problem. Calculation order among other things still matters, and compilers are at liberty to reorder instructions.

Frist 3D pirnter prost (1, Offtopic)

Hognoxious (631665) | about 8 months ago | (#45486577)

The solution is to use a 3D printer to make your own cloud.

Rounding error (1)

msobkow (48369) | about 8 months ago | (#45486581)

If you're not allowing for rounding errors, your result is invalid in the first place.

If you don't want rounding errors, use a package based on variable-precision mathematics, like a BCD package.

Why should the results change? (0)

Anonymous Coward | about 8 months ago | (#45486595)

Assuming there are no bugs like the Pentium fdiv bug, there is only one way to simulate a floating point instruction correctly.
An x87 register has 80 bits; anyone simulating those with 64-bit doubles because they want to use SSE instructions is doing it wrong.

I hope Mathematica sets the rounding mode before performing calculations.

Re:Why should the results change? (0)

Anonymous Coward | about 8 months ago | (#45487375)

An x87 register is not the only way of performing floating point calculations, nor is it the best way.

Google Docs (1)

chrisgagne (605844) | about 8 months ago | (#45486673)

Seeing as I get floating point math artifacts for simple arithmetic operations (e.g., balancing a household budget) in Google Doc spreadsheets...

Ye Old Text (3, Insightful)

Anonymous Coward | about 8 months ago | (#45486709)

This has pretty much been the bible for many, many, many years now: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

If you haven't read it, you should - no matter if you're a scientific developer, a games developer, or someone who writes javascript for web apps - it's just something everyone should have read.

Forgive the Oracle-ness (it was originally written by Sun). Covers what you need to know about IEEE754 from a mathematical and programming perspective.

Long story short, determinism across multiple (strictly IEEE 754 compliant) architectures, while possible, is hard - and likely not worth it. But if you're doing scientific computing, it may be worth it to you. (Just be prepared for a messy ride of maintaining LSB error accumulators, of typically giving up 1-3 more LSBs of precision for the determinism, and of having to worry not only about the math of your algorithms but about the math of tracking IEEE 754 floating point error for every calculation you do.)

What you can do, easily, however is understand the amount of error in your calculations and state the calculation error with your findings.
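The "LSB error accumulator" idea above is compensated summation; here's a sketch of the Neumaier variant in Python (the stdlib's math.fsum does the same job more thoroughly):

```python
def neumaier_sum(xs):
    # Compensated summation: carry the low-order bits the naive sum drops
    # in a separate accumulator -- an "LSB error accumulator".
    s = 0.0
    c = 0.0
    for x in xs:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x   # low-order bits of x were lost in t
        else:
            c += (x - t) + s   # low-order bits of s were lost in t
        s = t
    return s + c

xs = [1.0, 1e100, 1.0, -1e100]
assert sum(xs) == 0.0            # naive summation loses both 1.0 terms
assert neumaier_sum(xs) == 2.0   # matches math.fsum(xs)
```

Note that an aggressively optimizing C compiler can algebraically "simplify" the compensation terms away, which is one of the traps mentioned elsewhere in this thread; Python evaluates them as written.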

Re:Ye Old Text (0)

Anonymous Coward | about 8 months ago | (#45486735)

I should probably amend this: deterministic IEEE 754 calculations obviously carry an inherent performance cost anyway (extra operations/registers/etc. tracking and checking error as you go) - and as such, if this *is* an option for you, you're likely not in a performance-critical domain. In which case, if it wasn't already obvious, any precision you trade off for determinism can be bought back at an even greater cost with 64/128/256/etc. bit floating point emulation.

double/quad math libraries are easy to come by (eg: http://gcc.gnu.org/onlinedocs/libquadmath/) - any higher and you typically need to roll your own (the math isn't that hard to follow, and implementing it yourself typically takes less than a day w/ unit tests).

Is it just a language barrier? (3, Informative)

s.petry (762400) | about 8 months ago | (#45486745)

My first thought on seeing "tranlate" and "I'm research" was that it's only language, but then I read invalid and incorrect statements about how precision is defined in Mathematica. So now I'm not quite sure it's just language.

Archiving a whole virtual machine as opposed to the code being compiled and run is baffling to me.

Now if you are trying to archive the machine to run your old version of Mathematica and see if you get the same result, you may want to check your license agreement with Wolfram first. Second, you should be able to export the code and run the same code on new versions.

I'm really confused about why you would want this to begin with, though. Precision has increased quite a bit with the advent of 64-bit hardware. I'd be more interested in taking some theoretical code, changing "double" to "uberlong", and seeing if I get the same results as what I solved today on today's hardware.

Unless this is some type of Government work which requires you to maintain the whole system, I simply fail to see any benefit.

Having "Cloud" does not change how precision works in Math languages.

Re:Is it just a language barrier? (2)

multimediavt (965608) | about 8 months ago | (#45487269)

His whole question and narrative is telling. This is obviously someone that has no idea what he is doing nor why. He is also most likely in violation of Wolfram's license agreement [wolfram.com] on top of his lack of computational knowledge. He should have stopped at web statistics [slashdot.org] and stayed there.

Re:Is it just a language barrier? (0)

Anonymous Coward | about 8 months ago | (#45487415)

Burn The Cloud-Head-Revolved Witch!

(jeez, the man was just askin..)

First, identify the problem (1)

putaro (235078) | about 8 months ago | (#45486775)

You ask:

Say I archive the virutal instance and in 5 or 10 years I fire it up on another cloud provider and run the same calculation. What's the likelihood that the results would be the same?

If the calculation is 2 + 2, I'd say the odds are pretty good you're going to get 4. I assume you're actually doing some difficult calculations that may push some of the edge cases in the floating point system. What I would do is write some test routines that stress the areas you're interested in, and run and check the results of those before doing any serious calculations. For the most part, you're going to have to assume that the basic functions work and that there aren't specific combos like 17454423.2 + 99921234.1 that always give the wrong answer, since you can't really check for those. The usual concern is around edge case handling: define what you think is normal and make sure your environment conforms to that definition.

Try it on a single PC first (1)

TheloniousToady (3343045) | about 8 months ago | (#45486825)

I currently have a Matlab script that produces slightly different FIR filter design coefficients each time I run it - when run on the same version of Matlab on the same machine. And this is with Matlab, whose primary selling point is its industrial-strength mathematical "correctness".

Also, I once used a C compiler that wouldn't produce consistent builds, and not just by a timestamp. The compiler vendor said that a random factor was used to decide between optimization choices that scored equally. We finally had to ask the vendor to remove that "feature" so we could reproduce a build, which was required as a condition for software release.

So, good luck reproducing math results in the cloud, and over many years.

Write a test suite (0)

Anonymous Coward | about 8 months ago | (#45486995)

Write a test suite that verifies all the behaviour you expect a system to provide to your code. Then, 10 years from now on whatever system you have to use, run those tests and make sure they pass.
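A sketch of what such a test might assert, in Python (the golden values and the 1-ulp tolerance for transcendentals are illustrative choices, not any standard):

```python
import math
import struct

def ulps_apart(x, y):
    # Distance in units-in-the-last-place via the IEEE 754 bit patterns
    # (valid here because both values are positive doubles).
    ix = struct.unpack("<q", struct.pack("<d", x))[0]
    iy = struct.unpack("<q", struct.pack("<d", y))[0]
    return abs(ix - iy)

# Hypothetical golden values, recorded when the environment was qualified.
GOLDEN_SQRT2 = 1.4142135623730951
GOLDEN_SIN1 = 0.8414709848078965

# sqrt is correctly rounded under IEEE 754: demand exact equality.
assert math.sqrt(2.0) == GOLDEN_SQRT2
# libm transcendentals are typically only faithfully rounded: allow 1 ulp.
assert ulps_apart(math.sin(1.0), GOLDEN_SIN1) <= 1
```

Run the suite on the new system before trusting it, and version the golden values alongside the code.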

Simulate IEEE754-compliant FPU? (1)

Dputiger (561114) | about 8 months ago | (#45487035)

Can't Mathematica be told to stick to an 80-bit precision output? If you can specify that in software, it shouldn't matter what code the underlying platform runs on.

Fuzzy Logic (1)

Tablizer (95088) | about 8 months ago | (#45487059)

cloud + jello * cotton / fog = fuzz

Re:Fuzzy Logic (0)

Anonymous Coward | about 8 months ago | (#45487137)

No, cloud + jello * cotton / fog = fluff because you used non-utf characters

What if it's not reproducible? (1)

gnasher719 (869701) | about 8 months ago | (#45487135)

Floating-point arithmetic produces rounded results. The rounded result of a single operation depends on the exact hardware, compiler, etc. in use. Many years ago, x86 compilers sometimes used extended precision instead of double precision, giving slightly different results (usually more precise). PowerPC processors, and nowadays Haswell processors, have fused multiply-add, which can give slightly different results (usually more precise). So the same code with the same inputs can give slightly different results.

The IEEE floating-point standard requires double precision with a 53 bit mantissa. They might have required a 54 or 52 bit mantissa, which would give slightly different rounding errors.

Now my point: If your code performing all these operations produces almost the same results on different implementations, then it is quite likely that your code is right. If you get vastly different results, then your code is likely wrong or the problem is very hard.

Some developers think that getting identical answers means that the answers are good. That's not true at all. If you have small differences due to slightly different rounding, then there is a good chance that your results are good. Identical results guarantee nothing.

Associated concern (1)

cold fjord (826450) | about 8 months ago | (#45487171)

If you haven't already you may want to have a look at Interval arithmetic [wikipedia.org] since it addresses some associated issues. It is supported in various development environments and libraries.

This is just toooo technical (1)

NEDHead (1651195) | about 8 months ago | (#45487229)

I still have trouble with 1+1=10

IEEE 754-2008 (1)

TheSync (5291) | about 8 months ago | (#45487253)

If the math has been calculated with IEEE 754-2008 [wikipedia.org] (aka ISO/IEC/IEEE 60559:2011) arithmetic, it should not matter what you are running it on...

Consistency of results (0)

Anonymous Coward | about 8 months ago | (#45487347)

To obtain consistency of results, you'll have to not rely on the hardware numerics at all. Use a software implementation of floating point (MPFR is a good one but there are others) that is included in your application. It is unlikely that the basic integer operations will change underneath you (but it is still possible). If you can't accept the hit for a full software implementation, rely on IEEE 754 arithmetic and code your algorithms very very carefully. Numerical analysis is difficult and full of traps for the naïve or unwary. Again, use a good library where possible.

Well-written numeric code will be tolerant of slight deviations in results. E.g. x86 uses 80-bit intermediaries while most other platforms use 64-bit, and this causes subtle differences; a well-designed algorithm will tolerate either, while a poorly designed one will fail on one or both.
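MPFR is a C library; as a stdlib stand-in, Python's decimal module shows the idea of software arithmetic that is deterministic no matter what FPU sits underneath:

```python
from decimal import Decimal, getcontext

# A software float implementation computes the same digits on any machine,
# because only integer arithmetic happens underneath.
getcontext().prec = 50
r = Decimal(2).sqrt()
assert str(r) == "1.4142135623730950488016887242096980785696718753769"
```

The price is speed: software arithmetic is orders of magnitude slower than the hardware FPU, which is the trade-off the parent describes.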

Hardware Arb Precision Decimal Processors (1)

the eric conspiracy (20178) | about 8 months ago | (#45487387)

I could see one thing happening over time. Right now a lot of software does calculations involving decimal fractions in floating point. The problem with this is that in general you cannot precisely represent a decimal fraction using a binary floating point number. This is why you often see results like a-b = 0.19999999999999.

Well, I think it is possible that over time we could see the development of hardware arithmetic units that internally use arbitrary-precision fixed-point calculations to eliminate these errors. When you run your current programs on such processors, the improved representation of decimal fractions would lead to slightly different results.
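The effect is trivial to reproduce, and a decimal (rather than binary) representation removes it; a quick Python sketch using the stdlib decimal module:

```python
from decimal import Decimal

# 1.2 has no exact binary representation, so the binary subtraction
# misses the decimal fraction 0.2 by one ulp:
assert 1.2 - 1.0 != 0.2
print(1.2 - 1.0)   # 0.19999999999999996

# A decimal representation subtracts exactly:
assert Decimal("1.2") - Decimal("1.0") == Decimal("0.2")
```

Any hardware that natively carried decimal fractions would give the exact-0.2 behavior, and thus differ from today's binary results.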

Hamming's Motto (1)

dido (9125) | about 8 months ago | (#45487497)

You would do well to remember a quotation attributed to Richard W. Hamming: "The purpose of computing is insight, not numbers."

Perfect reproduction is difficult / undesirable (1)

ljhiller (40044) | about 8 months ago | (#45487551)

This came up before with Java, which, in its original incarnation, demanded exact reproduction of floating point results...with horrible horrible results. Generally, when people perform floating point calculations, they want AN answer, not THE answer, because they know there isn't a unique exact answer.

This issue was described far better than I can in William Kahan's essay, How Java's floating point hurts everyone everywhere [berkeley.edu]

Integer, floating and interval arithmetic (1)

Mr Z (6791) | about 8 months ago | (#45487681)

I remember a quote, attributed (likely incorrectly) to Seymour Cray: "Do you want it fast, or do you want it accurate?"

If you want absolutely exact arithmetic, code it entirely with arbitrary precision exact integer arithmetic. All rational real numbers can be expressed in terms of integers, and you can directly control the precision of approximation for irrational real numbers. Indeed, if your rational numbers get unwieldy, you can even control how they are approximated. And complex numbers, of course, are just pairs of real numbers in practice. (Especially if you stick to rectangular representations.) If you stick to exact, arbitrary precision integer arithmetic and representations derived from that arithmetic that you control, then you can build a bit-exact, reproducible mathematics environment. This is because integer arithmetic is exact, and you have full control of the representation built on top of that. Such an environment is very expensive, and not necessarily helpful. You can even relax the order of operations, if you can defer losses of precision. (For example, you can add a series of values in any order in integer arithmetic as long as you defer any truncation of the representation until after the summation.)
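A Python sketch of the integer/rational approach (harmonic numbers chosen arbitrarily as the workload):

```python
from fractions import Fraction

# Exact rational arithmetic: only integer math happens underneath, so the
# result is bit-identical on every platform and in every summation order.
terms = [Fraction(1, n) for n in range(1, 101)]
h_forward = sum(terms)
h_reverse = sum(reversed(terms))

assert h_forward == h_reverse    # order of operations truly doesn't matter
assert abs(float(h_forward) - 5.187377517639621) < 1e-9
```

The truncation to a float at the end is the one controlled approximation, deferred until after all the exact work is done, which is exactly the "defer losses of precision" point above.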

If you venture into floating point, IEEE-754 gives you a lot of guarantees. But, you need to specify the precision of each operation, the exact order of operations, and the rounding modes applied to each operation. And you need to check the compliance of the implementation, such as whether subnormals flush to zero (a subtle and easy to overlook non-conformance). Floating point arithmetic rounds at every step, due to its exponent + mantissa representation. So, order of operations matters. Vectorization and algebraic simplification both change the results of floating point computations. (Vectorization is less likely to if you can prove that all the computations are independent. Algebraic simplification, however, can really change the results of a series of adds and subtracts. It's less likely to largely affect a series of multiplies, although it can affect that too.)

And behind curtain number three is interval arithmetic. [wikipedia.org] That one is especially interesting, because it keeps track at every step what the range of outcomes might be, based on the intervals associated with the inputs. For most calculations, this will just result in relatively accurate error bars. For calculations with sensitive dependence on initial conditions (ie. so-called "chaotic" computations), you stand a chance of discovering fairly early in the computation that the results are unstable.
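A toy Python sketch of the interval idea (real interval libraries use directed rounding modes; widening each bound by one ulp with math.nextafter, available in Python 3.9+, is a crude stand-in):

```python
import math
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # Widen one ulp each way in place of true directed rounding.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(p), -math.inf),
                        math.nextafter(max(p), math.inf))

# double(0.1) sits just above the real number 0.1, so this pair of
# endpoints brackets the true value:
x = Interval(math.nextafter(0.1, 0.0), 0.1)
y = (x + x) * Interval(3.0, 3.0)
assert y.lo < 0.6 < y.hi   # the true result 0.6 provably lies inside
```

The final interval is the "error bar" the parent describes: narrow for stable computations, and alarmingly wide as soon as the computation is chaotic.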

False assumption (4, Informative)

bertok (226922) | about 8 months ago | (#45487699)

This assumption by the OP:

Mathematica generates the result based on the combination of software version, operating system, hypervisor, firmware and hardware that are running at that time.

... is entirely wrong. One of the defining features of Mathematica is symbolic expression rewriting and arbitrary-precision computation to avoid all of those specific issues. For example, the expression:

N[Sin[1], 50]

Will always evaluate to exactly:

0.84147098480789650665250232163029899962256306079837

And, as expected, evaluating to 51 digits yields:

0.841470984807896506652502321630298999622563060798371

Notice how the last digit in the first case remains unchanged, as expected.

This is explained at length in the documentation, and also in numerous Wolfram blog articles that go on about the details of the algorithms used to achieve this on a range of processors and operating systems. The (rare) exceptions are marked as such in the help and usually have (slower) arbitrary-precision or symbolic variants. For research purposes, Mathematica comes with an entire bag of tools that can be used to implement numerical algorithms to any precision reliably.

Conclusion: The author of the post didn't even bother to flip through the manual, despite having strict requirements spanning decades. He does however have the spare time to post on Slashdot and waste everybody else's time.
