
geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years are under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power."

Interesting smiley... is that a dead man with a fraction in his mouth and a prominent Adam's Apple, wearing a bow tie and a dress and standing on a toy race car?

The name of the function explains it pretty well: it is the inverse square root function.

Just to make sure, I did RTFA and it says it in the first paragraph or so.

Re:A famous quote (1, Informative)

Anonymous Coward | more than 7 years ago | (#17070514)

As I understand it, it computes the inverse square root, which is useful for finding the square root of a number.

To find the square root of say, N, you use InvSqrt(N) and then multiply the result by N to get the square root of N. Supposedly, it's faster than using the built-in sqrt() function.

It's just John Carmack's way of telling everyone he knows mathematics and please worship the ground he walks on. Won't be long before this code pops up in one of his spaceships and then NASA will be bowing down at the launch pad.

Right because when he denied credit in that article it totally showed that he's a pompous asshole.
Next time read the article before you spew bullshit out of your cum receptacle ok?

If you RTFA, the first person the author asks about it is J.C., and he says "nope, not me, maybe this other guy". Specifically NOT taking credit for this snip of code.

Something to do with lighting I'd imagine (unless they've also modeled gravitational attraction)

Re: A better question: (2, Interesting)

Anonymous Coward | more than 7 years ago | (#17070926)

Here's a more interesting question: "How does this function compute 1/x^(1/2)?" I'm asking because I have no idea how it works. ;)

Re:A famous quote (0)

Anonymous Coward | more than 7 years ago | (#17071192)

Geek motherfucker! Do you speak it?

Evil math--it approximates the inverse square root (2, Informative)

Anonymous Coward | more than 7 years ago | (#17071314)

They're computing the inverse square root, usually written as something like x^(-1/2) (which is the same as 1/sqrt(x) of course) via, umm, err, magic :-)

Seriously, they're using some evil numerical techniques that approximate the actual value of that function to within a few percent very quickly. You could do more steps of Newton's root-finding method or other complex things, but that would be slow, and this is only meant to find things that are "good enough" for you to draw on someone's screen.

This function is used in graphics programming, specifically in things like the Quake 3 source code (as well as other places). First it calculates x/2, then it grabs the bits of the float x as an integer (without modifying them). Next comes the "magic" step: the magic number they derived via evil mathematics, minus i >> 1, gives the best starting point they can for Newton's method. The right shift divides the float-as-integer by two, and note that the float's sign bit shifts down into the exponent field (that's one of the crazy parts the PDF analyzes). Then it reinterprets the int back as a plain float and does one step of Newton's method using the guess derived from the magic part.

Don't mind me. I may have a degree in mathematics, and I even (somehow!) made it through at least one class on numerical analysis, but all I can tell you is that this stuff is evil:-) Read the PDF linked in the article if you want someone who still knows what they're talking about to explain it; I'm just trying my best to remember anything after all these years. I swear that analysis class gave me some Pavlovian instinct to shudder in dread whenever I see them pulling out the strange substitutions and Taylor expansions so they can calculate the error...

Okay. At times you want a normalised vector, but a lot of the time you will have vectors of arbitrary length. For example, the light is at the origin and the point is at (12, 4, 3), so the vector from the point to the light is (-12, -4, -3). The length of this vector can be calculated easily using Pythagoras's theorem: sqrt(12^2 + 4^2 + 3^2), which is 13 units. We want a unit vector (i.e. a vector 1 unit in length), so we divide by the length to get (-12/13, -4/13, -3/13).

This is great for a 3D rendering application, but in a game speed is critical. This pair of calculations involves a square root and a divide, both of which are at least an order of magnitude slower than multiplications and additions.

So what this function does is provide a value you can multiply each component by to get a unit vector.

Well, there's the what and the why. As for the how, I have no idea. I think it uses magic.

Don't they teach Numerical Methods in Computer Science curricula anymore? That's a required course at Ohio State--for undergrads.

Of course, most students have a hard time with that class, and can't actually remember how to do anything after the course other than Monte-Carlo Integration. But you should at LEAST remember being exposed to Newtonian integration in your undergrad numerical methods class!

The only thing amazing about this code is that someone paid good-enough attention in class to be able to apply what he learned to his job.

The real genius here is Isaac Newton. 'course, that's not news to anyone.

In the words of John Carmack himself, if you discover and implement an algorithm by yourself, even though it may have been discovered before, you deserve credit for finding it on your own.

" Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees "

I was a little worried when Slashdot posted the Britney Spears beaver pictures, but they now have their credibility back as the home of "News for Nerds".

Re:And so why do we care? (5, Funny)

Anonymous Coward | more than 7 years ago | (#17070682)

Just because I would have no idea why this was funny except for taking Logic courses as a philosophy major:
Fermat's last theorem. [wikipedia.org]

Re:It's Funny. Laugh. (1, Informative)

Anonymous Coward | more than 7 years ago | (#17070962)

Most /.ers are familiar with Fermat's last theorem. If nothing else, there was a lot of media hype when Andrew Wiles announced his solution. Beyond that, it's been in a couple episodes of Star Trek. Explaining it here is like explaining the existence of the aurora to eskimos.

Because he's trying to get at the binary representation of the IEEE754 floating point number, which he then manipulates a bit, and puts back where it was.

With your suggested code, you'd be operating on the actual number, casting the float to an int, which would just be throwing away a bunch of information.

(int) x would convert the floating point value to an integer (truncation, basically).
*(int*) &x treats the bits as an integer, with no behind the scenes conversion to an actual int value.

Because in C's own weird way, that's the only way of referring to a float as an int without changing the bits.

If you do this: int i = (int)3.0f;

You get i=3, like what you'd get from the floor() function.

If you do this: float f = 3.0f; int i = *(int*)&f;

Then i contains a bit-for-bit copy of the IEEE floating-point representation of 3.0.

It's because C knows how to cast a float to an int by truncating toward zero (which matches floor() for positive values). However, if you do it the second way, you aren't casting a float to an int; you are casting a pointer-to-float to a pointer-to-int and then dereferencing it.

By the way, I just wanted to say... this is one of the most interesting things I've read on Slashdot in a while. Wow. That function is just amazing. I only wish I understood how it worked. I know nothing about what a "Newton-Raphson iteration" is.

Newton-Raphson is a general algorithm for finding a root of an equation f(x) = 0.

You start with some INITIAL GUESS (the real beauty of this algorithm) X(0), then apply:

X(n+1) = X(n) - f(X(n)) / f'(X(n))

where
X(n+1) is the NEXT guess after the value you 'know',
X(n) is that most recent value you know,
f(X(n)) is the function evaluated at X(n) and
f'(X(n)) is the first derivative of f(x) evaluated at X(n).

It's not foolproof, and BOTH whether it converges at all AND how FAST it converges depend on the initial guess, X(0).

The "Secant Method" is an improvement that makes it a little 'smarter,' at the expense of more computation (this is often a positive trade-off on numerical modeling codes, since the 'smarter' algorithm does tend to converge faster). There are other improvements as well, such as the Los Alamos Linear Feedback Solver (a slightly modified secant method that converges about 10-17% faster, at least for some types of problems) that I use in my own codes.

[Forgive the LaTeX notation -- I'm not sure how else to best express the math without sub and sup tags.]

Newton-Raphson is a root finding method. Given a starting point it finds successively more accurate approximations to the root. The formula for it looks like: x_{n+1} = x_n - f(x_n)/f'(x_n), where f(x) is the function to find the root of, f'(x_n) is the derivative of that function and x_n are the successive approximations. Essentially, given a guess it looks at the slope of the function at that point and follows a tangent line until it crosses the zero line. That point becomes the next guess.

Finding the reciprocal square root of a is equivalent to solving x^{-2} - a = 0 which means that you use f(x) = x^{-2} - a. The derivative of that is f'(x) = -2x^{-3}. Putting those into the Newton-Raphson formula gives you x_{n+1} = x_n - (x_n^{-2} - a) / -2x_n^{-3}, which simplifies down to x_{n+1} = x_n(3 - a*x_n^2)/2. This is equivalent to the last line before the return in that InvSqrt() function.

Convergence of Newton-Raphson is quadratic which means that each time you repeat that line with a previous guess you double the number of digits of accuracy (up to the limit of the hardware of course).

It looks redundant, that's for sure. I cannot see why, unless it's easier for the processor this way to use the address of the variable rather than pass it by value, perhaps?

Because it's not actually a float-to-int cast as you'd normally think of it. Rather, it's mucking with the bit representation of the float. It's roughly equivalent to "union { float x; int i; };"

I'll take a swing at this one. It's because the author doesn't want the value of x, but the integer representation of the value at x's memory address.

If x is 3.14159, (2) will result in i==3, whereas (1) will result in whatever the 4-byte IEEE-754 representation of 3.14159 is (0x40490FD0, if Google is correct). By using (1), the author is able to use integer bitwise operations (>>) to perform "free" floating point operations. When i is sent back into floating point form via:

x = *(float*)&i;

x now contains the value of the integer operation:

i = 0x5f3759df - (i >> 1);

which was presumably faster than an identical floating point operation. It's a nifty little solution, especially with regard to the selection of the magic number.

One is 'take this value and convert it to an integer' (your version).

One is 'take this set of bytes, assume it's integer data, and tell me what value you get when you dereference it' (their version).

They won't necessarily render the same result.

I haven't looked at the code, but depending on the type of x, it could produce some wildly different results.

Re:What's with use of Pointers? (1, Insightful)

Anonymous Coward | more than 7 years ago | (#17070762)

> int i = (int)x;

converts a float to an int, potentially cutting off the digits past the decimal point and/or leading digits that don't fit into the 32 bit integer range.

> int i = *(int*)&x;

reinterprets the bit pattern of the float number as an integer, so the exponent and the sign bit become part of the number.

A straight int cast would truncate the float to an int. The intention of the code here is to manipulate the binary representation of the float instead.

I think it's a quick hack to halve the exponent and linearly extrapolate a new value for the mantissa.

C "understands" ints and floats. If you do the simple cast:

int i = (int)x;

Then C will simply convert the float value into an integer value (throwing away fractional part). But this isn't what we want. We want to operate on the bits of an IEEE floating point value directly, and integers are the best way to do that.

So first, we lie to the compiler by telling it we have a pointer to an int:

(int *) &f

And then we dereference the pointer to get it into an operable int:

i = *(int *) &f

Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.


Moreover, when compiled, the optimizer does the right thing: since access to the variable is actually a memory dereference (or a register access, but we'll ignore that for now), *(int*)&x means "tell the compiler this is an int, but don't do anything in the code", whereas (int)x means "create a new temporary that's the int part of the variable".

Try them out. You'll see that they give different numbers. "int i = (int) x" converts x into an integer with approximately the same value. "int i = *(int*)&x" gives you an integer with the same binary representation as the float (and which will give different results depending on endianness and size of int).

OK, my subject is flippant, but it is also serious. The alternative you are proposing simply extracts an integer approximation to the float x. The code as written does something different: it extracts the binary representation of the floating-point number. That is, it is extracting the raw bits involved.

Re:What's with use of Pointers? (0)

Anonymous Coward | more than 7 years ago | (#17070918)

The (int)x version does data conversion from float to integer. The detour through the address prevents that, because while there is a conversion from float to int, there is none from float* to int*. So *(int*)&x delivers the binary IEEE754 float representation of x, while (int)x delivers the truncated integer value of x.

Since x is a float, int i = (int)x; would find the (best) integer approximation of x and store it in i. What the author meant to do was take the binary representation of x and have the compiler treat it as the binary representation of an int -- thus the "taking the address", followed by a cast to int *, followed by a dereference to store the actual value in i.

If this gets you confused, I'd definitely suggest trying out both methods in your favorite compiler. :)

Re:What's with use of Pointers? (0)

Anonymous Coward | more than 7 years ago | (#17071028)

He is doing floating point computations in an integer. He is interested in the values in the bytes, so simply casting the float to an int would not work.

The reason is that they want to use the bit values of the floating point number in the integer unit of the processor, not get the integer representation of the number.

Other people have answered this, so I'll ask another question.

Isn't it a little clearer, and potentially faster (at least if your optimiser doesn't understand) to use union { float f; int i; }value; value.f = x; i = value.i;

Bad programmer, no cookie (-1, Troll)

Anonymous Coward | more than 7 years ago | (#17070700)

Writing code like this nowadays is, in most situations, a waste of development time. It is probably a premature optimization that will slow computations down. The approximations it involves are likely to cause bugs. Also, if it lacks comments, as it does in the article, then it's a great example of bad programming.
Furthermore, converting floating-points to integers and then doing calculations on them is a trick that has been known for years, long before Quake3.

All floating point operations are approximations, writing good floating point code is all about understanding the approximations and tradeoffs you are making.

And floating point operations can be one of the slowest parts of code.

I have to agree on the lack of comments in the source for games; it can make them very hard to penetrate (particularly when combined with heavy use of macros).

The inverse square root is 1/(sqrt(x)), not x^2 (which is, I admit, the first thing I thought of, wondering why anyone would be so excited about a faster way of getting it).


Well, you're sort of right.

f^-1(x) = x^2 is the inverse of f(x) = sqrt(x) where x > 0

The inverse of the square root is the square. The reciprocal of the square root is what is being calculated. In the case of multiplication I guess it's a reasonable name but it's pretty poor form in my view.

Re:Poor function name (1, Informative)

Anonymous Coward | more than 7 years ago | (#17070900)

The square root can be written as x^(1/2). The inverse square root can be written as 1/(x^(1/2)). This equals x^(-1/2).

As I said, the inverse of the square root of a number under the operation of multiplication is the reciprocal of the square root of the number. This function is named specifically because of that property.

But it's a function name. One would expect InvSqrt(n) to do the opposite that Sqrt(n) would do, i.e. squaring. RecipSqrt(n) would be a better name although I do admit it would take 2 more key presses!

Re:Poor function name (0)

Anonymous Coward | more than 7 years ago | (#17071218)

An easy counter-example is x=1. Your jump from step 2 to step 3 is incorrect in general. Also, step 2 fails to account for small values of 2.

Er... no. It's entirely possible for x^(-1/2) and x^(2) to be equal. This happens when x=1. (I know what you meant, but what you posted is very different.)

That page compares the time it takes to calculate the sqrt in various ways, including Carmack's. Short version is that modern processors are significantly faster since it can be done in hardware. It may still be useful in cases where the processor doesn't have the sqrt function available.

His version took 428 cycles compared to 107 cycles doing it in hardware on the same system.

Re:It was fast (0)

Anonymous Coward | more than 7 years ago | (#17071148)

That's just the square root, we're looking at calculating the inverse too.

First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularly nasty function because both the divide and the square root stall the floating point pipeline on IA32 processors. As a result, instead of shooting out one result per cycle as pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for the 43 cycles for the square root (P4). This is a big hit to realtime performance, and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA32 processors are superscalar and have multiple integer units which can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation, and the remaining floating point multiplies only take about 6 clock cycles on the FPU.
The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call, and the stack push overhead will eat you alive. If this is implemented inline, it's about 6 times as good as simply calling the processor's assembly instructions for root and divide in sequence, with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32, but 1.0/sqrt can be computed faster with Newton-Raphson iteration in one fell swoop than by composition of the operations.
I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better, but in general, a complicated function, especially of trig functions, that can be computed in one iteration of well-designed Newton-Raphson will be much faster than the composition of the CPU's implementation of the component functions.
In short, don't write off careful numerics; they can provide great speed improvements. Just don't use them in code that people will want to understand later without documenting exactly what you did and why.

This was a great article. Especially for the script kiddies and wanna-be HaXoRz.
It's great to enlighten others about the art of true technology.
The article included the following:
* Mystery (who wrote this)
* A puzzle to solve (how did they come up with such a clever solution)
* Games (Quake, Doom, 3D graphics)
* Math (inverse square root)
* History (who's who of 3D programming, VooDoo->3dFx->nVidia)
* Lessons (skill, assembly language, research, open source is good)
Kinda brought back feelings of the "Revenge of the Geeks" series by PBS, when you get to place faces, names, ideas, feelings and thoughts to technical achievements.

Mirrordot: the article, cut-and-paste (5, Informative)

This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy. Origin of Quake3's Fast InvSqrt()

To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense.

InvSqrt()
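For reference, here is the famous snippet as it circulates from the Quake3 source (reproduced from the version quoted in Carmack's email later in this article; the pointer casts are the original code's, kept verbatim even though they technically violate strict aliasing in modern C):

```c
float InvSqrt(float x)
{
    float xhalf = 0.5f * x;
    int i = *(int *)&x;              /* read the float's bits as an int */
    i = 0x5f3759df - (i >> 1);       /* the magic initial guess */
    x = *(float *)&i;                /* read the adjusted bits as a float */
    x = x * (1.5f - xhalf * x * x);  /* one Newton-Raphson step */
    return x;
}
```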

Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor (where you can get normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware if you're careful, or the compiler is!), or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration, and essentially that's what the code above does, but with a twist.

How the code works

The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the 32-bit IEEE-754 representation used by x86 processors, i, an integer, is initially set to the bit pattern of the floating point number whose inverse square root you want, via a pointer cast. i is then set to 0x5f3759df minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving the float-as-integer.

That seeded value, reinterpreted back as a float, becomes the initial guess for the Newton iteration: the magic constant minus a free divide-by-two, courtesy of the integer right shift.

But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation.

So who did write it? John Carmack?

While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time.

Asking John whether it was him or Michael returned a "not quite".

-----Original Message----- From: John Carmack Sent: 26 April 2004 19:51 Subject: Re: Origin of fast approximated inverse square root

At 06:38 PM 4/26/2004 +0100, you wrote:

>Hi John, > >There's a discussion on Beyond3D.com's forums about who the author of >the following is: > >float InvSqrt (float x){ > float xhalf = 0.5f*x; > int i = *(int*)&x; > i = 0x5f3759df - (i>>1); > x = *(float*)&i; > x = x*(1.5f - xhalf*x*x); > return x; >} > >Is that something we can attribute to you? Analysis shows it to be >extremely clever in its method and supposedly from the Q3 source. >Most people say it's your work, a few say it's Michael Abrash's. Do >you know who's responsible, possibly with a history of sorts?

Not me, and I don't think it is Michael. Terje Matheson perhaps?

John Carmack

Despite having the noodle for it, John says nay and isn't sure if it's Michael either. The rebuttal was posted in the B3D thread along with Terje as a potential author and was then pretty much forgotten about by all until recently. During John's Quakecon keynote speech this year he mentioned the opening of the complete Quake3 v1.32 source under the General Public License, including the 3D renderer, to big cheers from the assembled crowd. With Doom3 recently finished and published to critical acclaim, hacking minds turned to id to ask when Q3's fairly ancient (by current 3D standards) renderer would be available for people to look at, learn from and work with.

Duly released in the week following Quakecon, Slashdot picked up the obvious story, where the question of who wrote that implementation of fast inverse square root came up again. Shortly after posting that, I wondered why the hell I hadn't asked Terje!

Terje Mathisen, along with Michael Abrash and a few others, stands as one of the masters of assembly language optimisation for x86 microprocessors. Back in the days of software optimisation for 3D graphics, guys like Michael and Terje (and John, who's no slouch at it himself if you peek at the Quake source) would spend significant time testing hand-coded assembly optimisations for various critical-to-performance codes. The investment, one you wouldn't really make these days except in very special cases, paid off back in Doom and Quake's days. And if you hang around comp.lang.asm.x86 enough you'll spot Terje offering advice, optimisations, anecdotes and code snippets related to x86 assembly programming. The man knows his stuff. So, Terje, was it you? Terje, any ideas?

-----Original Message----- From: Terje Mathisen Sent: 22 August 2005 07:49 Subject: Re: FW: Origin of fast approximated inverse square root

ryszard wrote:

> Hey Terje, > > This question has come up again since id released the source to Quake > 3 Arena. > > Are you the guy who wrote that fast implementation of inverse square root? > If so, do you have a history of where it came from and how you came up > with it? A whole bunch of hackers and geeks would love to know and > since John says it wasn't him or likely Michael, was it you?

Hello Ryszard, and hello again John, it's been a few years since we last met.:-(

Thanks for giving me as a possible author, when I first saw the subject I did indeed think it was some of my code that had been used.:-)

I wrote a very fast (pipelineable) & accurate invsqrt() 5+ years ago, to help a Swede with a computational fluid chemistry problem.

His simulation runs used to take about a week on either Alpha or x86 systems, with my modifications they ran in half the time, while delivering the exact same final printed results (8-10 significant digits).

The code shown below is not the same as what I wrote; I would guess it mostly stays within a fraction of a percent? The Swede needed at least 48 significant bits in his results, so I employed a much more straightforward table lookup plus NR-iteration. Since water molecules contain three atoms, it was quite straightforward to calculate three such invsqrt() values in parallel; this was enough to avoid almost all bubbles in the fp pipelines.

I do think I recognize the style of the Q3A code however, it looks a lot like something you'll find in the old HAKMEM documents from MIT.:-)

Regards,

Terje "almost all programming can be viewed as an exercise in caching"

Terje's not our man, but he does show he's got the cojones for it. His own assembly version of invsqrt, accurate to 48 significant bits and pipelined on the x86 CPUs of 2000 - which had just gained SSE about a year before with A80525 (I'll let the geeks look that one up, KNI ring any bells?) by the way - uses a LUT and Newton-Raphson to maintain the precision his friend needed. A little analysis of his LUT would likely have seen him close to the InvSqrt() this article refers to. You might also remember Terje as one of the main guys behind public analysis of the original Pentium's FDIV bug.

While he can't pin down the author, he does drop more crumbs. I first came across the M.I.T. HAKMEM documents ages ago as I dove deep into low-level programming around the 1998-2000 timeframe. I was a wet-behind-the-ears hacker interested in 3D graphics back then, and my grasp of vector math was decent (it now sucks, relatively speaking, and I'm constantly having minor Eureka moments as I write about 3D hardware for a living). I even tried my hand at a Linux driver for the S3 Savage3D around the same time John was working on the Utah GLX project with Matrox hardware.

Drawing tris at a low level wasn't such a big deal 5 or 6 years ago, since there was no programmable hardware and only the basic OpenGL pipeline to follow. Getting hardware specs out of the IHVs was also easy enough. I keep my copy of the Savage3D's register and hardware spec close by to remind me how unlikely it would ever be to get the same documents for a modern GPU from the big IHVs.

More digging

With John and Terje unwilling to stake a claim on the code and Abrash mostly out of the running, more digging was required. Google, even with search-fu to make Larry and Sergey proud, didn't want to give up the goods. It was willing to provide some pointers towards NVIDIA though, with a posting to a Slashdot article that hinted that someone in Santa Clara was responsible.

A quick email to a friend at NVIDIA said that the T in SST, one Gary Tarolli, was the one at NVIDIA most likely to know. In almost every tale of 3D legend or lore, you'll find 3dfx.

For those new to 3D graphics, Gary Tarolli is one of the founders of the late 3dfx. One of the early pioneers of consumer 3D graphics hardware, 3dfx blazed a trail in 3D that started with coin-op arcade hardware, before the SST1 and Voodoo Graphics paired separate framebuffer and texture unit chips on a PCI board for the PC in late 1996. Voodoo 2 followed in 1998, powered by the brand new SST96, at arguably 3dfx's peak. Before the V2, 3dfx had no real competition in the consumer space, with NVIDIA's NV3 (Riva 128) not quick enough and Rendition's Verite architecture not getting the market share needed to have the company ponder a follow-up.

Voodoo3 arrived in 1997 after SLI Voodoo2s had ruled the roost for so long, and 3dfx dropped the SST prefix for their chips. The forgettable Voodoo3 was slapped around by NVIDIA and NV5 (TNT2 Ultra) and then NV10 barely half a year later (the first GeForce 256). On the ropes, the 3dfx VSA-10x architecture saw Voodoo4 and 5 hit the market in 2000, but even the monster Voodoo5 5500 wasn't enough to keep 3dfx afloat. Some business decisions gone bad, and problems introducing the Voodoo5 6000, saw NVIDIA buy the ailing 3D chip company, bringing the legend to a close.

Mentioning Rampage, Sage and Spectre today is enough to widen eyes and get saliva glands working overdrive in 3D geeks. 3dfx's stillborn next generation parts were to be the company's saviours. That wasn't to be, sadly, and most of 3dfx's staff were assimilated into NVIDIA, with the rest joining the likes of ATI or leaving 3D altogether to persue other careers.

So, with Mr. T-Buffer the next person likely to know where the code comes from, I decided to ask him not if he knew, but if it was him. Known as a coder, his simulation code is what brought up SST1 and SST96 on their design hardware before production. His ability to hack made it a valid question to ask. Must be you, Gary, surely?

From: Gary Tarolli Sent: Mon 05/09/2005 14:23 Subject: RE: FW: Origin of fast approximated inverse square root

A blast from the past!

I definitely recognize the code below, but I can't take credit for it. I remember running across it over 10 years ago, and I also remember rederiving it. I think it's just Newton-Raphson iteration with a very clever first approx.

I also remember simulating different values for the hex constant 0x5f3759df. I may have done this for the IRIS indigo work I did, or some consulting at Kubota, I'm not 100% sure.

Given the amount of math it does, and its accuracy, and not requiring a table, it is a pretty great piece of code.

I especially like the integer ops
i = 0x5f3759df - (i >> 1); which actually is doing a floating point computation in integer - it took a long time to figure out how and why this works, and I can't remember the details anymore.

Ah those were the days - fast integer and slow floating point....

So it did pass by my keyboard many many years ago, I may have tweaked the hex constant a bit or so, but other than that I can't take credit for it, except that I used it a lot and probably contributed to its popularity and longevity.

p.s. sorry in taking so long to reply

We'll forgive Gary for taking so long to reply since we pretty much hit the jackpot. While he can't take all the credit, he's definitely one of the guys responsible for it and likely back when he worked at Silicon Graphics on the Indigo. At one point in the past, around 2001, I actually owned an R4K with 4 XS24Zs (Elan-class since it had a depth buffer chip on the graphics boards and 4 geometry engines!) Indigo for a short while. Now you know where 3dfx got their multi-chip 3D ideas from:grin:

The R4K stands out as an SGI box to actually use commodity PC parts in its construction, which helped me get it up and running after I bought it non-working for a collection of old SGI hardware that never really got off the ground. Those things are still pretty pricey to this day, even moreso than 3 or 4 years ago!

What software Gary did for the Indigo project I've still to ask him, but it's likely core IRIX work on the code to drive the 3D hardware. His journey from SGI to NVIDIA via 3dfx is a common one with many current NVIDIA employees having made a similar journey. Maybe most notably is Emmett Kilgariff, the man who's likely responsible for a large chunk of NV40 and G70's performance with the design of their fragment processors. All done and dusted?

While Gary can't take the full credit, he's likely the last person easily available that had a hand in writing and refining the fast inverse square root implementation that sparked this investigation. When it takes you via John Carmack, Michael Abrash and Terje Mathison, the source to a id-made Quake game, and then finally 3dfx, a 3D geek can't really complain. A really cool hack that deserves some limelight after all these years. It's over 15 years old at this point.

To all those wondering why John bothers to push out the source to id's game engines after the fact, the snippet of code at the very top of this article is a poster child for why. Not only do you get well-programmed and well-optimised 3D engines to modify and learn from, you get gems like the fast invsqrt function to show you that it's not all about the 3D hardware, and that software is arguably even more of a factor when analysing 3D performance.

So not all done and dusted in finding who wrote it, but maybe as close as it's likely to get. Hope you enjoyed the journey. Thanks

John Carmack Terje Mathison Gary Tarolli Chris Lomont DemoCoder Marco Salvi Comment

Feel like commenting? We'd love for you to do so here

I haven't seen code like this since I stopped coding Mandelbrot sets in 68k assembler. As a young newt I looked at the code inside fractint with awe and discovered some similar marvellous optimisations. Building on those and converting to 68k I made what was the fastest mandelbrot calculation I could.

float InvSqrt(float x) {
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i >> 1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}
But it's supposed to return a float!

Introduction
Note!
This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy.
Origin of Quake3's Fast InvSqrt()
To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense.
float InvSqrt (float x){
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}
Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor (where, on certain NVIDIA hardware, you can get normalisation of an fp16 3-channel vector for free if you, or the compiler, are careful), or if you need to normalise outside of a shader program for whatever reason, the inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration, and essentially that's what the code above does, but with a twist.
How the code works
The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it through iteration. Taking advantage of the way 32-bit IEEE-754 floating point numbers are laid out in memory, i, an integer, is initially set to the bit pattern of the floating point number you want the inverse square root of, via a pointer cast. i is then set to 0x5f3759df minus itself shifted one bit to the right; the right shift drops the least significant bit of i, essentially halving it.
In other words, the initial guess for the Newton iteration is built entirely with integer operations: the magic seed value minus a free divide-by-two courtesy of the CPU's shifter.
But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation.
So who did write it? John Carmack?
While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time.
Asking John whether it was him or Michael returned a "not quite".
-----Original Message-----
From: John Carmack
Sent: 26 April 2004 19:51
Subject: Re: Origin of fast approximated inverse square root
At 06:38 PM 4/26/2004 +0100, you wrote:
>Hi John,
>
>There's a discussion on Beyond3D.com's forums about who the author of
>the following is:
>
>float InvSqrt (float x){
> float xhalf = 0.5f*x;
> int i = *(int*)&x;
> i = 0x5f3759df - (i>>1);
> x = *(float*)&i;
> x = x*(1.5f - xhalf*x*x);
> return x;
>}
>
>Is that something we can attribute to you? Analysis shows it to be
>extremely clever in its method and supposedly from the Q3 source.
>Most people say it's your work, a few say it's Michael Abrash's. Do
>you know who's responsible, possibly with a history of sorts?
Not me, and I don't think it is Michael. Terje Matheson perhaps?
John Carmack
Despite having the noodle for it, John says nay and isn't sure if it's Michael either. The rebuttal was posted in the B3D thread along with Terje as a potential author and was then pretty much forgotten about by all until recently. During John's Quakecon keynote speech this year he mentioned the opening of the complete Quake3 v1.32 source under the General Public License, including the 3D renderer, to big cheers from the assembled crowd. With Doom3 recently finished and published to critical acclaim, hacking minds turned to id to ask when Q3's fairly ancient (by current 3D standards) renderer would be available for people to look at, learn from and work with.
Duly released in the week following Quakecon, Slashdot picked up the obvious story, where the question of who wrote that implementation of fast inverse square root came up again. Shortly after it was posted, I wondered why the hell I hadn't asked Terje!
Terje Mathisen, along with Michael Abrash and a few others, stands as one of the masters of assembly language optimisation for x86 microprocessors. Back in the days of software optimisation for 3D graphics, guys like Michael and Terje (and John, who's no slouch at it himself if you peek at the Quake source) would spend significant time testing hand-coded assembly optimisations for performance-critical code. The investment, one you wouldn't really make these days except in very special cases, paid off handsomely in Doom and Quake's era. And if you hang around comp.lang.asm.x86 enough you'll spot Terje offering advice, optimisations, anecdotes and code snippets related to x86 assembly programming. The man knows his stuff. So, Terje, was it you?
Terje, any ideas?
-----Original Message-----
From: Terje Mathisen
Sent: 22 August 2005 07:49
Subject: Re: FW: Origin of fast approximated inverse square root
ryszard wrote:
> Hey Terje,
>
> This question has come up again since id released the source to Quake
> 3 Arena.
>
> Are you the guy who wrote that fast implementation of inverse square root?
> If so, do you have a history of where it came from and how you came up
> with it? A whole bunch of hackers and geeks would love to know and
> since John says it wasn't him or likely Michael, was it you?
Hello Ryszard, and hello again John, it's been a few years since we last met. :-(
Thanks for giving me as a possible author, when I first saw the subject I did
indeed think it was some of my code that had been used. :-)
I wrote a very fast (pipelineable) & accurate invsqrt() 5+ years ago, to help
a Swede with a computational fluid chemistry problem.
His simulation runs used to take about a week on either Alpha or x86 systems,
with my modifications they ran in half the time, while delivering the exact
same final printed results (8-10 significant digits).
The code shown below is not the same as what I wrote, I would guess it mostly
stays within a fraction of a percent? The Swede needed at least 48 significant
bits in his results, so I employed a much more straightforward table lookup
plus NR-iteration. Since water molecules contain three atoms it was quite
straightforward to calculate three such invsqrt() values in parallel, this was
enough to avoid almost all bubbles in the fp pipelines.
I do think I recognize the style of the Q3A code however, it looks a lot like
something you'll find in the old HAKMEM documents from MIT. :-)
Regards,
Terje
"almost all programming can be viewed as an exercise in caching"
Terje's not our man, but he does show he's got the cojones for it. His own assembly version of invsqrt, accurate to 48 significant bits and pipelined on the x86 CPUs of 2000 - which had just gained SSE about a year before with A80525 (I'll let the geeks look that one up, KNI ring any bells?) by the way - uses a LUT and Newton-Raphson to maintain the precision his friend needed. A little analysis of his LUT would likely have seen him close to the InvSqrt() this article refers to. You might also remember Terje as one of the main guys behind public analysis of the original Pentium's FDIV bug.
While he can't pin down the author, he does drop more crumbs. I first came across the M.I.T. HAKMEM documents ages ago as I dove deep into low-level programming in the 1998-2000 timeframe. I was a wet-behind-the-ears hacker interested in 3D graphics back then; my grasp of vector math was decent (it now sucks, relatively speaking, to the point where I'm constantly having minor Eureka moments as I write about 3D hardware for a living), and I even tried my hand at a Linux driver for the S3 Savage3D around the same time John was working on the Utah GLX project with Matrox hardware.
Drawing tris at a low level wasn't such a big deal 5 or 6 years ago, since there was no programmable hardware and only the basic OpenGL pipeline to follow. Getting hardware specs out of the IHVs was also easy enough. I keep my copy of the Savage3D's register and hardware spec close by to remind me how unlikely it would ever be to get the same documents for a modern GPU from the big IHVs.
More digging
With John and Terje unwilling to stake a claim on the code and Abrash mostly out of the running, more digging was required. Google, even with search-fu to make Larry and Sergey proud, didn't want to give up the goods. It was willing to provide some pointers towards NVIDIA though, with a posting to a Slashdot article that hinted that someone in Santa Clara was responsible.
A quick email to a friend at NVIDIA said that the T in SST, one Gary Tarolli, was the one at NVIDIA most likely to know.
In almost every tale of 3D legend or lore you'll find 3dfx
For those new to 3D graphics, Gary Tarolli is one of the founders of the late 3dfx. One of the early pioneers of consumer 3D graphics hardware, 3dfx blazed a trail in 3D that started with coin-op arcade hardware, before the SST1 and Voodoo Graphics paired separate framebuffer and texture unit chips on a PCI board for the PC in late 1996. Voodoo 2 followed in early 1998, powered by the brand new SST96 at arguably 3dfx's peak. Before the V2, 3dfx had no real competition in the consumer space, with NVIDIA's NV3 (Riva 128) not quick enough and Rendition's Verite architecture not getting the market share needed to have the company ponder a follow-up.
Voodoo3 arrived in 1999 after SLI Voodoo2s had ruled the roost for so long, and 3dfx dropped the SST prefix for their chips. The forgettable Voodoo3 was slapped around by NVIDIA's NV5 (TNT2 Ultra) and then by NV10 (the first GeForce 256) barely half a year later. On the ropes, 3dfx's VSA-10x architecture saw Voodoo4 and 5 hit the market in 2000, but even the monster Voodoo5 5500 wasn't enough to keep 3dfx afloat. Some business decisions gone bad, and problems introducing the Voodoo5 6000, saw NVIDIA buy the ailing 3D chip company, bringing the legend to a close.
Mentioning Rampage, Sage and Spectre today is enough to widen eyes and send saliva glands into overdrive in 3D geeks. 3dfx's stillborn next-generation parts were to be the company's saviours. That wasn't to be, sadly, and most of 3dfx's staff were assimilated into NVIDIA, with the rest joining the likes of ATI or leaving 3D altogether to pursue other careers.
So, with Mr. T-Buffer the next person likely to know where the code comes from, I decided to ask him not if he knew who wrote it, but if it was him. Known as a coder, Gary wrote the simulation code that brought up SST1 and SST96 on their design hardware before production, so his ability to hack made it a valid question to ask.
Must be you, Gary, surely?
From: Gary Tarolli
Sent: Mon 05/09/2005 14:23
Subject: RE: FW: Origin of fast approximated inverse square root
A blast from the past!
I definitely recognize the code below, but I can't take credit for it.
I remember running across it over 10 years ago, and I also remember
rederiving it. I think it's just Newton-Raphson iteration with a very
clever first approx.
I also remember simulating different values for the hex constant
0x5f3759df. I may have done this for the IRIS indigo work I did,
or some consulting at Kubota, I'm not 100% sure.
Given the amount of math it does, and its accuracy, and not requiring
a table, it is a pretty great piece of code.
I especially like the integer ops
i = 0x5f3759df - (i >> 1);
which actually is doing a floating point computation in integer - it
took a long time to figure out how and why this works, and I can't
remember the details anymore.
Ah those were the days - fast integer and slow floating point....
So it did pass by my keyboard many many years ago, I may have tweaked
the hex constant a bit or so, but other than that I can't take credit
for it, except that I used it a lot and probably contributed to its
popularity and longevity.
p.s. sorry in taking so long to reply
We'll forgive Gary for taking so long to reply, since we pretty much hit the jackpot. While he can't take all the credit, he's definitely one of the people responsible for it, likely from back when he worked at Silicon Graphics on the Indigo. At one point, around 2001, I actually owned an Indigo R4K with 4 XS24Zs (Elan-class, since it had a depth buffer chip on the graphics boards and 4 geometry engines!) for a short while. Now you know where 3dfx got their multi-chip 3D ideas from :grin:
The R4K stands out as an SGI box that actually used commodity PC parts in its construction, which helped me get it up and running after I bought it non-working for a collection of old SGI hardware that never really got off the ground. Those things are still pretty pricey to this day, even more so than 3 or 4 years ago!
What software Gary wrote for the Indigo project I've still to ask him, but it's likely core IRIX work on the code to drive the 3D hardware. His journey from SGI to NVIDIA via 3dfx is a common one, with many current NVIDIA employees having made a similar trip. Perhaps most notable is Emmett Kilgariff, the man likely responsible for a large chunk of NV40's and G70's performance through the design of their fragment processors.
All done and dusted?
While Gary can't take full credit, he's likely the last easily reachable person who had a hand in writing and refining the fast inverse square root implementation that sparked this investigation. When the trail takes you via John Carmack, Michael Abrash and Terje Mathisen, the source to an id-made Quake game, and finally 3dfx, a 3D geek can't really complain. A really cool hack that deserves some limelight after all these years; it's over 15 years old at this point.
To all those wondering why John bothers to push out the source to id's game engines after the fact, the snippet of code at the very top of this article is a poster child for why. Not only do you get well-programmed and well-optimised 3D engines to modify and learn from, you get gems like the fast invsqrt function to show you that it's not all about the 3D hardware, and that software is arguably even more of a factor when analysing 3D performance.
So not all done and dusted in finding who wrote it, but maybe as close as it's likely to get. Hope you enjoyed the journey.
Thanks
John Carmack
Terje Mathisen
Gary Tarolli
Chris Lomont
DemoCoder
Marco Salvi

The article barely mentions HAKMEM [wikipedia.org], but the invsqrt hack is reminiscent of the HAKMEM programming hacks [pipeline.com], which were published in 1972. Several of these hacks use bit fiddling with magic constants to perform tasks in straight-line code that you would ordinarily think of doing with iteration.

HAKMEM is classic bathroom reading for hackers. If you want to do it up old-school, print a copy from the original scans [mit.edu], double-sided.

## Re:A famous quote (5, Funny)

## mogrify (828588) | more than 7 years ago | (#17071472)

Interesting smiley... is that a dead man with a fraction in his mouth and a prominent Adam's Apple, wearing a bow tie and a dress and standing on a toy race car?

What's your point, man?

## Re:A famous quote (-1, Offtopic)

## DrDitto (962751) | more than 7 years ago | (#17070732)

> Anyone who buys DRM'd music is either an idiot or ignorant

I buy DRM music because I like iTunes and I don't give a rats ass about some geek religion. Nor am I arrogant enough to put this in my sig.

P.S., everything I buy on iTunes gets burned onto an Audio CD.

## Re:A famous quote (1)

## theelectron (973857) | more than 7 years ago | (#17070496)

Just to make sure, I did RTFA and it says it in the first paragraph or so.

## Re:A famous quote (1, Informative)

## Anonymous Coward | more than 7 years ago | (#17070514)

To find the square root of say, N, you use InvSqrt(N) and then multiply the result by N to get the square root of N. Supposedly, it's faster than using the built-in sqrt() function.

## Re:A famous quote (4, Informative)

## Red Flayer (890720) | more than 7 years ago | (#17070624)

Carmack quite graciously denied the code was his and helped direct the author closer to the true source.

## Re: A better question: (2, Interesting)

## Anonymous Coward | more than 7 years ago | (#17070926)

How does this function compute 1/x^(1/2)? I'm asking, because I have no idea how it works.

## Evil math--it approximates the inverse square root (2, Informative)

## Anonymous Coward | more than 7 years ago | (#17071314)

Seriously, they're using some evil numerical techniques that approximate the actual value of that function to within a few percent very quickly. You could do more steps of Newton's root finding method or other complex things, but that would be slow, and this is only meant to find things that are "good enough" for you to draw on someone's screen.

This function is used in graphics programming, specifically in things like the Quake 3 source code (as well as other places). First, it calculates x/2, then it gets the bits of the float x as an integer (without modifying them), then it does the "magic" step with the magic number they calculated via evil mathematics to give the best starting point they can for Newton's method, using the x >> 1 to divide the float-as-integer in two (and make it positive, since your sign bit falls into the exponent of the float!).

Don't mind me. I may have a degree in mathematics, and I even (somehow!) made it through at least one class on numerical analysis, but all I can tell you is that this stuff is evil.

## Re:A famous quote (5, Informative)

## 91degrees (207121) | more than 7 years ago | (#17071420)

This is great for a 3D rendering application, but in a game speed is critical. This pair of calculations involves a square root and a divide. Both of these are at least an order of magnitude slower than multiplications and additions.

So what this function does is provide a value you can multiply each component by to get a unit vector.

Well, there's the what and why parts. As for the how, I have no idea. I think it uses magic.

## Re:A famous quote (1, Troll)

## Lord Ender (156273) | more than 7 years ago | (#17071700)

Of course, most students have a hard time with that class, and can't actually remember how to do anything after the course other than Monte-Carlo integration. But you should at LEAST remember being exposed to Newtonian integration in your undergrad numerical methods class!

The only thing amazing about this code is that someone paid good-enough attention in class to be able to apply what he learned to his job.

The real genius here is Isaac Newton. 'course, that's not news to anyone.

## And so why do we care? (2, Insightful)

## Codename46 (889058) | more than 7 years ago | (#17070484)

[Insert rant about software patents]

## Re:And so why do we care? (4, Funny)

## From A Far Away Land (930780) | more than 7 years ago | (#17070510)

I was a little worried when Slashdot posted the Britney Spears beaver pictures, but they now have their credibility back as the home of "News for Nerds".

## Re:And so why do we care? (1)

## i.r.id10t (595143) | more than 7 years ago | (#17070742)

However, if it really did get posted to

## Re:And so why do we care? (0)

## Anonymous Coward | more than 7 years ago | (#17071652)

TaAAaa DAaAAAAaAaa!!!

## Re:And so why do we care? (0)

## Anonymous Coward | more than 7 years ago | (#17071120)

What's going on here [imagevenue.com] ?

## Re:And so why do we care? (2, Funny)

## gstoddart (321705) | more than 7 years ago | (#17070756)

Hmmmm

## Re:And so why do we care? (1)

## zdzichu (100333) | more than 7 years ago | (#17071272)

(NSFW and offtopic).

## Re:This paper seems to have the info (4, Funny)

## Toasty16 (586358) | more than 7 years ago | (#17071202)

now I get it... [Looks furtively around to see if anyone knows he doesn't get it]

## What's with use of Pointers? (2, Interesting)

## MankyD (567984) | more than 7 years ago | (#17070630)

Can someone enlighten me?

## Re:What's with use of Pointers? (1)

## chrisbtoo (41029) | more than 7 years ago | (#17070852)

With your suggested code, you'd be operating on the actual number, casting the float to an int, which would just be throwing away a bunch of information.

## Re:What's with use of Pointers? (5, Informative)

## eis271828 (842849) | more than 7 years ago | (#17070866)

*(int*) &x treats the bits as an integer, with no behind-the-scenes conversion to an actual int value.

## Re:What's with use of Pointers? (4, Informative)

## radarsat1 (786772) | more than 7 years ago | (#17070878)

If you do this:

int i = (int)3.0f;

You get i=3, like what you'd get from the floor() function.

If you do this:

float f = 3.0f;
int i = *(int*)&f;

Then i contains a bit-for-bit copy of the IEEE floating-point representation of 3.0.

It's because C knows how to cast a float to an int by truncating it (which matches the floor function for positive values). However, if you do it the second way, you aren't casting a float to an int, you are casting a pointer-to-float to a pointer-to-int and then dereferencing it.

By the way, I just wanted to say... this is one of the most interesting things I've read on Slashdot in a while. Wow. That function is just amazing. I only wish I understood how it worked. I know nothing about what a "Newton-Raphson iteration" is.

## Re:What's with use of Pointers? (4, Informative)

## dsci (658278) | more than 7 years ago | (#17071400)

You start with some INITIAL GUESS (the real beauty of this algorithm) X(0), then apply:

X(n+1) = X(n) - f(X(n)) / f'(X(n))

where

X(n+1) is the NEXT guess after the value you 'know',

X(n) is that most recent value you know,

f(X(n)) is the function evaluated at X(n) and

f'(X(n)) is the first derivative of f(x) evaluated at X(n).

It's not foolproof, and BOTH whether it converges at all AND how FAST it converges depend on the initial guess, X(0).

The "Secant Method" is an improvement that makes it a little 'smarter,' at the expense of more computation (this is often a positive trade-off on numerical modeling codes, since the 'smarter' algorithm does tend to converge faster). There are other improvements as well, such as the Los Alamos Linear Feedback Solver (a slightly modified secant method that converges about 10-17% faster, at least for some types of problems) that I use in my own codes.

Obligatory Wikipedia followup: Newton's Method [wikipedia.org]

## Re:What's with use of Pointers? (2, Informative)

## The boojum (70419) | more than 7 years ago | (#17071556)

Newton-Raphson is a root finding method. Given a starting point it finds successively more accurate approximations to the root. The formula for it looks like: x_{n+1} = x_n - f(x_n)/f'(x_n), where f(x) is the function to find the root of, f'(x_n) is the derivative of that function and x_n are the successive approximations. Essentially, given a guess, it looks at the slope of the function at that point and follows the tangent line until it crosses zero; that point becomes the next guess.

Finding the reciprocal square root of a is equivalent to solving x^{-2} - a = 0 which means that you use f(x) = x^{-2} - a. The derivative of that is f'(x) = -2x^{-3}. Putting those into the Newton-Raphson formula gives you x_{n+1} = x_n - (x_n^{-2} - a) / -2x_n^{-3}, which simplifies down to x_{n+1} = x_n(3 - a*x_n^2)/2. This is equivalent to the last line before the return in that InvSqrt() function.

Convergence of Newton-Raphson is quadratic which means that each time you repeat that line with a previous guess you double the number of digits of accuracy (up to the limit of the hardware of course).
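A hypothetical one-line helper (not taken from the Quake3 source) showing the simplified update derived above:

```c
/* One Newton-Raphson step for f(x) = x^-2 - a:
   x_{n+1} = x_n * (3 - a * x_n^2) / 2
   This matches the form of the last line of InvSqrt(). */
static float rsqrt_step(float a, float x)
{
    return x * (3.0f - a * x * x) * 0.5f;
}
```

Because convergence is quadratic, each application roughly doubles the number of correct bits, which is why a single step after a good seed was enough for game purposes.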

## Re:What's with use of Pointers? (-1, Redundant)

## eneville (745111) | more than 7 years ago | (#17070908)

## Re:What's with use of Pointers? (0)

## eneville (745111) | more than 7 years ago | (#17070982)

excuse me. i did not RTFA, it was offline. i assumed x was also an int. please excuse this. mod parent down if you have points please.

## Re:What's with use of Pointers? (1)

## The boojum (70419) | more than 7 years ago | (#17070916)

## Re:What's with use of Pointers? (5, Informative)

## ToxikFetus (925966) | more than 7 years ago | (#17071070)

I'll take a swing at this one. It's because the author doesn't want the value of x, but the integer representation of the value at x's memory address.

If x is 3.14159, (2) will result in i==3, whereas (1) will result in whatever the 4-byte IEEE-754 representation of 3.14159 is (0x40490FD0, if Google is correct). By using (1), the author is able to use integer bitwise operations (>>) to perform "free" floating point operations. When i is sent back into floating point form via:

x = *(float*)&i;

x now contains the value of the integer operation:

i = 0x5f3759df - (i >> 1);

which was presumably faster than an identical floating point operation. It's a nifty little solution, especially with regard to the selection of the magic number.
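To see the difference in practice, here's a small sketch (the helper names are mine) contrasting value conversion with bit reinterpretation; memcpy is used instead of the pointer cast to sidestep strict-aliasing issues on modern compilers, but it yields the same bits:

```c
#include <string.h>

/* (1) value conversion: truncates, e.g. 3.14159f becomes 3 */
static int float_to_int_value(float x) { return (int)x; }

/* (2) bit reinterpretation: returns the raw IEEE-754 pattern,
       e.g. 0x40490FD0 for 3.14159f */
static int float_to_int_bits(float x)
{
    int i;
    memcpy(&i, &x, sizeof i);  /* copy the bytes, don't convert */
    return i;
}
```

Both compile down to trivial code; the point is only whether the compiler performs a float-to-int conversion or just moves 32 bits as-is.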

## Re:What's with use of Pointers? (1)

## mooingyak (720677) | more than 7 years ago | (#17071160)

One is 'take this set of bytes, assume it's integer data, and tell me what value you get when you dereference it' (their version).

They won't necessarily render the same result.

I haven't looked at the code, but depending on the type of X, it could provide some wildly different results.

## Re:What's with use of Pointers? (1, Insightful)

## Anonymous Coward | more than 7 years ago | (#17070762)

> int i = (int)x;

converts a float to an int, potentially cutting off the digits past the decimal point and/or leading digits that don't fit into the 32-bit integer range.

> int i = *(int*)&x;

reinterprets the bit pattern of the float number as an integer, so the exponent and the sign bit become part of the number.

## Re:What's with use of Pointers? (2, Insightful)

## MeanMF (631837) | more than 7 years ago | (#17070786)

int i = *(int*)&x assigns i the 32-bit value that represents how x is stored in memory as a float.

## Re:What's with use of Pointers? (1)

## Second_Derivative (257815) | more than 7 years ago | (#17070804)

I think it's a quick hack to halve the exponent and linearly extrapolate a new value for the mantissa.

## Re:What's with use of Pointers? (5, Informative)

## ewhac (5844) | more than 7 years ago | (#17070820)

If we simply wrote:

int i = (int)x;

then C would convert the float value into an integer value (throwing away the fractional part). But this isn't what we want. We want to operate on the bits of an IEEE floating point value directly, and integers are the best way to do that.

So first, we lie to the compiler by telling it we have a pointer to an int:

(int *)&f

And then we dereference the pointer to get it into an operable int:

i = *(int *)&f;

Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

Schwab

## Re:What's with use of Pointers? (2, Informative)

## Geoffreyerffoeg (729040) | more than 7 years ago | (#17071206)

Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

Moreover, when compiled, the optimizer does the right thing. Since access to the variable is actually a memory dereference (or a register access, but we'll ignore that for now), *(int*)& means "tell the compiler this is an int, but don't do anything in the code", whereas (int) means "create a new temporary that's the int part of the variable".

## Re:What's with use of Pointers? (1)

## uhoreg (583723) | more than 7 years ago | (#17070898)

## Re:What's with use of Pointers? (1)

## uhoreg (583723) | more than 7 years ago | (#17071092)

## Re:What's with use of Pointers? (1)

## ars (79600) | more than 7 years ago | (#17070906)

## Because that's what he wanted to do. (1)

## mcg1969 (237263) | more than 7 years ago | (#17070914)

representation of the floating-point number. That is, it is extracting the raw bits involved.

## Re:What's with use of Pointers? (0)

## Anonymous Coward | more than 7 years ago | (#17070918)

## Re:What's with use of Pointers? (1)

## doshell (757915) | more than 7 years ago | (#17070930)

Since x is a float, int i = (int)x; would find the (best) integer approximation of x and store it in i. What the author meant to do was take the binary representation of x and have the compiler treat it as the binary representation of an int; hence the "taking the address", followed by a cast to int *, followed by a dereference to store the actual value in i.

If this gets you confused, I'd definitely suggest trying out both methods in your favorite compiler :)

## Re:What's with use of Pointers? (0)

## Anonymous Coward | more than 7 years ago | (#17071028)

## Re:What's with use of Pointers? (1)

## systemeng (998953) | more than 7 years ago | (#17071048)

## Re:What's with use of Pointers? (1)

## 91degrees (207121) | more than 7 years ago | (#17071518)

Other people have answered this, so I'll ask another question.

Isn't it a little clearer, and potentially faster (at least if your optimiser doesn't understand) to use

union

{

float f;

int i;

} value;

value.f = x;

i = value.i;
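Wrapped into a function (my own packaging of the suggestion above), the union form looks like this; C99 explicitly blesses type punning through a union, so it avoids the aliasing question entirely:

```c
/* Reinterpret a float's storage as an int via a union, per the
   suggestion above. Reading a different member than was last
   written is defined as reinterpretation in C99 and later. */
static int float_bits_union(float x)
{
    union { float f; int i; } value;
    value.f = x;
    return value.i;
}
```

Most optimizers compile this to exactly the same code as the pointer cast, so clarity is the main difference rather than speed.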

## Bad programmer, no cookie (-1, Troll)

## Anonymous Coward | more than 7 years ago | (#17070700)

## Re:Bad programmer, no cookie (1)

## petermgreen (876956) | more than 7 years ago | (#17071308)

All floating point operations are approximations, writing good floating point code is all about understanding the approximations and tradeoffs you are making.

and floating point operations can be one of the slowest parts of code.

i have to agree on the lack of comments in the source for games, it can make them very hard to penetrate (particularly when combined with heavy use of macros)

## x * x, right? (0, Redundant)

## h4ter (717700) | more than 7 years ago | (#17070766)

## Re:x * x, right? (2, Informative)

## Eudial (590661) | more than 7 years ago | (#17071086)

Well, you're sort of right.

f^-1(x) = x^2 is the inverse of f(x) = sqrt(x) where x > 0

## Old and busted: Duff's device (3, Funny)

## Stavr0 (35032) | more than 7 years ago | (#17070778)

Naah, just kidding. They both deserve a spot in the Clever Hacks Hall of Fame

## InvSquirt!? (0, Offtopic)

## paniq (833972) | more than 7 years ago | (#17070780)

## InvSquirt!?-Sperm. (1, Funny)

## Anonymous Coward | more than 7 years ago | (#17071012)

## excellent thread (0)

## Anonymous Coward | more than 7 years ago | (#17070792)

that just goes to show you tweeds in VM land that you need to get down and dirty sometimes.

## Hmm... (2, Funny)

## porkmusket (954006) | more than 7 years ago | (#17070798)

## Re:Hmm... (2, Funny)

## gstoddart (321705) | more than 7 years ago | (#17071044)

Ummmm

## Poor function name (4, Insightful)

## bloobloo (957543) | more than 7 years ago | (#17070800)

## Re:Poor function name (1, Informative)

## Anonymous Coward | more than 7 years ago | (#17070900)

(-1/2) != (2).

x^(-1/2) != x^(2).

QED.

## Re:Poor function name (1)

## bloobloo (957543) | more than 7 years ago | (#17071118)

But it's a function name. One would expect InvSqrt(n) to do the opposite of what Sqrt(n) does, i.e. squaring. RecipSqrt(n) would be a better name, although I do admit it would take 2 more key presses!

## Re:Poor function name (0)

## Anonymous Coward | more than 7 years ago | (#17071218)

## Re:Poor function name (3, Funny)

## truedfx (802492) | more than 7 years ago | (#17071242)

## It was fast (3, Informative)

## KalvinB (205500) | more than 7 years ago | (#17070808)

That page compares the time it takes to calculate the sqrt various ways, including Carmack's. Short version: modern processors are significantly faster since it can be done in hardware. It may still be useful in cases where the processor doesn't have a sqrt instruction available.

His version took 428 cycles compared to 107 cycles doing it in hardware on the same system.

## Re:It was fast (0)

## Anonymous Coward | more than 7 years ago | (#17071148)

## Re:It was fast (5, Insightful)

## systemeng (998953) | more than 7 years ago | (#17071500)

## Cool Journey (1)

## locotx (559059) | more than 7 years ago | (#17070958)

## Mirrordot the airticle cut-and-paste (5, Informative)

## zoftie (195518) | more than 7 years ago | (#17070964)

Note!

This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy.

Origin of Quake3's Fast InvSqrt()

To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense.

InvSqrt()

float InvSqrt (float x){
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor (which gives you normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware, if you, or the compiler, are careful), or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration, and essentially that's what the code above does, but with a twist.

How the code works

The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the bit pattern of the floating point number you want to take the inverse square root of, via the pointer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it.

Using the integer cast of the seeded value, i is reused and the initial guess for Newton is calculated using the magic seed value minus a free divide by 2 courtesy of the CPU.

But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation.

So who did write it? John Carmack?

While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time.

Asking John whether it was him or Michael returned a "not quite".

-----Original Message-----

From: John Carmack

Sent: 26 April 2004 19:51

Subject: Re: Origin of fast approximated inverse square root

At 06:38 PM 4/26/2004 +0100, you wrote:

>Hi John,

>

>There's a discussion on Beyond3D.com's forums about who the author of

>the following is:

>

>float InvSqrt (float x){

> float xhalf = 0.5f*x;

> int i = *(int*)&x;

> i = 0x5f3759df - (i>>1);

> x = *(float*)&i;

> x = x*(1.5f - xhalf*x*x);

> return x;

>}

>

>Is that something we can attribute to you? Analysis shows it to be

>extremely clever in its method and supposedly from the Q3 source.

>Most people say it's your work, a few say it's Michael Abrash's. Do

>you know who's responsible, possibly with a history of sorts?

Not me, and I don't think it is Michael. Terje Matheson perhaps?

John Carmack

Despite having the noodle for it, John says nay and isn't sure if it's Michael either. The rebuttal was posted in the B3D thread along with Terje as a potential author and was then pretty much forgotten about by all until recently. During John's Quakecon keynote speech this year he mentioned the opening of the complete Quake3 v1.32 source under the General Public License, including the 3D renderer, to big cheers from the assembled crowd. With Doom3 recently finished and published to critical acclaim, hacking minds turned to id to ask when Q3's fairly ancient (by current 3D standards) renderer would be available for people to look at, learn from and work with.

The source was duly released in the week following Quakecon, and Slashdot picked up the obvious story, where the question of who wrote that implementation of fast inverse square root came up again. Shortly after posting that, I wondered why the hell I hadn't asked Terje!

Terje Mathisen, along with Michael Abrash and a few others, stands as one of the masters of assembly language optimisation for x86 microprocessors. Back in the days of software optimisation for 3D graphics, guys like Michael and Terje (and John, who's no slouch at it himself if you peek at the Quake source) would spend significant time testing hand-coded assembly optimisations for various critical-to-performance codes. The investment, one you wouldn't really make these days except in very special cases, paid off back in Doom and Quake's days. And if you hang around comp.lang.asm.x86 enough you'll spot Terje offering advice, optimisations, anecdotes and code snippets related to x86 assembly programming. The man knows his stuff. So, Terje, was it you?

Terje, any ideas?

-----Original Message-----

From: Terje Mathisen

Sent: 22 August 2005 07:49

Subject: Re: FW: Origin of fast approximated inverse square root

ryszard wrote:

> Hey Terje,

>

> This question has come up again since id released the source to Quake

> 3 Arena.

>

> Are you the guy who wrote that fast implementation of inverse square root?

> If so, do you have a history of where it came from and how you came up

> with it? A whole bunch of hackers and geeks would love to know and

> since John says it wasn't him or likely Michael, was it you?

Hello Ryszard, and hello again John, it's been a few years since we last met.

Thanks for giving me as a possible author, when I first saw the subject I did

indeed think it was some of my code that had been used.

I wrote a very fast (pipelineable) & accurate invsqrt() 5+ years ago, to help

a Swede with a computational fluid chemistry problem.

His simulation runs used to take about a week on either Alpha or x86 systems,

with my modifications they ran in half the time, while delivering the exact

same final printed results (8-10 significant digits).

The code shown below is not the same as what I wrote, I would guess it mostly

stays within a fraction of a percent? The Swede needed at least 48 significant

bits in his results, so I employed a much more straightforward table lookup

plus NR-iteration. Since water molecules contain three atoms it was quite

straightforward to calculate three such invsqrt() values in parallel, this was

enough to avoid almost all bubbles in the fp pipelines.

I do think I recognize the style of the Q3A code however, it looks a lot like

something you'll find in the old HAKMEM documents from MIT.

Regards,

Terje

"almost all programming can be viewed as an exercise in caching"

Terje's not our man, but he does show he's got the cojones for it. His own assembly version of invsqrt, accurate to 48 significant bits and pipelined on the x86 CPUs of 2000 - which had just gained SSE about a year before with A80525 (I'll let the geeks look that one up, KNI ring any bells?) by the way - uses a LUT and Newton-Raphson to maintain the precision his friend needed. A little analysis of his LUT would likely have seen him close to the InvSqrt() this article refers to. You might also remember Terje as one of the main guys behind public analysis of the original Pentium's FDIV bug.

While he can't pin down the author, he does drop more crumbs. I first came across the MIT HAKMEM documents ages ago as I dove deep into low-level programming around the 1998-2000 timeframe. I was a wet-behind-the-ears hacker interested in 3D graphics back then; my grasp of vector math was decent (and now sucks, relatively speaking, to the point where I'm constantly having minor Eureka moments as I write about 3D hardware for a living), and I even tried my hand at a Linux driver for the S3 Savage3D around the same time John was working on the Utah GLX project with Matrox hardware.

Drawing tris at a low level wasn't such a big deal 5 or 6 years ago, since there was no programmable hardware and only the basic OpenGL pipeline to follow. Getting hardware specs out of the IHVs was also easy enough. I keep my copy of the Savage3D's register and hardware spec close by to remind me how unlikely it would ever be to get the same documents for a modern GPU from the big IHVs.

More digging

With John and Terje unwilling to stake a claim on the code and Abrash mostly out of the running, more digging was required. Google, even with search-fu to make Larry and Sergey proud, didn't want to give up the goods. It was willing to provide some pointers towards NVIDIA though, with a posting to a Slashdot article that hinted that someone in Santa Clara was responsible.

A quick email to a friend at NVIDIA said that the T in SST, one Gary Tarolli, was the one at NVIDIA most likely to know.

In almost every tale of 3D legend or lore you'll find 3dfx

For those new to 3D graphics, Gary Tarolli is one of the founders of the late 3dfx. One of the early pioneers of consumer 3D graphics hardware, 3dfx blazed a trail in 3D that started with coin-op arcade hardware, before the SST1 and Voodoo Graphics paired separate framebuffer and texture unit chips on a PCI board for the PC in late 1996. Voodoo 2 followed in early 1998, powered by the brand new SST96, at arguably 3dfx's peak. Before the V2, 3dfx had no real competition in the consumer space, with NVIDIA's NV3 (Riva 128) not quick enough and Rendition's Verite architecture not getting the market share needed to have the company ponder a follow-up.

Voodoo3 arrived in 1999, after SLI Voodoo2s had ruled the roost for so long, and 3dfx dropped the SST prefix for their chips. The forgettable Voodoo3 was slapped around by NVIDIA's NV5 (TNT2 Ultra) and then by NV10 (the first GeForce 256) barely half a year later. On the ropes, the 3dfx VSA-10x architecture saw Voodoo4 and 5 hit the market in 2000, but even the monster Voodoo5 5500 wasn't enough to keep 3dfx afloat. Some business decisions gone bad, and problems introducing the Voodoo5 6000, saw NVIDIA buy the ailing 3D chip company, bringing the legend to a close.

Mentioning Rampage, Sage and Spectre today is enough to widen eyes and send saliva glands into overdrive in 3D geeks. 3dfx's stillborn next generation parts were to be the company's saviours. That wasn't to be, sadly, and most of 3dfx's staff were assimilated into NVIDIA, with the rest joining the likes of ATI or leaving 3D altogether to pursue other careers.

So, with Mr. T-Buffer the next person likely to know where the code comes from, I decided to ask him not if he knew, but if it was him. Known as a coder, his simulation code is what brought up SST1 and SST96 on their design hardware before production. His ability to hack made it a valid question to ask.

Must be you, Gary, surely?

From: Gary Tarolli

Sent: Mon 05/09/2005 14:23

Subject: RE: FW: Origin of fast approximated inverse square root

A blast from the past!

I definitely recognize the code below, but I can't take credit for it.

I remember running across it over 10 years ago, and I also remember

rederiving it. I think it's just Newton-Raphson iteration with a very

clever first approx.

I also remember simulating different values for the hex constant

0x5f3759df. I may have done this for the IRIS indigo work I did,

or some consulting at Kubota, I'm not 100% sure.

Given the amount of math it does, and its accuracy, and not requiring

a table, it is a pretty great piece of code.

I especially like the integer ops

i = 0x5f3759df - (i >> 1);

which actually is doing a floating point computation in integer - it

took a long time to figure out how and why this works, and I can't

remember the details anymore.

Ah those were the days - fast integer and slow floating point....

So it did pass by my keyboard many many years ago, I may have tweaked

the hex constant a bit or so, but other than that I can't take credit

for it, except that I used it a lot and probably contributed to its

popularity and longevity.

p.s. sorry in taking so long to reply

We'll forgive Gary for taking so long to reply, since we pretty much hit the jackpot. While he can't take all the credit, he's definitely one of the guys responsible for it, likely back when he worked at Silicon Graphics on the Indigo. At one point, around 2001, I actually owned an Indigo R4K with 4 XS24Zs (Elan-class, since it had a depth buffer chip on the graphics boards and 4 geometry engines!) for a short while. Now you know where 3dfx got their multi-chip 3D ideas from.

The R4K stands out as an SGI box that actually used commodity PC parts in its construction, which helped me get it up and running after I bought it non-working for a collection of old SGI hardware that never really got off the ground. Those things are still pretty pricey to this day, even more so than 3 or 4 years ago!

What software Gary worked on for the Indigo project I've still to ask him, but it's likely core IRIX work on the code to drive the 3D hardware. His journey from SGI to NVIDIA via 3dfx is a common one, with many current NVIDIA employees having made a similar trip. Perhaps most notable is Emmett Kilgariff, the man likely responsible for a large chunk of NV40 and G70's performance via the design of their fragment processors.

All done and dusted?

While Gary can't take the full credit, he's likely the last easily reachable person who had a hand in writing and refining the fast inverse square root implementation that sparked this investigation. When it takes you via John Carmack, Michael Abrash and Terje Mathisen, the source to an id-made Quake game, and then finally 3dfx, a 3D geek can't really complain. A really cool hack that deserves some limelight after all these years. It's over 15 years old at this point.

To all those wondering why John bothers to push out the source to id's game engines after the fact, the snippet of code at the very top of this article is a poster child for why. Not only do you get well-programmed and well-optimised 3D engines to modify and learn from, you get gems like the fast invsqrt function to show you that it's not all about the 3D hardware, and that software is arguably even more of a factor when analysing 3D performance.

So not all done and dusted in finding who wrote it, but maybe as close as it's likely to get. Hope you enjoyed the journey.

Thanks

John Carmack

Terje Mathisen

Gary Tarolli

Chris Lomont

DemoCoder

Marco Salvi

Comment

Feel like commenting? We'd love for you to do so here

## It Was Obviously... (3, Funny)

## eno2001 (527078) | more than 7 years ago | (#17071046)

## Fast functions (1)

## LiquidCoooled (634315) | more than 7 years ago | (#17071066)

As a young newt I looked at the code inside fractint with awe and discovered some similar marvellous optimisations.

Building on those and converting to 68k I made what was the fastest mandelbrot calculation I could.

Another blast from the past.

## Also in Jim Blinn's Corner (1)

## Matchstick (94940) | more than 7 years ago | (#17071068)

## It might be damn smart.. (5, Insightful)

## kan0r (805166) | more than 7 years ago | (#17071186)

Seriously, try looking away from the genius who obviously wrote it.

much easier!

## Re:It might be damn smart.. (0)

## Anonymous Coward | more than 7 years ago | (#17071448)

## Error! (3, Funny)

## jrmiller84 (927224) | more than 7 years ago | (#17071264)

float InvSqrt(float x) { float xhalf = 0.5f*x; int i = *(int*)&x; i = 0x5f3759df - (i >> 1); x = *(float*)&i; x = x*(1.5f - xhalf*x*x); return x; }

Asking John whether it was him or Michael returned a "not quite".

But it's supposed to return a float!

## Re:Error! (1)

## jrmiller84 (927224) | more than 7 years ago | (#17071524)

## Article Text (0, Redundant)

## AngusL (832347) | more than 7 years ago | (#17071384)

## Re:Article Text (1)

## AngusL (832347) | more than 7 years ago | (#17071484)

## hakmem (2, Interesting)

## trb (8509) | more than 7 years ago | (#17071680)

HAKMEM is classic bathroom reading for hackers. If you want to do it up old-school, print a copy from original scans [mit.edu] , double-sided.