The G4 and Apple's Second Coming
His math is accurate for a large working-set scenario. Memory bandwidth is the limiting factor for any non-trivial vector computing problem. If you look at the original Cray computers, the revolutionary part wasn't just the CPU, but also the very wide, interleaved memories that allowed it to fetch new data on every clock.
The key is what you consider a "real-world application". If it's Quake III, then quite possibly the full set of floating-point vectors would fit in a 1-2MB L2 cache, and the vector unit can run near its peak. If it's scientific computing, where the working set is far larger than any cache, then throughput drops back to the memory bus rate.
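
To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. It is a crude roofline-style model, not a measurement: every figure in it (peak rate, cache size, cache and memory bandwidth, flops per byte) is a made-up placeholder chosen for illustration, not a G4 or Cray spec, and the function names are my own.

```python
def effective_bandwidth(working_set_bytes, cache_bytes, cache_gb_s, memory_gb_s):
    """Crude two-level model: data streams at the cache rate if the whole
    working set fits in cache, otherwise at the memory bus rate."""
    return cache_gb_s if working_set_bytes <= cache_bytes else memory_gb_s


def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: the vector unit can go no faster than its peak,
    and no faster than the rate at which operands arrive."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical machine parameters (assumptions, not real hardware figures).
PEAK_GFLOPS = 4.0             # assumed vector unit peak
CACHE_BYTES = 1024 * 1024     # assumed 1 MB L2 cache
CACHE_GB_S = 8.0              # assumed cache bandwidth
MEMORY_GB_S = 0.8             # assumed memory bus bandwidth
FLOPS_PER_BYTE = 0.25         # assumed: 2 flops per 8 bytes streamed

# Game-sized working set (fits in cache) vs. scientific-sized (does not).
small_set = 512 * 1024
large_set = 256 * 1024 * 1024

small = attainable_gflops(
    PEAK_GFLOPS,
    effective_bandwidth(small_set, CACHE_BYTES, CACHE_GB_S, MEMORY_GB_S),
    FLOPS_PER_BYTE,
)
large = attainable_gflops(
    PEAK_GFLOPS,
    effective_bandwidth(large_set, CACHE_BYTES, CACHE_GB_S, MEMORY_GB_S),
    FLOPS_PER_BYTE,
)

print(f"cache-resident working set: {small:.2f} GFLOPS")
print(f"memory-bound working set:   {large:.2f} GFLOPS")
```

With these placeholder inputs both cases are bandwidth-limited, and the gap between them simply mirrors the assumed 10x difference between cache and memory bandwidth. Plug in real numbers for whatever machine you care about and the same comparison tells you which side of the argument your application falls on.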