No CUDA benchmarks online yet, but we can predict performance pretty well from the SP count and shader clock.
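For a back-of-the-envelope version of that prediction, peak single-precision throughput is usually estimated as SP count × shader clock × 2 (one multiply-add per SP per clock). A minimal Python sketch, assuming the reference specs (GTX 460: 336 SPs at 1350 MHz shader clock; GTX 480: 480 SPs at 1401 MHz). The function name is made up for illustration, and real CUDA kernels will land somewhere below these peaks:

```python
def peak_sp_gflops(sp_count, shader_mhz, flops_per_sp_per_clock=2):
    """Theoretical peak single-precision GFLOPS.

    Assumes each SP retires one multiply-add (counted as 2 flops) per
    shader clock. This is an upper bound, not a benchmark result.
    """
    return sp_count * shader_mhz * flops_per_sp_per_clock / 1000.0


# Reference-spec numbers (assumed, not measured).
gtx460 = peak_sp_gflops(336, 1350)
gtx480 = peak_sp_gflops(480, 1401)

print(f"GTX 460 peak: {gtx460:.1f} GFLOPS")
print(f"GTX 480 peak: {gtx480:.1f} GFLOPS")
print(f"GTX 460 / GTX 480: {gtx460 / gtx480:.2f}")
```

On paper that puts the GTX 460 at roughly two thirds of a GTX 480, before any ILP or memory effects are taken into account.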
What makes me have a BIG SMILE is that power use is only about 150 watts under load, and temps are roughly 65 degrees C, not 90C.
This is very, very exciting because it means that a dual-GF104 card is within practical power and heat limits. I am now camping in line at the local NVIDIA store for the now-inevitable GTX495, please!
Seems like they added the ability to issue multiple instructions per cycle from the same warp, and also added an extra set of functional units per SM.
Tech Report, AnandTech, FiringSquad, Guru3D, Hardware Canucks, Hardware Heaven, [H]ard|OCP, Hexus.net, HotHardware, and PC Perspective all have GTX460 reviews.
None of them do a single CUDA test, not even Folding@Home. Sigh.
So with the 50% extra cores, can we expect it to perform like a GF100 with 336 SPs when the ILP is good, and in the worst case like one with only 224 SPs?
I guess that’s hard to say, but it’s also noteworthy that they didn’t increase the on-chip memory resources accordingly either.
And another thing: double precision (64-bit floats) is only available on 1 of every 3 blocks of cores. Does this impact programming (that is, do you have to include code to allow for this), or does the CUDA runtime take care of it automatically?
Together with the artificial crippling of the 64-bit engine, anything needing 64-bit floating-point math will probably run better on the CPU, or will have to be converted to fixed-point maths, assuming one doesn’t have a Tesla unit lurking out of earshot.
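To put numbers on that range, here is a rough sketch. It assumes the GTX 460 has 7 SMs enabled with 48 SPs each, that 32 of those can be kept busy with no instruction-level parallelism, and that the remaining 16 only do useful work when independent instructions from the same warp can be issued together. The `ilp_fraction` parameter is a hypothetical knob for illustration, not something the hardware reports:

```python
SMS = 7               # SMs enabled on the GTX 460 (assumed)
SPS_PER_SM = 48       # SPs per GF104 SM
SPS_WITHOUT_ILP = 32  # SPs usable per SM with no dual-issue (assumed)


def effective_sps(ilp_fraction):
    """SPs doing useful work, given the fraction of cycles (0.0 to 1.0)
    in which a second independent instruction can be issued."""
    if not 0.0 <= ilp_fraction <= 1.0:
        raise ValueError("ilp_fraction must be between 0 and 1")
    extra = SPS_PER_SM - SPS_WITHOUT_ILP
    return SMS * (SPS_WITHOUT_ILP + extra * ilp_fraction)


for f in (0.0, 0.5, 1.0):
    print(f"ILP fraction {f:.1f}: about {effective_sps(f):.0f} effective SPs")
```

The two ends of the loop are the 224 and 336 SP cases from the question; real kernels presumably fall somewhere in between.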
Probably, since the imaginary Wikipedia specs for the as yet unreleased Sandy Bridge CPUs put the DP rate for all cores combined at 128 GFLOPS if you use Intel’s new AVX instructions.
That’s what I thought. I’m not trying to be a paid NVIDIA shill (paid NVIDIA, yes, shill, no), but between oodles of memory bandwidth versus CPUs (if you’re not able to just stream from the cache on a CPU) and higher peak DP performance it seems disingenuous to claim that no matter what you’ll be better off running your DP calculations on the CPU.
I’m having serious problems with my new GTX 460. It’s seriously underperforming: look at the device-to-device bandwidth figures below. Shouldn’t these be up around 115 GB/s? I’m using Windows XP Pro 64-bit and CUDA 2.3. Any suggestions?
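For reference, the ~115 GB/s figure is a theoretical peak: memory bus width in bytes times the effective data rate. A small sketch, assuming the reference GTX 460 memory specs (3600 MT/s effective GDDR5, 256-bit bus on the 1 GB card, 192-bit on the 768 MB card); the function name is made up for illustration:

```python
def peak_mem_bandwidth_gbs(bus_width_bits, effective_mtps):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits: memory interface width in bits
    effective_mtps: effective transfer rate in mega-transfers per second
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mtps / 1000.0


# Reference-spec values (assumed); factory-overclocked cards will differ.
print(f"GTX 460 1 GB   (256-bit): {peak_mem_bandwidth_gbs(256, 3600):.1f} GB/s")
print(f"GTX 460 768 MB (192-bit): {peak_mem_bandwidth_gbs(192, 3600):.1f} GB/s")
```

Measured device-to-device copies normally come in below the theoretical number, and the 768 MB card has a lower ceiling to begin with, so it is worth checking which variant is installed before concluding something is wrong.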
That’s true of the 100 range, but in the 104 range only a third of the cores can do DP floating point, so I can’t see how it could do better than a theoretical peak of around 25 GFLOPS of double precision.
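The estimate depends heavily on what DP-to-SP ratio is assumed, so here is a sketch that makes the assumption explicit. It starts from the reference GTX 460 single-precision peak (336 SPs × 1350 MHz × 2 flops). The two ratios are assumptions: 1/12 corresponds to one 16-SP block in three running DP at a quarter of its SP rate, which I believe is the figure usually quoted for GF104, while 1/36 is simply the ratio that lands near 25 GFLOPS:

```python
from fractions import Fraction

# Reference GTX 460 single-precision peak in GFLOPS (assumed specs).
SP_PEAK_GFLOPS = 336 * 1350 * 2 / 1000.0


def peak_dp_gflops(dp_to_sp_ratio, sp_peak=SP_PEAK_GFLOPS):
    """Theoretical DP peak for an assumed DP:SP throughput ratio."""
    return sp_peak * float(dp_to_sp_ratio)


# Both ratios are assumptions to be checked against the hardware docs.
for label, ratio in (("1/12", Fraction(1, 12)), ("1/36", Fraction(1, 36))):
    print(f"DP:SP = {label}: about {peak_dp_gflops(ratio):.1f} GFLOPS")
```

Which ratio actually applies is the thing to verify; the arithmetic only shows how far apart the two assumptions end up.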