nbody --compare fails with 5000 bodies & 8800 card

I am very sorry if this is old and already known, and/or if I did not search thoroughly enough while researching this failure; so bear with me :-).

The card is a GeForce 8800 GT in a Linux machine running kernel version 2.6.9-42.0.3.ELsmp; the machine is a dual AMD Opteron. The issue appears when I run the nbody SDK program in --compare mode and vary the number of bodies in the system. The following output shows the gist of the failure I’m seeing (a power of 2 works, a power of 2 + 1 fails):

[ release]$ ./nbody --compare --n=4096
Run “nbody -benchmark [-n=]” to measure perfomance.

Test PASSED
[ release]$ ./nbody --compare --n=4097
Run “nbody -benchmark [-n=]” to measure perfomance.

Test FAILED
[ release]$ ./nbody --compare --n=8192
Run “nbody -benchmark [-n=]” to measure perfomance.

Test PASSED
[ release]$ ./nbody --compare --n=8193
Run “nbody -benchmark [-n=]” to measure perfomance.

Test FAILED
[ release]$

I have not been able to find any posts relating to this failure, and I am concerned that my setup might be incorrect. Also, if my setup is correct, I am curious whether the error is related to differences in the internal storage format of floats on the 8800 versus my Opteron chip.

Again, I apologize if this is already known; I would love to be enlightened, though.

I must apologize (once more) for not fully reading the nbody code before posting. I now realize that there is a tolerance value that can be set for the GPU/CPU float comparison, which seems to indicate that differences in how single-precision floats are stored internally are the cause. Thus, perhaps my concern about the failure is misguided.