I can’t yet give a very precise description, but here are the main facts:
Code that runs fine on 189-191 has the following problems on 195:
1. __constant is no longer accepted as a kernel argument modifier (might be intentional, of course);
2. the kernel now freezes the machine once it has been compiled with the argument modifiers omitted (see 1).
The code is much the same type as nbody, but implements a scheme for symmetric forces, i.e. every combination is computed only once. The scheme requires a bit of workgroup coordination: a global variable (the acceleration) is locked while results are added to it.
This is accomplished by atom_or() and atom_xor() at the end of the kernel.
const unsigned block=OPPBLOCK(index); // OPPBLOCK is a macro
const unsigned ba=block>>LOCKADDRSHIFT; // allow for several locking values per 32 bit. LOCKADDRSHIFT is typically 2; larger values cause problems.
const unsigned bitvalue=1<<(block-(ba<<LOCKADDRSHIFT));
while ((atom_or(&acc_locked[ba],bitvalue)&bitvalue)!=0); // try to lock and proceed when not previously locked
accs[block].x+=tmp4.x;
accs[block].y+=tmp4.y;
accs[block].z+=tmp4.z;
atom_xor(&acc_locked[ba],bitvalue); // unlock
Locking proved necessary; omitting it gives somewhat inaccurate results. My guess is that this part of the code causes the freeze on 195.39, but I have no proof as yet, because I do not have access to 195.39 all the time.
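One possible explanation, offered as an assumption and not as a diagnosis of 195.39: on SIMT hardware, a spin loop placed in front of the critical section can hang when two work-items that execute in lockstep contend for the same lock bit, because the one holding the lock may never get to run its unlock while the others keep spinning. A commonly used restructuring moves the critical section inside the retry loop. The sketch below shows that structure in plain host-side C11 (not OpenCL C), reusing the bit arithmetic from the kernel. The names lock_word, lock_mask, try_lock, unlock and locked_add are hypothetical, and a single float stands in for one component of accs[block].

```c
#include <stdatomic.h>

#define LOCKADDRSHIFT 2 /* the value the post describes as typical */

/* Index of the lock word that guards a block (ba in the kernel). */
static unsigned lock_word(unsigned block) {
    return block >> LOCKADDRSHIFT;
}

/* Bit inside that word (bitvalue in the kernel). */
static unsigned lock_mask(unsigned block) {
    return 1u << (block - (lock_word(block) << LOCKADDRSHIFT));
}

/* One lock attempt: returns 1 if the bit was clear before and is now ours. */
static int try_lock(atomic_uint *locks, unsigned block) {
    unsigned mask = lock_mask(block);
    return (atomic_fetch_or(&locks[lock_word(block)], mask) & mask) == 0;
}

/* Release by toggling the bit, as atom_xor does in the kernel. */
static void unlock(atomic_uint *locks, unsigned block) {
    atomic_fetch_xor(&locks[lock_word(block)], lock_mask(block));
}

/* The critical section sits inside the retry loop, so whoever wins the
   lock updates and releases within the same iteration. */
static void locked_add(atomic_uint *locks, float *acc, unsigned block, float value) {
    int done = 0;
    while (!done) {
        if (try_lock(locks, block)) {
            acc[block] += value;
            unlock(locks, block);
            done = 1;
        }
    }
}
```

On a CPU both loop shapes behave the same, so this only illustrates the structure; whether it avoids the freeze on 195.39 would have to be tested in the kernel itself.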
So, maybe I’m doing something weird, but then, maybe it is worthwhile to check this one out as a potential bug.
The full code can be downloaded from: http://nbodysim.googlecode.com/svn/branches/triangsimp