I'm implementing a simple N-body simulation using DX11 and a compute shader, running on a GTX 280 with driver version 306.97. The theory behind it is based on this article: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html
I also noticed that such a simulation is already part of the MS DX SDK (nBodyGravityCS11), from which I took some inspiration.
The problem I encountered is in the interaction function:
void body_body_interaction(inout float3 ai, float4 bi, float4 bj)
{
    // Vector from body i to body j
    float3 r = bj.xyz - bi.xyz;
    // Squared distance plus the softening term
    float distSqr = dot(r, r);
    distSqr += g_softeningFactorSq;
    // 1 / distSqr^(3/2)
    float distInvCube = 1.0f / sqrt(distSqr * distSqr * distSqr);
    //ai += g_FG * bj.w * distInvCube * r; // NOT WORKING: mass read from bj.w
    ai += g_FG * g_fParticleMass * distInvCube * r; // WORKS: g_fParticleMass can be either in a cbuffer or a global constant
}
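For reference, this is the softened acceleration sum from the article that the function is meant to accumulate one pair at a time. The mapping to my code is my own reading, not something the article states: G corresponds to g_FG, m_j to bj.w (or g_fParticleMass in the working variant), and ε² to g_softeningFactorSq.

```latex
\mathbf{a}_i \approx G \sum_{j} \frac{m_j \, \mathbf{r}_{ij}}{\left( \lVert \mathbf{r}_{ij} \rVert^2 + \varepsilon^2 \right)^{3/2}},
\qquad \mathbf{r}_{ij} = \mathbf{x}_j - \mathbf{x}_i
```

With a nonzero softening term the denominator should never be zero, so as far as I can tell the only per-pair input that differs between the working and non-working variants is the mass.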
The variable bj (xyz = position, w = mass) is first loaded into shared memory, then GroupMemoryBarrierWithGroupSync() is called to synchronize the group:
[loop]
for(uint block = 0; block < num_blocks; ++block)
{
    //Fetch positions to shared cache
    sh_Positions[indexGroup] = oldPar[block * BLOCK_SIZE + indexGroup].pos;
    GroupMemoryBarrierWithGroupSync();
    [unroll]
    for(uint i = 0; i < BLOCK_SIZE; i += 8)
    {
        body_body_interaction(accel, myParticle.pos, sh_Positions[i]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+1]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+2]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+3]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+4]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+5]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+6]);
        body_body_interaction(accel, myParticle.pos, sh_Positions[i+7]);
    }
    GroupMemoryBarrierWithGroupSync();
}
If I use the mass stored in bj.w, I end up with NaNs (examined using Nsight). Any other approach works, whether the mass comes from a global constant or a cbuffer.
The funny thing is that if I do the same thing in the MS demo, the result is exactly the same: I get no output and the buffer contains NaNs. Why am I unable to use the 4th vector component from shared memory in this case? It is initialized properly on the CPU side and then copied to the GPU.
Full shader code here: http://pastebin.com/SJhs8ntt
Thank you very much.