I am new to CUDA and have a question that maybe one of you can help with.
I am doing triangle-triangle (tri-tri) intersection tests using CUDA.
In the kernel, each triangle searches a list of other triangles and finds the closest one,
based on centre-to-centre distance.
The closest distance found so far is kept in a float called mgdist, which is updated inside the loop.
If, at the end of the loop, I write it to the results array, e.g. d_close[i] = mgdist, the code runs really slowly.
If I instead write, say, the first node of the triangle it points to, e.g. d_close[i] = node0[i],
then it is fine.
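To make that concrete, here is a minimal sketch of the kind of kernel I mean. It is not my actual code: the centre array, the template switch, the int type of node0, the launch configuration and the runClosest wrapper are all made up for illustration. Only d_close, node0 and mgdist match the names above.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <vector>

// Hypothetical kernel: one thread per triangle, brute-force scan of all others.
// StoreDistance is a compile-time switch between the two writes described above.
template <bool StoreDistance>
__global__ void closestKernel(const float3* centre, const int* node0,
                              float* d_close, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float mgdist = FLT_MAX;              // closest centre-to-centre distance so far
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;            // do not compare a triangle with itself
        float dx = centre[j].x - centre[i].x;
        float dy = centre[j].y - centre[i].y;
        float dz = centre[j].z - centre[i].z;
        float d  = sqrtf(dx * dx + dy * dy + dz * dz);
        if (d < mgdist) mgdist = d;
    }

    if (StoreDistance)
        d_close[i] = mgdist;             // the case that runs slowly for me
    else
        d_close[i] = (float)node0[i];    // the case that runs fine
}

// Hypothetical host wrapper; error checking omitted to keep the sketch short.
std::vector<float> runClosest(const std::vector<float3>& centre,
                              const std::vector<int>& node0,
                              bool storeDistance)
{
    int n = (int)centre.size();
    std::vector<float> result(n);
    if (n == 0) return result;

    float3* d_centre = nullptr;
    int*    d_node0  = nullptr;
    float*  d_close  = nullptr;
    cudaMalloc(&d_centre, n * sizeof(float3));
    cudaMalloc(&d_node0,  n * sizeof(int));
    cudaMalloc(&d_close,  n * sizeof(float));
    cudaMemcpy(d_centre, centre.data(), n * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(d_node0,  node0.data(),  n * sizeof(int),    cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks  = (n + threads - 1) / threads;
    if (storeDistance)
        closestKernel<true><<<blocks, threads>>>(d_centre, d_node0, d_close, n);
    else
        closestKernel<false><<<blocks, threads>>>(d_centre, d_node0, d_close, n);

    // Blocking copy: waits for the kernel to finish before reading the results.
    cudaMemcpy(result.data(), d_close, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_centre);
    cudaFree(d_node0);
    cudaFree(d_close);
    return result;
}
```

Calling runClosest(centre, node0, true) corresponds to the slow case and runClosest(centre, node0, false) to the fast one. The loop body is identical in both, and only the final store differs.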
Does this ring a bell with anyone?
Surely the compiler is not so clever that it skips the loop altogether when mgdist is never written
to a device result array?