We are currently using CUDA for testing the intersection of geometric
3D primitives and have implemented a number of algorithms for the case of
triangle-triangle intersection tests.
A particular algorithm appears to be suffering from aggressive optimization
by the Open64/nvopencc compiler (using the default -O3 settings).
Through inspection of mixed PTX/C-code (generated with: --opencc-options
-LIST:source=on) it can be seen that some if-else statements are
no longer present in the listing. This has also been verified at runtime, as the
branches do not work correctly although all input values are correct before
and after the branch.
In CUDA 2.0 this can be remedied by adding a ‘volatile’ modifier to the
local variables that the if-else statements depend on. The algorithm now
works correctly.
In CUDA 1.1 (which also supports the volatile modifier) this does not fix the
problem.
Is this expected behaviour and is using the volatile modifier a good solution?
Are there any other pitfalls we should be aware of?
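
To make the workaround concrete, here is a minimal sketch of the pattern in question. It is not the actual intersection code: the function classifySide, the kernel classifyKernel and the input values are hypothetical and only illustrate where the volatile modifier is placed on the locals that feed an if-else.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not the original algorithm): given the signed
// distances of two vertices to a plane, return 0 if both lie strictly on
// the same side and 1 otherwise.
__host__ __device__ int classifySide(float d0, float d1)
{
    // The locals the branch depends on are marked volatile, so the compiler
    // has to treat each read as an observable access. This is the workaround
    // described in the post; it is an assumption that it has the same effect
    // in other toolkit versions.
    volatile float a = d0;
    volatile float b = d1;

    int result;
    if ((a > 0.0f && b > 0.0f) || (a < 0.0f && b < 0.0f)) {
        result = 0;  // same side: this edge cannot cross the plane
    } else {
        result = 1;  // opposite sides or touching: further tests needed
    }
    return result;
}

// One thread per pair of distances.
__global__ void classifyKernel(const float* d0, const float* d1, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = classifySide(d0[i], d1[i]);
    }
}

int main()
{
    const int n = 4;
    const float h0[n] = { 1.0f, -1.0f,  2.0f, 0.0f };  // made-up test inputs
    const float h1[n] = { 2.0f,  3.0f, -5.0f, 1.0f };
    int hOut[n] = { 0, 0, 0, 0 };

    float* d0 = 0;
    float* d1 = 0;
    int* dOut = 0;
    cudaMalloc((void**)&d0, n * sizeof(float));
    cudaMalloc((void**)&d1, n * sizeof(float));
    cudaMalloc((void**)&dOut, n * sizeof(int));

    cudaMemcpy(d0, h0, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d1, h1, n * sizeof(float), cudaMemcpyHostToDevice);

    classifyKernel<<<1, n>>>(d0, d1, dOut, n);

    // The device-to-host copy also waits for the kernel to finish.
    cudaMemcpy(hOut, dOut, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Compare the device result with the same function evaluated on the host.
    for (int i = 0; i < n; ++i) {
        printf("pair %d: device=%d host=%d\n", i, hOut[i], classifySide(h0[i], h1[i]));
    }

    cudaFree(d0);
    cudaFree(d1);
    cudaFree(dOut);
    return 0;
}
```

Comparing the device result against the host evaluation of the same function is one way to spot a branch that has been optimized incorrectly, since the host path goes through a different compiler back end.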