Inconsistent output in Release Mode

Whenever the code is run with 512 threads/block, the output is correct most of the time (about 9 runs out of 10), but sometimes it is completely wrong.
The problem does not occur when the number of threads is less than 512 per block.

Could anyone suggest a reason for this abnormal behavior?
Thanks in advance.

It’s almost impossible to answer your question without knowing the details of your implementation. Please provide some code that reproduces this behaviour.

Sounds like a race condition in your code. Check for different threads accessing a variable and potential problems if the order of those threads is changed.
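
To illustrate the kind of race meant here, below is a minimal sketch, not the original poster's code. It assumes a block-wide sum in shared memory; the kernel name blockSum, the BLOCK constant, and the buffer names are all made up for the example. If either __syncthreads() call is removed, some threads may read shared-memory entries that other threads have not written yet. With more warps per block there are more possible interleavings, so a bug like this can appear only occasionally and only at larger block sizes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 512  // threads per block (hypothetical; matches the failing configuration)

// Hypothetical kernel: each block sums BLOCK input values in shared memory.
__global__ void blockSum(const int *in, int *out)
{
    __shared__ int s[BLOCK];
    int t = threadIdx.x;

    s[t] = in[blockIdx.x * BLOCK + t];
    __syncthreads();  // without this, a thread may read s[] before it is fully written

    // Tree reduction: halve the number of active threads each step.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (t < stride)
            s[t] += s[t + stride];
        __syncthreads();  // without this, the next step may read a stale partial sum
    }

    if (t == 0)
        out[blockIdx.x] = s[0];
}

int main()
{
    const int blocks = 4;
    const int n = blocks * BLOCK;
    int h_in[n], h_out[blocks];
    for (int i = 0; i < n; ++i)
        h_in[i] = 1;  // every block should therefore sum to BLOCK

    int *d_in = 0, *d_out = 0;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    for (int b = 0; b < blocks; ++b)
        printf("block %d: sum = %d (expected %d)\n", b, h_out[b], BLOCK);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Running the version without the barriers many times and comparing the sums against the expected value is one way to check whether a kernel has this kind of problem.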


Thanks for the useful suggestions.
After trying a number of different configurations, I think the issue was the number of registers used per thread relative to the number available.

Number of registers??? That can only affect your kernel in one way… Either your kernel launches successfully or it does not.

It has nothing to do with your result. As paulius said, it must be a race condition in your code.
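
One way to rule the register question in or out is to check whether the launch actually succeeded. The following is only a sketch under assumptions: myKernel and the buffer d are placeholders, not the original poster's code. If a launch asks for more resources than the device can provide, cudaGetLastError() reports it right after the launch, the kernel never runs, and the output buffer simply keeps whatever it held before.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real one.
__global__ void myKernel(float *data)
{
    data[threadIdx.x] *= 2.0f;
}

int main()
{
    const int threads = 512;
    float *d = 0;
    cudaMalloc(&d, threads * sizeof(float));
    cudaMemset(d, 0, threads * sizeof(float));

    // Report how many registers the compiled kernel uses per thread and the
    // largest block size it can be launched with on the current device.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, myKernel);
    printf("registers/thread: %d, max threads/block: %d\n",
           attr.numRegs, attr.maxThreadsPerBlock);

    myKernel<<<1, threads>>>(d);

    // Launch-time failures (e.g. too many resources requested) show up here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d);
        return 1;
    }

    // Errors raised while the kernel was running show up here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d);
        return 1;
    }

    printf("kernel completed without errors\n");
    cudaFree(d);
    return 0;
}
```

If both checks pass on every run and the output is still wrong now and then, the register count is not the cause, which points back to a race condition.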