I would like to add some more details:
The alternation between correct and incorrect results only occurs for low values of my input parameter n. For higher values of n the problem does not occur.
Each time the incorrect result occurs, it is the same incorrect result.
I know exactly which threads seem to behave differently between the two runs. To help diagnose the problem, I made these threads store some information in global memory, and that is when the problem disappeared: with the extra global memory access included, it no longer occurs.
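A minimal sketch of the kind of diagnostic store described above. It is not the actual code: the kernel body, the `DebugRecord` struct, and the names `kernel_with_debug` and `run_with_debug` are all hypothetical stand-ins. Each thread writes a small record to a global buffer that the host copies back after the launch. Note that adding such stores can change the code the compiler generates, which may be why the symptom vanishes once they are in place.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical per-thread diagnostic record; the fields are illustrative.
struct DebugRecord {
    int   thread_id;  // global thread index that wrote this record
    int   last_step;  // loop counter when the thread finished
    float value;      // intermediate value being inspected
};

// Hypothetical kernel standing in for the real computation.
__global__ void kernel_with_debug(int n, float *out, DebugRecord *dbg)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float acc = 0.0f;
    int step = 0;
    for (; step < 4; ++step)
        acc += static_cast<float>(tid + step);

    out[tid] = acc;

    // Diagnostic store to global memory, one record per thread.
    dbg[tid].thread_id = tid;
    dbg[tid].last_step = step;
    dbg[tid].value     = acc;
}

// Launches the kernel for n > 0 threads and returns the records to the host.
std::vector<DebugRecord> run_with_debug(int n)
{
    float *d_out = nullptr;
    DebugRecord *d_dbg = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d_dbg, n * sizeof(DebugRecord)));
    // Fill with 0xFF bytes so a record no thread wrote has thread_id == -1.
    CUDA_CHECK(cudaMemset(d_dbg, 0xFF, n * sizeof(DebugRecord)));

    const int threads = 128;
    const int blocks  = (n + threads - 1) / threads;
    kernel_with_debug<<<blocks, threads>>>(n, d_out, d_dbg);
    CUDA_CHECK(cudaGetLastError());       // launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while running

    std::vector<DebugRecord> h_dbg(n);
    CUDA_CHECK(cudaMemcpy(h_dbg.data(), d_dbg, n * sizeof(DebugRecord),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));
    CUDA_CHECK(cudaFree(d_dbg));
    return h_dbg;
}
```

Calling `run_with_debug(n)` from `main` for a small n and printing the records of the suspect threads across several runs would show whether their intermediate values differ between a correct and an incorrect run.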
I have checked and rechecked array access and that everything is being initialised. The fact that the problem comes and goes and only occurs in short runs seems to me to indicate that the problem is more than a basic array overrun.
For the purposes of debugging this problem, shared memory is not used and global memory is only used at the start. “Local” memory is used.
Running what is basically the same code, with the same data structures, as plain C gives the correct result.