Hello,
I am not even sure what to call this
Initially, I had a few threads read global memory in a kernel function
if (i < some_value)
{
    shared[i] = global[y + i];
}
__syncthreads();
This section read garbage: afterwards the shared memory contained garbage, even though a mouse-over in the debugger perspective showed correct values in the global memory array and no race conditions were present. To aid debugging, I added this:
if (i < some_value)
{
    shared[i] = global[y + i];
}
__syncthreads();
if (i == 0)
{
    [read global]
}
__syncthreads();
Now, when stepping through the section, thread 0 would ‘forget that it is thread 0’ and simply jump over the added section.
I have seen this before with poor synchronization (threads ceasing to execute logic properly), so I added another section to test whether the program had already gone off the rails:
if (i < some_value)
{
    shared[i] = global[y + i];
}
__syncthreads();
if (i == 0)
{
    [read global]
}
__syncthreads();
if (i == 0)
{
    [few lines of code]
}
__syncthreads();
Amazingly, thread 0 would simply skip the 1st (i == 0) block and would subsequently execute the 2nd (i == 0) block.
The value of i does not change at any point, and a mouse-over shows the correct value in i.
At the same time, other threads of the same block seem to be in the right place.
How is that possible, and what can I test (for)?
I have placed the kernel code in a shared library, but the other kernel functions seemingly execute fine.
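One test I can think of that does not depend on single-stepping is to have every thread record which branches it actually entered, copy that record back, and compare it on the host. Below is a minimal sketch under my own assumptions: `probe_kernel`, `probe_branches`, `entered`, `copy_out`, the offset `y = 3` and the test data are hypothetical names and values for illustration, not my real kernel, and the launch uses a single block.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in this post: same control flow, but
// each thread records the branches it entered in a per-thread bit mask.
__global__ void probe_kernel(const int *global, int y, int some_value,
                             int *entered, int *copy_out)
{
    extern __shared__ int shared[];
    const int i = threadIdx.x;
    int flags = 0;

    if (i < some_value)
    {
        shared[i] = global[y + i];
        flags |= 1;               // bit 0: took the copy branch
    }
    __syncthreads();
    if (i == 0)
    {
        flags |= 2;               // bit 1: took the 1st (i == 0) branch
    }
    __syncthreads();
    if (i == 0)
    {
        flags |= 4;               // bit 2: took the 2nd (i == 0) branch
    }
    __syncthreads();

    entered[i] = flags;           // one record per thread
    if (i < some_value)
    {
        copy_out[i] = shared[i];  // export shared memory for a host-side compare
    }
}

// Hypothetical host-side check: returns true only if every CUDA call succeeded,
// every thread took exactly the expected branches, and shared memory held the
// expected values. Launches one block of `threads` threads.
bool probe_branches(int threads, int some_value)
{
    if (threads < 1 || threads > 1024 || some_value < 1 || some_value > threads)
        return false;

    const int y = 3;              // arbitrary offset, chosen for the sketch
    std::vector<int> h_global(y + some_value);
    for (int k = 0; k < y + some_value; ++k)
        h_global[k] = 1000 + k;   // recognizable test pattern
    std::vector<int> h_entered(threads, -1), h_copy(some_value, -1);

    int *d_global = nullptr, *d_entered = nullptr, *d_copy = nullptr;
    cudaError_t err = cudaMalloc(&d_global, h_global.size() * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&d_entered, threads * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&d_copy, some_value * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_global, h_global.data(),
                         h_global.size() * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
    {
        probe_kernel<<<1, threads, some_value * sizeof(int)>>>(
            d_global, y, some_value, d_entered, d_copy);
        err = cudaGetLastError();             // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_entered.data(), d_entered,
                         threads * sizeof(int), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_copy.data(), d_copy,
                         some_value * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_global);
    cudaFree(d_entered);
    cudaFree(d_copy);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }

    bool ok = true;
    for (int i = 0; i < threads; ++i)
    {
        const int expected = (i < some_value ? 1 : 0) | (i == 0 ? 6 : 0);
        if (h_entered[i] != expected)
        {
            fprintf(stderr, "thread %d: flags %d, expected %d\n",
                    i, h_entered[i], expected);
            ok = false;
        }
    }
    for (int i = 0; i < some_value; ++i)
    {
        if (h_copy[i] != h_global[y + i])
        {
            fprintf(stderr, "shared[%d] = %d, expected %d\n",
                    i, h_copy[i], h_global[y + i]);
            ok = false;
        }
    }
    return ok;
}
```

Calling `probe_branches(64, 32)` from host code should return true if the branches and the copy behave as written. If the same instrumentation in my real kernel also reports the expected flags, then thread 0 did execute both (i == 0) blocks, and what I saw while stepping would more likely be an artifact of how the debugger presents the threads than a logic error. That is my guess, not something this test proves. Checking the return value of every CUDA call, as the sketch does, should also show whether an earlier call had already failed.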