I have the same number of __syncthreads() calls on both the A and B paths.
The reason I use __syncthreads() is that the shared memory is reused by all threads in the block.
Do you have any idea why it fails on the "tensor GPU"?
Do you have any solution?
I am also running this code on a Pascal GPU on my PC (GTX 1080), and there it works fine.
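For context, a hypothetical reconstruction of the pattern described above (the names and the branch condition are illustrative, not the original code) would look roughly like this:

```cuda
__global__ void divergent_sync(float *data)
{
    __shared__ float buf[256];

    // Hypothetical condition that differs between threads of the same block
    if (threadIdx.x < 128) {          // A path: uses shared memory
        buf[threadIdx.x] = data[threadIdx.x];
        __syncthreads();              // one barrier on the A path
        data[threadIdx.x] = buf[threadIdx.x ^ 1];
    } else {                          // B path: no shared-memory use
        __syncthreads();              // matching barrier count on the B path
    }
    // Even with equal __syncthreads() counts per path, this is undefined
    // behavior: all threads of the block must reach the *same* barrier.
    // It may appear to work on some architectures (e.g. Pascal) and
    // hang or fail on others (e.g. Volta).
}
```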
Thanks
Oren
Thanks for the answer.
I understand from it that __syncthreads() is not allowed in conditional code when the condition does not evaluate the same for all threads in the block.
Do you have a solution for synchronizing only the threads that take the A path (inside the conditional code)?
The threads on the B path do not use the shared memory, so they could continue running.
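One commonly suggested workaround (a sketch under assumptions, since the original code is not shown; `condA`, `NBUF`, and the kernel name are made up) is to hoist the barrier out of the divergent region: every thread in the block reaches the same __syncthreads(), but only the A-path threads actually touch shared memory. The B-path threads pay the cost of waiting at the barrier, but the code is then well-defined on all architectures:

```cuda
#include <cuda_runtime.h>

#define NBUF 256

__global__ void sketch(const float *in, float *out, int n)
{
    __shared__ float buf[NBUF];
    int tid = threadIdx.x;
    bool condA = (tid < n);          // placeholder for the real A/B condition

    // Stage shared memory only on the A path...
    if (condA)
        buf[tid] = in[tid];

    // ...but synchronize unconditionally, outside the branch.
    __syncthreads();

    if (condA)
        out[tid] = buf[tid] * 2.0f;  // A path consumes shared memory
    else
        out[tid] = 0.0f;             // B path never touches buf

    __syncthreads();                 // again: all threads, outside the branch
}
```

If the A-path threads really must synchronize independently, CUDA Cooperative Groups offers `coalesced_threads().sync()`, but that only synchronizes the currently active threads within a single warp; as far as I know there is no supported block-wide barrier over an arbitrary divergent subset of threads, so restructuring as above is the usual answer.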