Say I have a block with 384 threads. I use 256 of them to do one task, then all of the threads to finish up.

int i = threadIdx.x;
__shared__ int a[5];
__shared__ int buffer[256];
// ... ...
if (i < 256)
{
    // (do something to buffer)
    __syncthreads();
    if (i < 5)
        a[i] = buffer[i*50];
}
__syncthreads();
// ... ...

After this code, it seems that threads 256 to 383 don't see the changes made to a: they read a[0] to a[4] as 0. The code is too long to post, so it would be helpful if anyone could give a wild guess at what could cause this. Thanks.
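__syncthreads() needs to be executed by all threads in the block. In your code the first __syncthreads() sits inside if (i < 256), so threads 256 to 383 never reach it. When the condition does not evaluate the same way for every thread in the block, the kernel may hang or produce unintended results. So move the barrier out of the conditional and do this instead: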
int i = threadIdx.x;
__shared__ int a[5];
__shared__ int buffer[256];
// ... ...
if (i < 256)
{
    // (do something to buffer)
}
__syncthreads();
if (i < 256) {
    if (i < 5)
        a[i] = buffer[i*50];
    // ...
}
__syncthreads();
// ... ...
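
If you want to check the pattern in isolation, here is a minimal sketch of a complete test. It is not your actual kernel: the names stagedKernel and runStagedKernel are made up for illustration, and "buffer[i] = i + 1" is only a stand-in for whatever your real code does to buffer. It assumes a single block of 384 threads, as in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: must be launched with one block of 384 threads.
__global__ void stagedKernel(int *out)
{
    int i = threadIdx.x;
    __shared__ int a[5];
    __shared__ int buffer[256];

    // Stage 1: only the first 256 threads write to buffer.
    // (Stand-in for "do something to buffer".)
    if (i < 256)
        buffer[i] = i + 1;

    // Barrier is outside the branch, so all 384 threads reach it.
    __syncthreads();

    // Stage 2: five threads copy selected entries into a.
    if (i < 5)
        a[i] = buffer[i * 50];

    // Second barrier, again reached by every thread.
    __syncthreads();

    // Stage 3: every thread, including 256 to 383, reads a.
    out[i] = a[0] + a[1] + a[2] + a[3] + a[4];
}

// Hypothetical host helper: returns the number of threads that read a
// wrong sum, or -1 if a CUDA call failed. 0 means all threads agree.
int runStagedKernel()
{
    const int kThreads = 384;
    // buffer[0] + buffer[50] + buffer[100] + buffer[150] + buffer[200]
    const int kExpected = 1 + 51 + 101 + 151 + 201;

    int hOut[kThreads];
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, kThreads * sizeof(int)) != cudaSuccess)
        return -1;

    stagedKernel<<<1, kThreads>>>(dOut);

    // cudaMemcpy waits for the kernel and reports its errors too.
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int t = 0; t < kThreads; ++t)
        if (hOut[t] != kExpected)
            ++mismatches;
    return mismatches;
}
```

Call runStagedKernel() from main and check that it returns 0. Moving the first __syncthreads() back inside the if (i < 256) branch gives you the layout from the question, so you can compare the two on your own hardware.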