Why does using shared memory give different output?
I'm on CUDA 2.3 and 3.0.
Thank you in advance.
__syncthreads();
sdata[threadIdx.x] = v[i*blockDim.x+threadIdx.x];
__syncthreads();
so simple…
Yeah, I’ve forgotten an extra __syncthreads() in a for loop before. Definitely drives you crazy. :)
Oh yes, the whole week…
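For anyone who finds this thread later, here is a minimal sketch of the two-barrier pattern discussed above, placed inside a loop over tiles. The kernel name `mirrorSum`, the block size `TILE`, and the test data are my own assumptions for illustration, not the original poster's code. Each thread reads the element written by a different thread, so dropping either `__syncthreads()` can give wrong results.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 128  // hypothetical block size, chosen only for this sketch

// Each thread accumulates the value loaded by its "mirror" thread,
// so every read of sdata depends on another thread's write.
__global__ void mirrorSum(const float *v, float *out, int numTiles)
{
    __shared__ float sdata[TILE];
    float acc = 0.0f;

    for (int i = 0; i < numTiles; ++i) {
        // Barrier 1: all threads must finish reading the previous tile
        // before anyone overwrites it.
        __syncthreads();
        sdata[threadIdx.x] = v[i * blockDim.x + threadIdx.x];
        // Barrier 2: the whole tile must be written before anyone reads it.
        __syncthreads();
        acc += sdata[blockDim.x - 1 - threadIdx.x];
    }
    out[threadIdx.x] = acc;
}

int main()
{
    const int numTiles = 4;
    const int n = numTiles * TILE;
    float h_v[n];
    float h_out[TILE];

    // Element j of every tile holds the value j.
    for (int k = 0; k < n; ++k) h_v[k] = (float)(k % TILE);

    float *d_v = 0, *d_out = 0;
    cudaMalloc((void **)&d_v, n * sizeof(float));
    cudaMalloc((void **)&d_out, TILE * sizeof(float));
    cudaMemcpy(d_v, h_v, n * sizeof(float), cudaMemcpyHostToDevice);

    mirrorSum<<<1, TILE>>>(d_v, d_out, numTiles);
    cudaMemcpy(h_out, d_out, TILE * sizeof(float), cudaMemcpyDeviceToHost);

    // CPU reference: thread t sums the mirror value (TILE-1-t) once per tile.
    int mismatches = 0;
    for (int t = 0; t < TILE; ++t) {
        float expected = numTiles * (float)(TILE - 1 - t);
        if (h_out[t] != expected) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_v);
    cudaFree(d_out);
    return mismatches != 0;
}
```

The program compares the GPU result against a CPU reference and reports the number of mismatches. The first barrier only matters from the second iteration on, which is why a missing one inside a loop is so easy to overlook.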