Hi,
Maybe I’m doing something wrong, but this is very strange. Please have a look at this kernel (SimplexState is an enum with about 8 values):
```cuda
__global__ void
doRCEStepsD(float *simplices,
            float *pointA,
            float *pointB,
            unsigned int *_max,
            SimplexState *simplexState,
            unsigned int *activeSimplices,
            unsigned int *pNumActiveSimplices,
            unsigned int constantsCount,
            unsigned int log2ThreadsPerSimplex,
            unsigned int log2SimplicesPerBlock)
{
    unsigned int locIndex = threadIdx.x;
    unsigned int simplexSize = IMUL((constantsCount + 1u), constantsCount);

    // load number of simplices that have to be considered by this kernel

    /* this works!
    __shared__ unsigned int numActiveSimplices;
    if (locIndex == 0) {
        numActiveSimplices = (*pNumActiveSimplices);
    }
    __syncthreads();
    */

    // this doesn't work!
    unsigned int numActiveSimplices;
    numActiveSimplices = (*pNumActiveSimplices);
    __syncthreads();

    // ...
}
```
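To check whether the load itself is the problem, the two variants can be isolated in a minimal stand-alone program. This is only a sketch assuming a current CUDA toolkit; the names loadViaShared, loadViaRegister and countThreadsSeeing are hypothetical and not part of my code. Both load patterns are valid CUDA (many threads reading the same global address is allowed), so if both variants pass on the same GPU, that would suggest the failure comes from the part elided as `// ...` rather than from the load.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Variant A (the "works" version): thread 0 copies the count into shared
// memory, and __syncthreads() makes it visible to the rest of the block.
__global__ void loadViaShared(const unsigned int *pNum, unsigned int *out)
{
    __shared__ unsigned int num;
    if (threadIdx.x == 0) {
        num = *pNum;
    }
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = num;
}

// Variant B (the "fails" version): every thread reads the same global word
// into a register. Concurrent reads of one address are legal, and no
// __syncthreads() is needed because nothing is written to it.
__global__ void loadViaRegister(const unsigned int *pNum, unsigned int *out)
{
    unsigned int num = *pNum;
    out[blockIdx.x * blockDim.x + threadIdx.x] = num;
}

// Hypothetical helper: copies `value` into a one-element device buffer (like
// pNumActiveSimplices), runs one variant, and returns how many threads wrote
// back `value`, or -1 if any CUDA call reports an error.
int countThreadsSeeing(bool useShared, unsigned int value, int blocks, int threads)
{
    const int n = blocks * threads;
    unsigned int *dNum = 0;
    unsigned int *dOut = 0;
    int result = -1;

    if (cudaMalloc((void **)&dNum, sizeof(unsigned int)) == cudaSuccess &&
        cudaMalloc((void **)&dOut, n * sizeof(unsigned int)) == cudaSuccess &&
        cudaMemcpy(dNum, &value, sizeof(unsigned int), cudaMemcpyHostToDevice) == cudaSuccess) {
        if (useShared) {
            loadViaShared<<<blocks, threads>>>(dNum, dOut);
        } else {
            loadViaRegister<<<blocks, threads>>>(dNum, dOut);
        }
        std::vector<unsigned int> hOut(n);
        // cudaGetLastError catches launch errors; the blocking cudaMemcpy waits
        // for the kernel and reports errors raised while it was running.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(hOut.data(), dOut, n * sizeof(unsigned int),
                       cudaMemcpyDeviceToHost) == cudaSuccess) {
            result = 0;
            for (int i = 0; i < n; ++i) {
                if (hOut[i] == value) {
                    ++result;
                }
            }
        }
    }
    cudaFree(dNum);  // cudaFree on a null pointer is a no-op
    cudaFree(dOut);
    return result;
}
```

For example, countThreadsSeeing(false, 42u, 4, 64) should return 256 (4 blocks of 64 threads, each reading 42) on a working device.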
“pNumActiveSimplices” is allocated with cudaMalloc and holds a single unsigned int.
If only one thread loads it into a shared variable, everything is OK, but if every thread of every block reads it, I get this error:
“cutilCheckMsg cudaThreadSynchronize error, line 1260 : unknown error”.
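Kernel launches are asynchronous, so an error raised while a kernel runs is only reported by the next synchronizing call (here cudaThreadSynchronize inside cutilCheckMsg, as the message shows; cudaDeviceSynchronize in newer toolkits), not necessarily at the line that caused it. A sketch of a check that separates launch errors from execution errors and prints the runtime’s own description; checkKernel and noop are hypothetical names, and a current toolkit is assumed:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: call right after a kernel launch. Separates
// launch-time errors from execution-time errors and prints each one.
cudaError_t checkKernel(const char *label)
{
    // Launch configuration problems (invalid grid/block size, too much
    // shared memory, ...) are reported immediately.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: launch failed: %s\n", label, cudaGetErrorString(err));
        return err;
    }
    // Faults that happen while the kernel runs (e.g. invalid memory
    // accesses) only surface once the host waits for the device.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: execution failed: %s\n", label, cudaGetErrorString(err));
    }
    return err;
}

// Trivial kernel used only to exercise the helper.
__global__ void noop() {}
```

Usage: launch the kernel, e.g. `noop<<<1, 32>>>();`, then call `checkKernel("noop")`; it returns cudaSuccess when both the launch and the execution succeeded.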
Does anyone have a clue why the first version works but the second doesn’t?
I’m using CUDA 2.3 (Windows Vista, GTX 280).
EDIT: Sorry, I originally wrote 2.2; I forgot that version 2.3 had already been installed.