I am running a CUDA program built with MSVS 2008.
The program runs, but the result is obviously wrong.
I then rebuilt it in EmuDebug mode and ran it again.
In that mode the program appears to deadlock.
I used 255 threads, and when I set a breakpoint and stepped through, I found that all threads run sequentially and load their data successfully.
Each finished thread waits at __syncthreads(). However, when the last thread reaches that statement, the program hangs.
I am not sure what is happening and don't know what else to try.
What I am trying to do in the following code fragment is copy data from global memory to shared memory.
I also found that data inside the vertices array gets changed, for example vertices[8], but in different runs it is changed by a different thread.
Can anyone with experience help me figure this out?
Thank you very much in advance.
int threadId = threadIdx.x;
// load vertices_size from GPU global memory into a register
int vcnt = (*devVertices_size);
// shared memory declaration
extern __shared__ int array[];
int* vertices = (int*)&array;
int* adjIndex = (int*)&vertices[48];
int* adjcent = (int*)&adjIndex[96];
int* triangleTable = (int*)&adjcent[255];
int* cmNbrs = (int*)&triangleTable[255];
int* sortedCS = (int*)&cmNbrs[48];
int* unsortedCS = (int*)&sortedCS[48];
int* eta = (int*)&unsortedCS[48];
int* zeta = (int*)&eta[255];
// Load data from GPU global memory to on-chip shared memory
if (threadId < vcnt) {
    vertices[threadId] = devVerticesArray[threadId];
    adjIndex[2*threadId] = devAdjIndexArrayInOneGrid[2*threadId];
    adjIndex[2*threadId+1] = devAdjIndexArrayInOneGrid[2*threadId+1];
}
adjcent[threadId] = devAdjcentArrayInOneGrid[threadId];
triangleTable[threadId] = devLocal_TT[threadId];
__syncthreads();
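
The pointer arithmetic in the fragment above fixes how the single `extern __shared__` buffer is divided, so it may help to spell that layout out. The following is only a host-side sketch under stated assumptions: the names `Region`, `kRegions`, `regionOffset`, `sharedBytesNeeded` and `guardedLoadsInBounds` are hypothetical helpers, not part of the original program, and the size of `zeta` is not shown in the fragment, so it is left as a parameter.

```cpp
#include <cassert>
#include <cstddef>

// Element counts (in ints) read off the pointer offsets in the fragment.
// zeta is omitted because its size does not appear there.
struct Region {
    const char* name;
    std::size_t count;
};

static const Region kRegions[] = {
    {"vertices", 48},       {"adjIndex", 96}, {"adjcent", 255},
    {"triangleTable", 255}, {"cmNbrs", 48},   {"sortedCS", 48},
    {"unsortedCS", 48},     {"eta", 255},
};
static const std::size_t kRegionCount = sizeof(kRegions) / sizeof(kRegions[0]);

// Offset, in ints, at which region `index` starts inside the shared buffer.
// index == kRegionCount gives the start of zeta.
std::size_t regionOffset(std::size_t index) {
    assert(index <= kRegionCount);
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i) {
        offset += kRegions[i].count;
    }
    return offset;
}

// Bytes of dynamic shared memory the layout needs, given an assumed
// number of ints for zeta.
std::size_t sharedBytesNeeded(std::size_t zetaCount) {
    return (regionOffset(kRegionCount) + zetaCount) * sizeof(int);
}

// True if the guarded loads (vertices[t], adjIndex[2*t], adjIndex[2*t+1]
// for t < vcnt) stay inside their own regions.
bool guardedLoadsInBounds(std::size_t vcnt) {
    return vcnt <= kRegions[0].count && 2 * vcnt <= kRegions[1].count;
}
```

With these counts, `zeta` starts 1053 ints into the buffer, so the byte count passed as the third `<<<...>>>` launch argument would need to be at least `sharedBytesNeeded(zetaCount)`. The second helper shows that the guarded loads only stay inside `vertices` and `adjIndex` when `vcnt` is at most 48. Whether either condition is actually violated in this program cannot be told from the fragment alone.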