Sometimes I get the error “error = access violation on load (share memory)”, but I don't use any shared memory.
However, when I reduce the input data size, the error does not appear. Why? Is it possible that a data size that is too large causes this kind of error?
Here is my kernel function:
__global__ void DefferentialSysMatrixKernel(int* k, int** address, float** length, int* kDiffer, int** addressDiffer, float** lengthDiffer)
{
    int x = threadIdx.x + blockDim.x*blockIdx.x;   // detector index
    int y = threadIdx.y + blockDim.y*blockIdx.y;   // frame index
    if ((x < detecterN - 1) && (y < frameN))
    {
        int rayIndex = y*detecterN + x;            // ray x of frame y
        int index = y*(detecterN - 1) + x;         // difference of rays x and x+1
        int *addr1 = address[rayIndex];
        int *addr2 = address[rayIndex + 1];
        int *addr3 = addressDiffer[index];
        float *leng1 = length[rayIndex];
        float *leng2 = length[rayIndex + 1];
        float *leng3 = lengthDiffer[index];
        int num1 = k[rayIndex];
        int num2 = k[rayIndex + 1];
        int t1 = 0;
        int t2 = 0;
        int t3 = 0;
        // Merge the two sparse rows (indices assumed sorted ascending) into
        // row2 - row1; the output row receives up to num1 + num2 entries.
        while ((t1 < num1) && (t2 < num2))
        {
            if (addr1[t1] == addr2[t2])
            {
                addr3[t3] = addr1[t1];
                leng3[t3] = leng2[t2] - leng1[t1];
                t1++;
                t2++;
                t3++;
            }
            else if (addr1[t1] < addr2[t2])
            {
                addr3[t3] = addr1[t1];
                leng3[t3] = -leng1[t1];
                t1++;
                t3++;
            }
            else
            {
                addr3[t3] = addr2[t2];
                leng3[t3] = leng2[t2];
                t2++;
                t3++;
            }
        }
        // Copy whatever remains of either row.
        while (t1 < num1)
        {
            addr3[t3] = addr1[t1];
            leng3[t3] = -leng1[t1];
            t1++;
            t3++;
        }
        while (t2 < num2)
        {
            addr3[t3] = addr2[t2];
            leng3[t3] = leng2[t2];
            t2++;
            t3++;
        }
        kDiffer[index] = t3;                       // number of entries written
    }
}
It is a little complicated: I use it to compute the differences between adjacent sparse vectors in a set. detecterN and frameN are macros defined according to the input data size.
I have checked it many times to make sure that no array subscript goes out of bounds.
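For comparison, a minimal host-side C++ sketch of the merge the kernel performs for one pair of rays (the result is row2 minus row1). It assumes, as the kernel's merge logic does, that the indices in each row are sorted ascending; SparseRow and sparseDifference are hypothetical names, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: one sparse row as parallel arrays, mirroring address[i] / length[i].
struct SparseRow {
    std::vector<int> addr;    // column indices, assumed sorted ascending
    std::vector<float> leng;  // values, same size as addr
};

// Host reference of the kernel's merge: returns r2 - r1.
// The result never has more than r1.addr.size() + r2.addr.size() entries.
SparseRow sparseDifference(const SparseRow& r1, const SparseRow& r2) {
    SparseRow out;
    out.addr.reserve(r1.addr.size() + r2.addr.size());
    out.leng.reserve(r1.addr.size() + r2.addr.size());
    std::size_t t1 = 0, t2 = 0;
    while (t1 < r1.addr.size() && t2 < r2.addr.size()) {
        if (r1.addr[t1] == r2.addr[t2]) {         // index present in both rows
            out.addr.push_back(r1.addr[t1]);
            out.leng.push_back(r2.leng[t2] - r1.leng[t1]);
            ++t1;
            ++t2;
        } else if (r1.addr[t1] < r2.addr[t2]) {   // only in row 1
            out.addr.push_back(r1.addr[t1]);
            out.leng.push_back(-r1.leng[t1]);
            ++t1;
        } else {                                  // only in row 2
            out.addr.push_back(r2.addr[t2]);
            out.leng.push_back(r2.leng[t2]);
            ++t2;
        }
    }
    for (; t1 < r1.addr.size(); ++t1) {           // tail of row 1
        out.addr.push_back(r1.addr[t1]);
        out.leng.push_back(-r1.leng[t1]);
    }
    for (; t2 < r2.addr.size(); ++t2) {           // tail of row 2
        out.addr.push_back(r2.addr[t2]);
        out.leng.push_back(r2.leng[t2]);
    }
    return out;
}
```

Comparing this against kDiffer / addressDiffer / lengthDiffer copied back from the GPU for a small input would show whether the results are wrong even when nothing crashes.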
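One way to double-check the output side: for each pair, the merge writes at most k[rayIndex] + k[rayIndex + 1] entries into addressDiffer[index] and lengthDiffer[index], so every difference row needs at least that capacity. A minimal host-side sketch, assuming the allocated capacity of each difference row is available as a hypothetical diffCapacity array (the allocation code is not shown here); firstUndersizedRow is also a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Returns the first difference-row index whose capacity is smaller than the
// worst case the kernel can write (k[rayIndex] + k[rayIndex + 1]), or -1 if
// every row is large enough. Index math uses size_t to avoid int overflow.
long long firstUndersizedRow(const std::vector<int>& k,
                             const std::vector<std::size_t>& diffCapacity,
                             int detecterN, int frameN) {
    for (int y = 0; y < frameN; ++y) {
        for (int x = 0; x < detecterN - 1; ++x) {
            std::size_t rayIndex = static_cast<std::size_t>(y) * detecterN + x;
            std::size_t index = static_cast<std::size_t>(y) * (detecterN - 1) + x;
            std::size_t needed =
                static_cast<std::size_t>(k[rayIndex]) + k[rayIndex + 1];
            if (diffCapacity[index] < needed)
                return static_cast<long long>(index);
        }
    }
    return -1;
}
```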
Thanks for your advice. I tried cuda-memcheck, but it ran to completion and reported no errors. Here is the output:
E:\Learning\Bei Hang\lab\statistical iteration reconstruction\code\PCMAP_cuda_modify_2\x64\Debug>cuda-memcheck PCMAP.exe
========= CUDA-MEMCHECK
Read file successfully
System Matrix Loaded successfully!
Weight Loaded successfully!
cpu load time: 0.843s
total iteration time: 117.613min
These messages are printed by my own code. So does this mean there is no error?
However, I still get the error message when I use CUDA debugging in VS2015. Why?
Also, as you can see, it takes almost 2 hours under cuda-memcheck, but when I run the code in VS without CUDA debugging, it finishes in just twenty seconds. What causes this?
Then the arguments int** d_addrLocate and int** d_lengLocate are passed to the kernel parameters int** address and int** length.
I suspect that when the data size (that is, numsum) is too big, the memory that the pointers int* d_addr and int* d_leng point to is not contiguous, which might cause the double pointers to point to the wrong places.
However, cuda-memcheck does not report any error. Do you know why?
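A minimal host-side sketch of building such a pointer table from one buffer, assuming (the host code isn't shown) that d_addrLocate[i] is meant to be d_addr plus the sum of k[0..i-1]. rowOffsets, pointerTable and exceedsInt are hypothetical helpers, and a std::vector stands in for the device buffer. A single cudaMalloc allocation is one contiguous range of device addresses, so the pointers stay valid as long as the offsets are right; the offsets here use std::size_t because a 32-bit int offset wraps once numsum exceeds INT_MAX.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Exclusive prefix sum of the per-ray counts k[] in size_t, so the total
// (numsum) cannot wrap around the way a 32-bit int can.
std::vector<std::size_t> rowOffsets(const std::vector<int>& k) {
    std::vector<std::size_t> offsets(k.size() + 1, 0);
    for (std::size_t i = 0; i < k.size(); ++i)
        offsets[i + 1] = offsets[i] + static_cast<std::size_t>(k[i]);
    return offsets;
}

// Builds the pointer table (the role of d_addrLocate) from one contiguous
// buffer. `base` stands in for d_addr; with a real device buffer the same
// arithmetic is done on the host and the table is copied with cudaMemcpy.
template <typename T>
std::vector<T*> pointerTable(T* base, const std::vector<std::size_t>& offsets) {
    std::vector<T*> table(offsets.size() - 1);
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i)
        table[i] = base + offsets[i];
    return table;
}

// True if numsum no longer fits in an int, i.e. int-based offset math overflows.
bool exceedsInt(std::size_t numsum) {
    return numsum > static_cast<std::size_t>(INT_MAX);
}
```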