Can anyone help check this code? I always get an error if I uncomment the two lines that are currently commented out. The code is very long, so I am posting only the part that has the problem.
double sum_w = 0, r, g, b, w;
double *src, *tag, *Ir, *Ig, *Ib; // these are pointers to memory blocks in device memory
int sRy, sRy_c;
…
for(int x=0; x<wx; x++)
{
    double *_s, *_t;
    _s = src;
    _t = tag;
    for(int y=0; y<wy; y++)
    {
        double val = 0;
        for(int i = 0; i < nChannel; i++)
        {
            val += fabs(*_s++ - *_t++); // fabs, since the operands are doubles
        }
        r = Ir[y] - mean_Ir;
        g = Ig[y] - mean_Ig;
        b = Ib[y] - mean_Ib;
        w = ar * r + ag * g + ab * b + 1;
        // sum_w += w;
        // dist += w * val;
    }
    src += sRy_c;
    tag += sRy_c;
    Ir += sRy;
    Ig += sRy;
    Ib += sRy;
}
BTW, I’ve checked that all the parameters passed into the kernel are correct.
Hope to have some comments and suggestions. Thanks!
Thanks for the hint… I never properly checked the error message until you mentioned it. The error I got is this: cudaErrorLaunchOutOfResources…
I already knew this part of the code had a problem, because its result differs from that of my C++ code.
Now I’m looking into the possible reasons for this error :).
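For anyone hitting the same thing, here is a minimal sketch of how a launch error such as cudaErrorLaunchOutOfResources can be caught right at the launch site. The kernel `exampleKernel` and the helper `launchAndCheck` are hypothetical stand-ins, since the real kernel is not shown in full in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel, which the thread does not show in full.
__global__ void exampleKernel(double *out)
{
    out[threadIdx.x] = 1.0;
}

// Launches one block of `threadsPerBlock` threads and reports any CUDA error.
// `d_out` must point to at least `threadsPerBlock` doubles in device memory.
// Returns true only if both the launch and the execution succeeded.
bool launchAndCheck(double *d_out, int threadsPerBlock)
{
    exampleKernel<<<1, threadsPerBlock>>>(d_out);

    // Launch-time failures (for example cudaErrorLaunchOutOfResources) are reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    // Errors raised while the kernel is running surface once the device is synchronized.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel execution failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Without a check like this, a kernel that fails to launch simply leaves the output buffer untouched, which can look like a wrong result rather than an error.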
I solved this problem by reducing the block size from 32 to 16, but now I’m having new problems :(. I got a CUDA error that says "all CUDA-capable devices are busy or unavailable". Do you know whether there is any way to reset the CUDA device?
Is this Windows or Linux? On Linux, you might try unloading the nvidia module (rmmod nvidia), but on Windows I have no idea. Rebooting the computer is probably the only guaranteed way to fix a stuck driver.
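On the programmatic side, the runtime API does have cudaDeviceReset(), sketched below. Note the assumption here: it only tears down the calling process's own state on the current device, so it may not help when the driver itself is stuck or another process is holding the GPU. The helper name `resetCurrentDevice` is made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Destroys this process's allocations and context on the current device.
// Returns true on success; prints the CUDA error string otherwise.
bool resetCurrentDevice()
{
    cudaError_t err = cudaDeviceReset();
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Calling this at the end of a program (or after a failed run) at least makes sure the process does not leave its own context behind.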
Also, how many registers is your kernel using? (Pass the --ptxas-options=-v argument to nvcc to see the count.) If you can only launch 16 threads per block, the GPU will be idle most of the time.
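Besides the compiler flag, the register count can also be read at run time with cudaFuncGetAttributes. The sketch below uses a hypothetical `exampleKernel` and a made-up helper `queryKernelLimits`; substitute the real kernel. One plausible explanation for the original symptom (an assumption, not confirmed in this thread) is that with the two accumulation lines commented out the compiler can discard most of the loop as dead code, so the kernel only needs its full register budget once they are uncommented.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel, which the thread does not show in full.
__global__ void exampleKernel(double *out)
{
    out[threadIdx.x] = 1.0;
}

// Fills `attr` with the compiled kernel's resource usage and prints the
// values most relevant to choosing a block size. Returns false on failure.
bool queryKernelLimits(cudaFuncAttributes *attr)
{
    cudaError_t err = cudaFuncGetAttributes(attr, exampleKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    printf("registers per thread:         %d\n", attr->numRegs);
    printf("max threads per block:        %d\n", attr->maxThreadsPerBlock);
    printf("local memory per thread (B):  %zu\n", attr->localSizeBytes);
    return true;
}
```

If the block size requested at launch exceeds the reported `maxThreadsPerBlock` for the kernel, the launch is likely to fail with a resource error, which would fit the fix of shrinking the block.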
I tried halving the resolution of the images I’m processing with this code, and it works fine. I think the cause could be that I’m passing in too many variables. I’m going to debug the CUDA code with small images first.