Random, occasional "unknown error" after kernel

So here’s my setup:

Ubuntu 9.04

Linux64 260.19.14 driver

CUDA 2.3 (yes, that old version; I'm building this so many users can run it without upgrading anything.)

Here's the code (under the "redc" branch), with one exception. I was getting errors that acted as if the computation wasn't complete when get_factors_found in appcu.cu returned, but only when USE_BOINC is #defined. It turns out that calling cudaGetLastError() a second time, immediately after a first call, returns cudaSuccess, because the first call clears the error state; so my old code was eating an error (more of them than it was supposed to). A short demonstration of that behavior follows the listing below. It also seemed like the memcpy might be premature, so I removed the event, used cudaStreamSynchronize(stream) instead, and moved the memcpy to the end. Now I have this:

void get_factors_found(unsigned char *factor_found, const unsigned int cthread_count, const uint64_t start_t, int *check_ns_delay) {
  // Get d_factor_found, into the thread'th factor_found array.
#ifdef USE_BOINC
  cudaError_t err;
  int count = 0;
#endif
  if(!blocking_sync_ok) {
    // Manually sleep-wait for the result.
    if(*check_ns_delay <= max_ns_delay) {
      cudaSleepMemcpyFromTime(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost, check_ns_delay, check_ns_overlap, start_t);
    } else {
      // Pass in zero seconds to wait, and ignore the result passed out.
      int i=0;
      cudaSleepMemcpyFromTime(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost, &i, check_ns_overlap, start_t);
    }
  } else {
    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);
  }
  //cudaStreamSynchronize(stream);
#ifdef USE_BOINC
  err = cudaGetLastError();
  while(err != cudaSuccess) {
    fprintf(stderr, "Warning: A kernel failed with error %s.  Retry %d.\n", cudaGetErrorString(err), count+1);
    // Retry the memcpy first.
    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);
    err = cudaGetLastError();
    if(err == cudaSuccess) break;
    fprintf(stderr, "Warning: A kernel still failed with error %s.  Retry %d.\n", cudaGetErrorString(err), count+1);
    // Retry the computation.
    check_ns(NULL, cthread_count, 0);
    count++;
    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);
    // If this is the last try, don't check the result here; that seems to eat it!
    if(count == 10) break;
    err = cudaGetLastError();
  }
#endif
  checkCUDAErr("getting factors found");
}
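To show what I mean about the second cudaGetLastError() call: here is a minimal standalone sketch (not from the project; the kernel is a throwaway) demonstrating that the first call returns the sticky error and clears it, so an immediate second call reports cudaSuccess:

#include <cstdio>
#include <cuda_runtime.h>

// Throwaway kernel used only to produce a launch error.
__global__ void noop_kernel() { }

int main() {
  // Force a launch failure with an impossible block size.
  noop_kernel<<<1, 1000000>>>();
  cudaError_t first  = cudaGetLastError();  // reports the launch error...
  cudaError_t second = cudaGetLastError();  // ...which is now cleared: cudaSuccess
  printf("first:  %s\n", cudaGetErrorString(first));
  printf("second: %s\n", cudaGetErrorString(second));
  return 0;
}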

My test case is “./tpsieve-cuda-boinc-x86_64-linux -p242070e9 -P242072e9 -k 1201 -K 9999 -N 3000000 -c 60 -m 64 -M 2”.

What's strange is that, very occasionally (once every few seconds to a minute on my GTX 460), cudaMemcpy reports an "unknown error". I tried sticking a cudaThreadSynchronize() at the beginning of this function, and it also reported an "unknown error". You can see the code I added above to check again for that "unknown error". About 95% of the time, if I do a second cudaMemcpy, the "unknown error" doesn't recur. The other 5% of the time I re-run the kernels in check_ns(), and after that the "unknown error" doesn't recur, though I suspect a third cudaMemcpy alone would do the job.

I have checkCUDAErr calls (basically similar to that "safe" macro nobody's supposed to use but everyone does) after every CUDA call and kernel launch, and none of them report an error. I launch about 3,000 iterations of one kernel, if that's relevant.
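In case it helps, the checker is roughly of this shape (a sketch; the project's actual checkCUDAErr may differ in details):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Report and bail out if the last CUDA call or kernel launch failed.
void checkCUDAErr(const char *msg) {
  cudaError_t err = cudaGetLastError();
  if(err != cudaSuccess) {
    fprintf(stderr, "CUDA error while %s: %s\n", msg, cudaGetErrorString(err));
    exit(EXIT_FAILURE);
  }
}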

So, anyone know why these errors are popping up? It seems like the results I get back after waiting out the errors are correct. Is that likely true in general?

Errors don’t pop up when the code runs inside a pthread. I really have no idea why that could be.

One other thing that might be of interest: I set up “blocking sync” at the start of the program, so it doesn’t use much CPU. Here’s how I do it:

bool SetCUDABlockingSync(int device) {
  cudaError_t status = cudaGetLastError();
  if(status != cudaSuccess) return false;

  status = cudaSetDevice(device);
  if(status != cudaSuccess) return false;

  status = cudaSetDeviceFlags(cudaDeviceBlockingSync);
  if(status != cudaSuccess) return false;

  return true;
}

I hope that isn’t causing the problem.
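For completeness, it's called once right at program startup, before any other runtime work creates the context (cudaSetDeviceFlags fails once the device is active). The call site looks roughly like this; the device index and message are illustrative, not the app's exact code:

// Illustrative call site; device 0 assumed.
blocking_sync_ok = SetCUDABlockingSync(0);
if(!blocking_sync_ok)
  fprintf(stderr, "Blocking sync unavailable; using the sleep-wait path instead.\n");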

Hello.

I have the same problem when using the BOINC libraries. As I see it, since you use

status = cudaSetDeviceFlags(cudaDeviceBlockingSync);

you can call cuCtxSynchronize() after the kernel. I hope this solves your problem.

Thanks, but cuCtxSynchronize() is a driver API call. I'm trying to stick with C for CUDA (the runtime API) only.

Do you know whether cudaThreadSynchronize() is the equivalent C for CUDA function? I tried that, and it returned the same kind of "unknown error". Did you check whether cuCtxSynchronize() returned an error? I actually did try ignoring the "unknown error"; that gave me validation errors later on, as if the memcpy hadn't copied anything, or had copied an incomplete result.

Edit: By the way, this only seems to happen in Linux 260.19.* drivers; not Windows 260.99.* drivers.
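For reference, what I tried looks roughly like this (a sketch, assuming cudaThreadSynchronize() really is the runtime counterpart of cuCtxSynchronize(); the kernel and wrapper names are placeholders, not the app's real ones):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void some_kernel(unsigned char *out) { /* stand-in for the real kernels */ }

void launch_and_sync(unsigned char *d_out, dim3 grid, dim3 block) {
  some_kernel<<<grid, block>>>(d_out);
  // Blocks until all preceding device work finishes and returns any error
  // raised by the launched kernels; this is where the "unknown error" shows up.
  cudaError_t err = cudaThreadSynchronize();
  if(err != cudaSuccess)
    fprintf(stderr, "Sync after kernel failed: %s\n", cudaGetErrorString(err));
}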

You're right, my fault. cuCtxSynchronize returned an error (I hadn't checked it properly, and I saw several validation errors).

On the other hand, these errors don't happen on the Linux 195.36.15 drivers, nor on Linux 260.19.* with cuCtxCreate( &hcuContext, 0x01, hcuDevice ), i.e. CU_CTX_SCHED_AUTO, but they do happen with cuCtxCreate( &hcuContext, 0x04, hcuDevice ), i.e. CU_CTX_BLOCKING_SYNC.

A bug? I don't know.
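For clarity, the two context setups I compared look roughly like this (a sketch using the same flag values as above; error checks omitted):

#include <cuda.h>

int create_context(int use_blocking_sync) {
  CUdevice  hcuDevice;
  CUcontext hcuContext;
  cuInit(0);
  cuDeviceGet(&hcuDevice, 0);
  // 0x01 is the scheduling flag that works for me on 260.19.*;
  // 0x04 (CU_CTX_BLOCKING_SYNC) is the one that triggers the errors.
  unsigned int flags = use_blocking_sync ? 0x04 : 0x01;
  return cuCtxCreate(&hcuContext, flags, hcuDevice) == CUDA_SUCCESS;
}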

I have the same problem. I'm using a GTX 460 too, with CUDA 4.0. Did you ever find out what the problem was? For me it only happens when copying "larger" amounts of data from device to host. I doubt it's a driver issue. :S

Hey, I also have a GTX 460 and am seeing a similar error. My cudaGetLastError() checks throughout the program return different results depending on when I run it…