Random, occasional "unknown error" after kernel

Ken_g6 · December 4, 2010, 9:23pm

So here’s my setup:

Ubuntu 9.04

Linux64 260.19.14 driver

Cuda 2.3 (Yes, that old version. I’m building this to be easily used by many users without upgrading.)

Here’s the code (under the “redc” branch"), with one exception. I was getting errors acting like the computation wasn’t complete when get_factors_found in appcu.cu returned, and when USE_BOINC is #defined. It turns out running a second cudaGetLastError() immediately after a first returns nothing; so my old code was eating an error (more than it was supposed to). It also seemed like the memcpy might be premature, so I removed the event, used a cudaStreamSync(stream) instead, and moved the memcpy to the end. Now I have this:

void get_factors_found(unsigned char *factor_found, const unsigned int cthread_count, const uint64_t start_t, int *check_ns_delay) {

  // Get d_factor_found, into the thread'th factor_found array.

#ifdef USE_BOINC

  cudaError_t err;

  int count = 0;

#endif

if(!blocking_sync_ok) {

    // Manually sleep-wait for the result.

    if(*check_ns_delay <= max_ns_delay) {

      cudaSleepMemcpyFromTime(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost, check_ns_delay, check_ns_overlap, start_t);

    } else {

      // Pass in zero seconds to wait, and ignore the result passed out.

      int i=0;

      cudaSleepMemcpyFromTime(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost, &i, check_ns_overlap, start_t);

    }

  } else {

    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);

  }

  //cudaStreamSynchronize(stream);

#ifdef USE_BOINC

  err = cudaGetLastError();

  while(err != cudaSuccess) {

    fprintf(stderr, "Warning: A kernel failed with error %s.  Retry %d.\n", cudaGetErrorString(err), count+1);

    // Retry the Memcpy first.

    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);

    err = cudaGetLastError();

    if(err == cudaSuccess) break;

    fprintf(stderr, "Warning: A kernel still failed with error %s.  Retry %d.\n", cudaGetErrorString(err), count+1);

    // Retry the computation.

    check_ns(NULL, cthread_count, 0);

    count++;

    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);

    // If this is the last try, don't check the result here; that seems to eat it!

    if(count == 10) break;

    err = cudaGetLastError();

  }

#endif

  checkCUDAErr("getting factors found");

}

My test case is “./tpsieve-cuda-boinc-x86_64-linux -p242070e9 -P242072e9 -k 1201 -K 9999 -N 3000000 -c 60 -m 64 -M 2”.

What’s strange is that, very occasionally, once every few seconds to a minute on my GTX 460, cudaMemcpy reports an “unknown error”. I tried sticking a cudaThreadSynchronize() at the beginning of this function, and it also reported an “unknown error”. You can see the code I added here to check again for that “unknown error”. About 95% of the time, if I do a second cudaMemcpy, the “unknown error” doesn’t reoccur. The other 5% of the time I re-run the kernels in check_ns(), and after that the “unknown error” doesn’t reoccur. Though I suspect a third cudaMemcpy alone would do the job.

I have checkCUDAErr calls (basically similar to that “safe” macro nobody’s supposed to use but everyone does) after every CUDA call and kernel launch, and none of them error. I launch about 3,000 iterations of one kernel, if that’s relevant.

So, anyone know why these errors are popping up? It seems like the results I get back after waiting out the errors are correct. Is that likely true in general?

Errors don’t pop up when the code runs inside a pthread. I really have no idea why that could be.

One other thing that might be of interest: I set up “blocking sync” at the start of the program, so it doesn’t use much CPU. Here’s how I do it:

bool SetCUDABlockingSync(int device) {

  cudaError_t status = cudaGetLastError();

if(status != cudaSuccess) return false;

status = cudaSetDevice(device);

  if(status != cudaSuccess) return false;

status = cudaSetDeviceFlags(cudaDeviceBlockingSync);

  if(status != cudaSuccess) return false;

return true;

}

I hope that isn’t causing the problem.

frasanz · December 13, 2010, 8:49pm

Hello.

I have the same problem when using BOINC libraries, as I see, when you use

status = cudaSetDeviceFlags(cudaDeviceBlockingSync);

you can use cuCtxSynchronize() after kernel, hope this solve your problem.

So here’s my setup:

Ubuntu 9.04

Linux64 260.19.14 driver

Cuda 2.3 (Yes, that old version. I’m building this to be easily used by many users without upgrading.)

Here’s the code (under the “redc” branch"), with one exception. I was getting errors acting like the computation wasn’t complete when get_factors_found in appcu.cu returned, and when USE_BOINC is #defined. It turns out running a second cudaGetLastError() immediately after a first returns nothing; so my old code was eating an error (more than it was supposed to). It also seemed like the memcpy might be premature, so I removed the event, used a cudaStreamSync(stream) instead, and moved the memcpy to the end. Now I have this:
void get_factors_found(unsigned char *factor_found, const unsigned int cthread_count, const uint64_t start_t, int *check_ns_delay) {

  // Get d_factor_found, into the thread'th factor_found array.

#ifdef USE_BOINC

  cudaError_t err;

  int count = 0;

#endif

if(!blocking_sync_ok) {

    // Manually sleep-wait for the result.

    if(*check_ns_delay <= max_ns_delay) {

      cudaSleepMemcpyFromTime(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost, check_ns_delay, check_ns_overlap, start_t);

    } else {

      // Pass in zero seconds to wait, and ignore the result passed out.

      int i=0;

      cudaSleepMemcpyFromTime(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost, &i, check_ns_overlap, start_t);

    }

  } else {

    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);

  }

  //cudaStreamSynchronize(stream);

#ifdef USE_BOINC

  err = cudaGetLastError();

  while(err != cudaSuccess) {

    fprintf(stderr, "Warning: A kernel failed with error %s.  Retry %d.\n", cudaGetErrorString(err), count+1);

    // Retry the Memcpy first.

    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);

    err = cudaGetLastError();

    if(err == cudaSuccess) break;

    fprintf(stderr, "Warning: A kernel still failed with error %s.  Retry %d.\n", cudaGetErrorString(err), count+1);

    // Retry the computation.

    check_ns(NULL, cthread_count, 0);

    count++;

    cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);

    // If this is the last try, don't check the result here; that seems to eat it!

    if(count == 10) break;

    err = cudaGetLastError();

  }

#endif

  checkCUDAErr("getting factors found");

}
My test case is “./tpsieve-cuda-boinc-x86_64-linux -p242070e9 -P242072e9 -k 1201 -K 9999 -N 3000000 -c 60 -m 64 -M 2”.

What’s strange is that, very occasionally, once every few seconds to a minute on my GTX 460, cudaMemcpy reports an “unknown error”. I tried sticking a cudaThreadSynchronize() at the beginning of this function, and it also reported an “unknown error”. You can see the code I added here to check again for that “unknown error”. About 95% of the time, if I do a second cudaMemcpy, the “unknown error” doesn’t reoccur. The other 5% of the time I re-run the kernels in check_ns(), and after that the “unknown error” doesn’t reoccur. Though I suspect a third cudaMemcpy alone would do the job.

I have checkCUDAErr calls (basically similar to that “safe” macro nobody’s supposed to use but everyone does) after every CUDA call and kernel launch, and none of them error. I launch about 3,000 iterations of one kernel, if that’s relevant.

So, anyone know why these errors are popping up? It seems like the results I get back after waiting out the errors are correct. Is that likely true in general?

Errors don’t pop up when the code runs inside a pthread. I really have no idea why that could be.

One other thing that might be of interest: I set up “blocking sync” at the start of the program, so it doesn’t use much CPU. Here’s how I do it:
bool SetCUDABlockingSync(int device) {

  cudaError_t status = cudaGetLastError();

if(status != cudaSuccess) return false;

status = cudaSetDevice(device);

  if(status != cudaSuccess) return false;

status = cudaSetDeviceFlags(cudaDeviceBlockingSync);

  if(status != cudaSuccess) return false;

return true;

}
I hope that isn’t causing the problem.

Ken_g6 · December 15, 2010, 5:53am

Thanks, but cuCtxSynchronize() is a driver API call. I’m trying to stick with C for CUDA only.

Do you know whether cudaThreadSynchronize() is the equivalent C for CUDA function? I tried that, and it returned the same kind of “unknown error”. Did you check to see whether cuCtxSynchronize() returned an error? I actually did try ignoring the “unknown error” message; that gave me validation errors later on, as if the memcpy didn’t copy anything, or copied an incomplete result or something.

Edit: By the way, this only seems to happen in Linux 260.19.* drivers; not Windows 260.99.* drivers.

frasanz · December 15, 2010, 7:44pm

You’re right, my fault. cuCtxSynchronize returned error (I didn’t check it properly and I saw several validation errors).

On the other hand, this errors don’t happen on Linux 195.36.15 drivers, neither in Linux 260.19.* with cuCtxCreate( &hcuContext, 0x01, hcuDevice ); i.e, CU_CTX_SCHED_AUTO, but happen with cuCtxCreate( &hcuContext, 0x04, hcuDevice ); i.e CU_CTX_BLOCKING_SYNC.

Â¿a bug?..I don’t know.

sun.wael · March 27, 2011, 11:04pm

i have the same problem , i am using gtx 460 too with cuda 4.0 , did you know what was the problem , it happens for me only when getting “larger” data from device to host , don’t know i doubt it’s a driver issue :S

sepia.latimanus · July 30, 2011, 12:35am

Hey I also have a GTX 460 and am receiving a similar error. My cudaGetLastError() checks throughout the program return differently depending on when I run it…

Topic		Replies	Views
cudaDeviceSynchronize needed between kernel launch and cudaMemcpy ? CUDA Programming and Performance	15	16288	September 29, 2017
strange behavior with device emulation CUDA Programming and Performance	5	2693	May 20, 2008
Synchronization hangs sporadically after kernel launch CUDA Programming and Performance	23	7826	August 20, 2015
Odd error fixed by commenting unrelated line? CUDA Programming and Performance	11	8620	February 17, 2010
cudaMemcpyAsync CUDA Programming and Performance	10	20834	October 16, 2015
How to debug kernel throwing an exception? CUDA Programming and Performance	16	7941	June 14, 2013
cudaSynchronizeDevice() returns error code 6 CUDA Programming and Performance	7	8601	June 16, 2011
Passing dynamically allocated memory in kernel to sub kernel via dynamic parallelism CUDA Programming and Performance	6	691	May 3, 2019
CudaAPI calls in functions, compiler/linking bug? CUDA Programming and Performance	6	367	August 17, 2023
Potential Bug, cuda-memcheck can someone verify? Program crashing on GPU initialisation with cuda-me CUDA Programming and Performance	11	3462	April 24, 2020

Random, occasional "unknown error" after kernel

Related topics