So here’s my setup:
Ubuntu 9.04
Linux64 260.19.14 driver
Cuda 2.3 (Yes, that old version. I’m building this to be easily used by many users without upgrading.)
Here’s the code (under the “redc” branch"), with one exception. I was getting errors acting like the computation wasn’t complete when get_factors_found in appcu.cu returned, and when USE_BOINC is #defined. It turns out running a second cudaGetLastError() immediately after a first returns nothing; so my old code was eating an error (more than it was supposed to). It also seemed like the memcpy might be premature, so I removed the event, used a cudaStreamSync(stream) instead, and moved the memcpy to the end. Now I have this:
void get_factors_found(unsigned char *factor_found, const unsigned int cthread_count, const uint64_t start_t, int *check_ns_delay) {
// Get d_factor_found, into the thread'th factor_found array.
#ifdef USE_BOINC
cudaError_t err;
int count = 0;
#endif
if(!blocking_sync_ok) {
// Manually sleep-wait for the result.
if(*check_ns_delay <= max_ns_delay) {
cudaSleepMemcpyFromTime(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost, check_ns_delay, check_ns_overlap, start_t);
} else {
// Pass in zero seconds to wait, and ignore the result passed out.
int i=0;
cudaSleepMemcpyFromTime(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost, &i, check_ns_overlap, start_t);
}
} else {
cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);
}
//cudaStreamSynchronize(stream);
#ifdef USE_BOINC
err = cudaGetLastError();
while(err != cudaSuccess) {
fprintf(stderr, "Warning: A kernel failed with error %s. Retry %d.\n", cudaGetErrorString(err), count+1);
// Retry the Memcpy first.
cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);
err = cudaGetLastError();
if(err == cudaSuccess) break;
fprintf(stderr, "Warning: A kernel still failed with error %s. Retry %d.\n", cudaGetErrorString(err), count+1);
// Retry the computation.
check_ns(NULL, cthread_count, 0);
count++;
cudaMemcpy(factor_found, d_factor_found, cthread_count*sizeof(unsigned char), cudaMemcpyDeviceToHost);
// If this is the last try, don't check the result here; that seems to eat it!
if(count == 10) break;
err = cudaGetLastError();
}
#endif
checkCUDAErr("getting factors found");
}
My test case is “./tpsieve-cuda-boinc-x86_64-linux -p242070e9 -P242072e9 -k 1201 -K 9999 -N 3000000 -c 60 -m 64 -M 2”.
What’s strange is that, very occasionally, once every few seconds to a minute on my GTX 460, cudaMemcpy reports an “unknown error”. I tried sticking a cudaThreadSynchronize() at the beginning of this function, and it also reported an “unknown error”. You can see the code I added here to check again for that “unknown error”. About 95% of the time, if I do a second cudaMemcpy, the “unknown error” doesn’t reoccur. The other 5% of the time I re-run the kernels in check_ns(), and after that the “unknown error” doesn’t reoccur. Though I suspect a third cudaMemcpy alone would do the job.
I have checkCUDAErr calls (basically similar to that “safe” macro nobody’s supposed to use but everyone does) after every CUDA call and kernel launch, and none of them error. I launch about 3,000 iterations of one kernel, if that’s relevant.
So, anyone know why these errors are popping up? It seems like the results I get back after waiting out the errors are correct. Is that likely true in general?
Errors don’t pop up when the code runs inside a pthread. I really have no idea why that could be.
One other thing that might be of interest: I set up “blocking sync” at the start of the program, so it doesn’t use much CPU. Here’s how I do it:
bool SetCUDABlockingSync(int device) {
cudaError_t status = cudaGetLastError();
if(status != cudaSuccess) return false;
status = cudaSetDevice(device);
if(status != cudaSuccess) return false;
status = cudaSetDeviceFlags(cudaDeviceBlockingSync);
if(status != cudaSuccess) return false;
return true;
}
I hope that isn’t causing the problem.