I’m writing a CUDA program that is failing mysteriously. I started removing code to isolate the problem, but the ‘minimal’ example I ended up with is even more mind-boggling.
The program below reports a CUDA error every time it’s run, but I don’t see what could possibly be wrong. Removing seemingly arbitrary pieces, or inserting cout statements, makes the error go away. The program does almost nothing: it calls an empty kernel in a loop with a couple of arguments and checks cudaGetLastError before and after each launch.
[codebox]
#include <stdio.h>

void check_last_error() {
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "Error: %s\n", cudaGetErrorString(err));
}

__global__ void some_kernel(int* a, int b) {}

int *a, b;

void do_something() {
    cudaMalloc((void**)&a, 8);
    b = 0;
    for (int i = 0; i < 10; i++) {
        check_last_error();
        some_kernel<<<1, 1>>>(a, b);
        check_last_error();
    }
}

int main() {
    do_something();
}
[/codebox]
After the first iteration, cudaGetLastError reports “invalid argument”. This makes no sense to me, and it is causing problems in the full, non-minimized program, where the kernel actually does work.
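To narrow down which call the error actually comes from, a variant along these lines might help. This is only a sketch, not my original program: `report_if_error` and `run_repro` are hypothetical names introduced for illustration, and the extra checks on `cudaMalloc`, `cudaFree`, and the synchronization call are additions I am assuming would be useful. Note that `cudaDeviceSynchronize` needs CUDA 4.0 or later; on CUDA 2.3 the equivalent call is `cudaThreadSynchronize`.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original program): prints the error with a
// label naming the call that produced it, and returns true if there was one.
bool report_if_error(cudaError_t err, const char* where) {
    if (err != cudaSuccess) {
        fprintf(stderr, "Error at %s: %s\n", where, cudaGetErrorString(err));
        return true;
    }
    return false;
}

__global__ void some_kernel(int* a, int b) {}

// Globals kept as in the original repro.
int *a, b;

// Hypothetical driver: returns the number of errors seen across all iterations.
int run_repro(int iterations) {
    int errors = 0;

    // Check the allocation itself instead of ignoring its return value.
    if (report_if_error(cudaMalloc((void**)&a, 8), "cudaMalloc"))
        return 1;

    b = 0;
    for (int i = 0; i < iterations; i++) {
        some_kernel<<<1, 1>>>(a, b);
        // Errors detected when the launch is issued (e.g. bad launch arguments).
        errors += report_if_error(cudaGetLastError(), "kernel launch");
        // Errors that only surface once the kernel has finished running.
        // On CUDA 2.3 this would be cudaThreadSynchronize() instead.
        errors += report_if_error(cudaDeviceSynchronize(), "kernel execution");
    }

    report_if_error(cudaFree(a), "cudaFree");
    return errors;
}
```

Calling `run_repro(10)` from `main` mirrors the loop above. The labels should show whether “invalid argument” (`cudaErrorInvalidValue`) is raised at allocation, at launch, or only after the kernel has run.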
Am I doing something wrong? Is it a problem with my system configuration?
Any help would be appreciated!
I’m compiling with “nvcc a.cu -o a.out -O3”. The problem seems to go away without -O3; I don’t know whether this indicates that I’m doing something unsafe or that there is a compiler issue.
System information:
CUDA 2.3
Ubuntu 9.10 (I’ve tested on 9.04 as well, same result)
g++ 4.3.4
NVIDIA GTX 275
Driver version is 190.18
CPU is Core i7 920