Frustrating error in simple program: am I doing something wrong?

I’m writing a CUDA program, and it is mysteriously failing. I started removing code in an attempt to isolate the problem, but now that I have a ‘minimal’ example, it is even more mind-boggling.

The program below reports a CUDA error every time it’s run, but I don’t see what could possibly be wrong. Removing arbitrary pieces or inserting couts mysteriously makes the error go away. It does almost nothing: it calls an empty kernel in a loop with a couple of arguments and checks cudaGetLastError.

[codebox]

#include <stdio.h>

void check_last_error() {
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "Error: %s\n", cudaGetErrorString(err));
}

__global__ void some_kernel(int* a, int b) {}

int *a, b;

void do_something() {
    cudaMalloc((void**)&a, 8);
    b = 0;
    for (int i = 0; i < 10; i++) {
        check_last_error();
        some_kernel<<<1, 1>>>(a, b);
        check_last_error();
    }
}

int main() {
    do_something();
}

[/codebox]

After the first iteration, cudaGetLastError reports “invalid argument”. This makes no sense to me and is causing me problems (in the non-minified example where the kernel actually does stuff).

Am I doing something wrong? Is it a problem with my system configuration?

Any help would be appreciated!

Compiling with “nvcc a.cu -o a.out -O3” (the problem seems to go away without -O3; I don’t know whether this indicates I’m doing something unsafe or there is a compiler issue).

System information:

CUDA 2.3

Ubuntu 9.10 (I’ve tested on 9.04 as well, same result)

g++ 4.3.4

NVIDIA GTX 275

Driver version is 190.18

CPU is corei7 920

Have you tried restarting/reloading the driver? If you earlier wrote out of bounds, it can break future launches in undefined ways.

I have indeed, and it does not make a difference :(.

I am a newb here, so this suggestion probably won’t help, but… kernel launches are asynchronous, right? So you’re effectively trying to launch 10 copies of the same kernel in parallel (albeit a kernel which doesn’t do anything). If you stick a cudaThreadSynchronize() into the loop, does that get rid of the error?
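To spell the suggestion out, here is a minimal sketch of the asker’s loop with a cudaThreadSynchronize() after each launch (that was the pre-CUDA-4 name for what is now cudaDeviceSynchronize), so any launch or execution error surfaces at a specific iteration rather than later:

[codebox]#include <stdio.h>

__global__ void some_kernel(int* a, int b) {}

int main() {
    int* a;
    cudaMalloc((void**)&a, 8);
    for (int i = 0; i < 10; i++) {
        some_kernel<<<1, 1>>>(a, 0);
        // Block until the kernel finishes; any launch or execution
        // error is then reported right after this iteration.
        cudaError_t err = cudaThreadSynchronize();
        if (err != cudaSuccess)
            fprintf(stderr, "Error at iteration %d: %s\n",
                    i, cudaGetErrorString(err));
    }
    cudaFree(a);
    return 0;
}[/codebox]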

For what it’s worth, your code works fine on my rig, which has similar hardware but different OS:

CUDA 2.3

Windows XP SP3

Visual Studio Express 2008 SP1

NVIDIA GTX 285

CPU is corei7 920

Hope this helps.

Cheers

Raffles

PS: I modded the code slightly so that I could see the output before Visual Studio closed the window. I can’t see that it would change the behaviour, but here’s the code just in case. (On Windows, sleep is declared in windows.h, starts with a capital letter, and takes its argument in milliseconds; you learn a new thing every day!)

[codebox]#include <stdio.h>
#include <windows.h>

void check_last_error() {
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "Error: %s\n", cudaGetErrorString(err));
    else
        fprintf(stdout, "OK: %s\n", cudaGetErrorString(err));
}

__global__ void some_kernel(int* a, int b) {}

int *a, b;

void do_something() {
    cudaMalloc((void**)&a, 8);
    b = 0;
    for (int i = 0; i < 1000; i++) {
        check_last_error();
        some_kernel<<<1, 1>>>(a, b);
        check_last_error();
    }
}

int main() {
    do_something();
    Sleep(9999);
}[/codebox]

Maybe try initialising CUDA before any other call? (cudaInit)
Usually CUDA gets initialised with the first CUDA call, but what if something went wrong at that point?

Does cudaThreadSynchronize() help?