Invalid device function

hi2all

The CUDA reference manual says that the return value of cudaLaunch is one of the following:
cudaSuccess
cudaErrorInvalidDeviceFunction
cudaErrorInvalidConfiguration

My question is: what can cause cudaErrorInvalidDeviceFunction as the result of a kernel call?

I wrote multi-GPU code that works on every CUDA-capable PC except one.
On the “WinXP, 2x GTX 280” machine, the program exits with cudaErrorInvalidDeviceFunction.

Does anyone know where my problem might be?

Did you see this other thread: http://forums.nvidia.com/index.php?showtopic=80370&st=0&p=456174&#entry456174

“Solved. There was another cuda call initializing a pbo which was wrong and somehow killed cuda. The code up there now works fine.”

I did. So the reason I get cudaErrorInvalidDeviceFunction from a kernel call is that the device is “busy”: some other code is running on the device, or even my own. Right?

Or is this just one of several possible reasons why I can get this error?

That person seems to say that an incorrect call corrupted the GPU’s memory, possibly messing up the kernel’s code or just scrambling the nvidia driver’s state. Possibly, it could be an ordinary out-of-bounds access that is scrambling driver memory. I don’t know if this is what you’re seeing, but overwriting-memory bugs tend to be semi-non-deterministic and may manifest themselves in one configuration but not another.

For the sake of completeness, what are the other systems on which your code does work?

CUDA version: 2.0

nvcc version:
Built on Wed_Jul_16_12:57:50_PDT_2008
Release 2.0, V0.2.1221

Does not work on:
WinXP, 4GB RAM, 2x GTX280 (any mode)

Does work on:
Ubuntu, 4GB RAM, 2x GTX280 (NonEmu) (the same machine; the bug discussed above appears only under WinXP)
WinXP, 2GB RAM, 8600 GTS (NonEmu, Emu)
WinXP, 4GB, C2Duo 3GHz (Emu)
WinXP, 512MB, Intel 1.5GHz (Emu)

Thanks, I’ll check that theory.

(The SDK samples run fine, though.)

Your code doesn’t work even in emu? Very strange indeed. EDIT: Hold on, the driver API supports emulation now?

Also, do you recompile on every machine you run on, or do you have one binary (for the win machines)?

Sorry, I was misinformed. In emu mode everything works fine.

The code is always recompiled on each machine before it is run.

I also tried launching an empty kernel in place of the real one, with a <<<dim3(1,1,1), dim3(1,1,1)>>> configuration. Same result, on the FIRST kernel call.

Update: who told you I am using the driver API?

Recompiling is what typically makes out-of-bounds bugs surface or hide. You can try compiling the code on a Windows machine where it works and see whether that binary runs on the machine where it doesn’t, although this won’t really tell you much.

Btw, is your Windows machine with 4GB running an x64 OS?

Sorry, I thought you’d said you were using cuLaunch.

Hmmm… Not sure if I get it. What do you mean?

OK, I’ll check it. But I thought the resulting binary does not depend on the machine it was compiled on (at least within the Windows family).

Yep. WinXP x64.

And are the machines on which it works 32-bit? If so, this is an important point you should have mentioned from the start. Your error is probably in how you handle the x64 architecture. E.g., you use sizeof(long) instead of sizeof(void*) somewhere, or something similar, and that causes an out-of-bounds access.
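To illustrate the sizeof(long) versus sizeof(void*) point: 64-bit Windows uses the LLP64 data model, where long stays 4 bytes while pointers are 8, whereas 64-bit Linux (LP64) makes both 8 bytes. That might be one reason the same source behaves on Ubuntu and misbehaves on WinXP x64. Below is a minimal C++ sketch of the pitfall. The helper names (long_can_hold_pointer, pointer_table_bytes_wrong, pointer_table_bytes_right) are hypothetical and not taken from the original poster's code.

```cpp
#include <cstddef>
#include <cstdint>

// True only on data models where a long is wide enough to hold a pointer
// (typically LP64 Linux: yes; LLP64 64-bit Windows: no).
constexpr bool long_can_hold_pointer() {
    return sizeof(long) >= sizeof(void*);
}

// Buggy sizing (hypothetical): bytes for a table of n pointers, computed
// with the wrong element type. Under LLP64 this is half of what is needed.
constexpr std::size_t pointer_table_bytes_wrong(std::size_t n) {
    return n * sizeof(long);
}

// Correct sizing: always matches the pointer width of the target.
constexpr std::size_t pointer_table_bytes_right(std::size_t n) {
    return n * sizeof(void*);
}

// When an integer must carry a pointer value, std::uintptr_t is the type
// intended for it; long is not.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t is expected to be able to hold a pointer");
```

On a platform where long_can_hold_pointer() is false, a buffer allocated with pointer_table_bytes_wrong(n) is too small, and filling it with n pointers writes past its end. That is the kind of out-of-bounds access described above.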

What I meant by “surface or hide” is that an out-of-bounds bug will sometimes appear in one build and not in another. That is why it’s not quite deterministic. Sometimes you change a completely unrelated line of code, and it toggles the bug.
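One common way to make such an overwrite show up reliably, instead of depending on how a particular build happens to lay out memory, is to surround a buffer with known guard values and check them afterwards. The sketch below is a host-side C++ illustration only. GuardedBuffer, kGuard and kCanary are hypothetical names, and it is an assumption that the bug in this thread is on the host side; checking device memory would additionally need the guard words copied back to the host.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants: number of guard words per side and their value.
constexpr std::size_t kGuard = 4;
constexpr std::uint32_t kCanary = 0xDEADBEEFu;

// Host-side buffer of n elements with guard words before and after it.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t n)
        : n_(n), storage_(n + 2 * kGuard, kCanary) {
        // Zero the usable region; the guards keep the canary value.
        for (std::size_t i = 0; i < n_; ++i) storage_[kGuard + i] = 0;
    }

    // Pointer to the n usable elements; the guards sit just outside them.
    std::uint32_t* data() { return storage_.data() + kGuard; }
    std::size_t size() const { return n_; }

    // Returns true if every guard word still holds the canary value.
    bool guards_intact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (storage_[i] != kCanary) return false;
            if (storage_[kGuard + n_ + i] != kCanary) return false;
        }
        return true;
    }

private:
    std::size_t n_;
    std::vector<std::uint32_t> storage_;
};
```

With this sketch, an off-by-one write such as buf.data()[buf.size()] = 1 lands in the first trailing guard word, so a later buf.guards_intact() returns false in every build rather than only in some of them.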