Invalid Device Function

I’m getting this error when launching a kernel. I have successfully compiled and executed sample programs (scalarProd), so I think I’m just doing something wrong. But before getting into too many details: could I get this error from the following check, placed right after the launch, simply because the kernel hasn’t had time to launch?


	CUT_CHECK_ERROR("Kernel failed");

I understand that a memcpy from the device back to the host will wait for all threads to complete. Is the appropriate time to check kernel status after that memcpy back?
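On the timing question above, here is a minimal sketch of the two checkpoints. It is written against the current CUDA runtime API rather than the SDK's CUT_CHECK_ERROR macro, which is an assumption on my part since the thread itself uses the cutil macros. Launch-time failures such as "invalid device function" are reported by cudaGetLastError() as soon as the launch call returns, so checking there is not premature. Faults that occur while the kernel is running only show up at a synchronizing call such as cudaDeviceSynchronize() or a blocking cudaMemcpy. The kernel writeOne and the helper launchAndCheck are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical trivial kernel: gives us something to launch.
__global__ void writeOne(int *out) { *out = 1; }

// Hypothetical helper: launches the kernel and returns the first error seen.
cudaError_t launchAndCheck(int *hostResult) {
    int *devOut = nullptr;
    cudaError_t err = cudaMalloc(&devOut, sizeof(int));
    if (err != cudaSuccess) return err;

    writeOne<<<1, 1>>>(devOut);

    // Checkpoint 1: errors detected at launch time (bad configuration,
    // "invalid device function") are already available here.
    err = cudaGetLastError();

    // Checkpoint 2: errors raised while the kernel executes surface only
    // once the host waits for the device.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // The blocking copy back also synchronizes; it is checked like any call.
    if (err == cudaSuccess)
        err = cudaMemcpy(hostResult, devOut, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(devOut);
    return err;
}
```

On a working setup, calling launchAndCheck(&value) from host code should return cudaSuccess and leave 1 in value. Any other return value can be passed to cudaGetErrorString() for a readable message.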

How are you compiling your test program? I had some problems with compiling from within Visual Studio, so I created a Makefile and compiled from the console and everything worked fine…

I’m using Visual Studio Express 2005, using Build Project. My project was created using the CUDAWinApp template/wizard. It compiles OK and runs fine if I’m only doing memcpy stuff, but once I try to launch the kernel and do the CUT_CHECK_ERROR it tells me the kernel failed. The project was originally a “hello world” project with all code in “main”, and it ran OK (the kernel created the “hello world” string). I left the main routine in place and just added my code by adding other routines. Later I removed all of the code from “main” in an attempt to eliminate variables with this problem, but it did not change anything. I assume that with a DLL the main routine is never executed? (I’ve never created a DLL before.)

Here are my other details. I didn’t post them originally because I don’t even know whether it’s valid to perform CUT_CHECK_ERROR immediately after a kernel launch.

WinXP (new box, assume SP2, but it’s not in front of me so can’t say for sure)

GTX280 (drivers downloaded 2 weeks ago)

Java 1.6.0_10

Visual Studio Express 2005, C++ (downloaded 2 weeks ago)

Java loads C++ DLL

Calls native routines using JNI

All tests between Java and C++ code work properly, can xfer data, perform calcs, return results, results match the same code in Java

Java call to C++ routine to allocate memory on device works, that is to say CUDA_SAFE_CALL doesn’t spit out any errors

Subsequent Java call to C++ to launch kernel fails with the “invalid device function”

At this point I’ve eliminated almost all of the code in the kernel and in the routine that launches the kernel, this is what they look like (source is at home, keying from memory, but I eliminated everything but what is shown):

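__global__ void testGPUKernel() {
	__shared__ int a;
}

extern "C" JNIEXPORT __declspec(dllexport) jint JNICALL Java_TestGPUCalls_testNtvGPUCalc(JNIEnv *, jobject) {
	jint result = 0;

	// Launch the kernel and check it (grid/block sizes assumed to be 1, 1)
	testGPUKernel<<<1, 1>>>();
	CUT_CHECK_ERROR("Kernel failed");

	return result;
}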

Exactly. I tried to use the same wizard and had the same problem as you describe. I’m not sure if it’s the compiler settings that the wizard creates, but for some reason the kernel functions fail.

I know this is not a solution, but it can help you determine the cause a bit easier. Here’s a Makefile I use in one of my projects:


CC="C:\Program Files\Microsoft Visual Studio 8\VC\bin\cl.exe" /EHsc
# Assumes nvcc is on the PATH
NVCC=nvcc
VSBIN="C:\Program Files\Microsoft Visual Studio 8\VC\bin"
LIBS="C:\CUDA\lib\cudart.lib" "C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\lib\cutil32.lib" user32.lib
LIBDIR="C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\lib"
INCDIR="C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc"

all: main.exe

main.exe: main.obj gpu.obj
	$(CC) $(LIBS) main.obj gpu.obj

main.obj: main.cpp
	$(CC) main.cpp -c

# gpu.cu is the assumed name of the CUDA source file
gpu.obj: gpu.cu
	$(NVCC) -ccbin $(VSBIN) -I$(INCDIR) -c gpu.cu -o gpu.obj

clean:
	del *.obj


You might also have to set up the environment variables before compiling. The batch file that does this is “C:\Program Files\Microsoft Visual Studio 8\VC\vcvarsall.bat” on my system.

I also attached another simple project I made for testing.

Thanks, I’ll try it tonight. I’m a newb when it comes to DLLs, and pretty rusty on C (it’s been a long time). Do you know what I need to change in your makefile to instruct it to:

  1. Create a DLL

  2. Name the final executable myprojectname.dll

A follow-up in case anyone else runs into this problem.

danijel, thanks for your help. I used the makefile and your test code, and it compiled and executed fine. Then I used your makefile to compile my code and it is now running just fine, with no more “invalid device function”. I think you’re right: there is something in the CUDAWinApp project wizard that was causing the problem, because I’m also able to build and execute the sample projects from NVIDIA through VS2005 without a problem. Only the wizard app was having a problem.

I was able to stumble my way through passing CL options through NVCC to create my DLL. What I don’t understand is why the MS documentation says /LD creates a DLL when that gave me a link error. I looked at the command line in VS, saw that it showed /DLL, tried passing that instead, and it worked.
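For anyone following the DLL route, here is a hedged sketch of what an exported entry point in a .cu file might look like. The names GPU_EXPORT, storeValue and runStoreValue are hypothetical, as is the file name in the build comment. The comment relies on nvcc's --shared option, which asks nvcc to produce a shared library at link time. This is offered as an alternative to passing /LD or /DLL by hand, not as what was done in this thread.

```cuda
// Possible build line (file and DLL names are placeholders):
//   nvcc --shared -o myprojectname.dll gpu.cu
#include <cuda_runtime.h>

// Export with C linkage so the caller sees an unmangled symbol name.
#ifdef _WIN32
#define GPU_EXPORT extern "C" __declspec(dllexport)
#else
#define GPU_EXPORT extern "C"
#endif

// Hypothetical kernel: stores one value in device memory.
__global__ void storeValue(int *out, int value) { *out = value; }

// Hypothetical exported function. Returns 0 (cudaSuccess) on success,
// otherwise the CUDA error code as a plain int so that non-CUDA callers
// do not need the CUDA headers.
GPU_EXPORT int runStoreValue(int value, int *result) {
    int *devOut = nullptr;
    cudaError_t err = cudaMalloc(&devOut, sizeof(int));
    if (err != cudaSuccess) return static_cast<int>(err);

    storeValue<<<1, 1>>>(devOut, value);
    err = cudaGetLastError();  // launch-time errors
    if (err == cudaSuccess)    // blocking copy also waits for the kernel
        err = cudaMemcpy(result, devOut, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(devOut);
    return static_cast<int>(err);
}
```

A JNI wrapper like the one earlier in the thread could call runStoreValue and pass the returned code back to Java, which keeps the CUDA error visible on the Java side.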

With the wizard’s defaults, to use Debug you must set Project -> Properties -> CUDA -> Output -> Intern mode to Real. This is very important.
Sorry about that. I set the default value for emulation debug (EmuDebug) rather than for the real device, which is why you get this error. I will change the default value in the next version soon.

thanks for your information…

Ok, thanks, I’ll give it another try.

Good advice! Selecting the “real” mode gets rid of the error for me as well in Visual Studio.

I had the same error. It turned out that I was compiling for architecture SM_50 while my GPU only supports SM_35.
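A quick way to rule out this kind of mismatch is to ask the runtime what each GPU supports and compare it with the architecture passed to nvcc (for example through -arch). The sketch below is illustrative only: meetsTarget and reportDevices are hypothetical helpers, and the comparison is a necessary condition rather than a full compatibility check, since it ignores which cubin and PTX variants were actually embedded in the binary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// True when a device of capability devMajor.devMinor is at least as new as
// the build target tgtMajor.tgtMinor. If this is false, code built only for
// that target cannot load on the device.
bool meetsTarget(int devMajor, int devMinor, int tgtMajor, int tgtMinor) {
    return devMajor > tgtMajor ||
           (devMajor == tgtMajor && devMinor >= tgtMinor);
}

// Prints every device's compute capability next to the build target.
// Returns the number of devices found, or -1 if a query fails.
int reportDevices(int tgtMajor, int tgtMinor) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
        printf("device %d (%s): sm_%d%d, built for sm_%d%d: %s\n",
               dev, prop.name, prop.major, prop.minor, tgtMajor, tgtMinor,
               meetsTarget(prop.major, prop.minor, tgtMajor, tgtMinor)
                   ? "target is supported"
                   : "target is newer than this GPU");
    }
    return count;
}
```

For the case described above, meetsTarget(3, 5, 5, 0) is false, which matches the failure: an SM_35 device cannot run code built only for SM_50.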