invalid device function


I set up my new computer on Ubuntu and I get a strange error, “invalid device function”, from code that works on my other computer.

I don’t think the problem comes from the installation, because the CUDA examples work.

Can someone help me to solve this problem?

I think this has been pretty much answered in your cross-posting:

In fact it hasn’t; here is my problem. I made a .cu file with a main function inside, which I compile like this:

/usr//bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o Convert.o -c

/usr//bin/nvcc -ccbin g++   -m64      -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o Convert Convert.o

The program works properly, but I want to call it from C++ code, so I made a project in Qt Creator with the following .pro file:

CONFIG += console
CONFIG -= qt

SOURCES +=   src/main.cpp

CONFIG += link_pkgconfig
PKGCONFIG += opencv

INCLUDEPATH += /usr/local/include
INCLUDEPATH += /usr/local/include/opencv
LIBS += -L/usr/local/lib
LIBS += -L/usr/lib/x86_64-linux-gnu
LIBS += -L/usr/local/share/OpenCV/3rdparty/lib
LIBS += -lm
LIBS += -lopencv_core
LIBS += -lopencv_imgproc
LIBS += -lopencv_highgui
LIBS += -lopencv_objdetect
LIBS += -lopencv_calib3d
LIBS +=  -lGL -lGLU -lX11 -lglut -lGLEW

# CUDA settings <-- may change depending on your system
CUDA_SOURCES += ./cuda/

CUDA_SDK = /usr/lib/nvidia-cuda-toolkit             #/usr/include/   # Path to cuda SDK install
CUDA_DIR = /usr/lib/nvidia-cuda-toolkit             # Path to cuda toolkit install


SYSTEM_NAME = unix         # Depending on your system either 'Win32', 'x64', or 'Win64'
SYSTEM_TYPE = 64           # '32' or '64', depending on your system
CUDA_ARCH = sm_52          # Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = #--use_fast_math

# include paths

# library directories
QMAKE_LIBDIR += /usr/lib/x86_64-linux-gnu#/usr/lib/nvidia-cuda-toolkit/lib #/usr/lib/i386-linux-gnu #$CUDA_DIR/lib/


# Add the necessary libraries
CUDA_LIBS = -lcuda -lcudart -lnppi -lnpps

# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
LIBS += -L /usr/lib/x86_64-linux-gnu -lcuda -lcudart -lnppi -lnpps
NVCC_LIBS =  -lGL -lGLU -lX11 -lglut -lGLEW
    # Release mode
    cuda.input = CUDA_SOURCES
    cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda.commands = $$CUDA_DIR/bin/nvcc -dlink $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -gencode arch=compute_52,code=sm_52 -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda.dependency_type = TYPE_C

    cudaLINK.input = CUDA_SOURCES
    cudaLINK.output = $$CUDA_OBJECTS_DIR/${TARGET}_cuda.o
    cudaLINK.commands = $$CUDA_DIR/bin/nvcc -ccbin g++ -dlink $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -gencode arch=compute_52,code=sm_52 Convert_cuda.o -o ${TARGET}_cuda.o


    cuda/Global_var.h \
    cuda/Convert.h \

The program gives me:

/BGE/cuda/ : CUDA Runtime API error 8: invalid device function.

The problem is that the same program works properly on my other computer. I don’t understand why the same build works fine on one computer and not on the other.
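For reference, errors like this are easiest to localize by checking the status of every runtime call right after the kernel launch. Below is a minimal sketch of such a check; the macro name `CUDA_CHECK` and the dummy kernel are just illustrations, not part of the original project:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line information if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: CUDA Runtime API error %d: %s\n", \
                    __FILE__, __LINE__, (int)err,                     \
                    cudaGetErrorString(err));                         \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void dummyKernel() {}

int main() {
    dummyKernel<<<1, 1>>>();
    // "invalid device function" typically surfaces here when the binary
    // contains no machine code (and no PTX) usable by the installed GPU.
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    return 0;
}
```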

From what is shown above, you seem to build only for an sm_52/compute_52 platform. If the resulting code runs on one machine, but fails to run on a second one with the error message shown, it suggests that the second machine has a GPU with compute capability < 5.2. What GPUs are in your two systems?
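One way to answer that question is to query the runtime directly. A minimal sketch (compiles with plain nvcc, no extra flags) that prints the compute capability of every visible GPU:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the compute capability of every visible GPU, so you can see
// which -gencode targets your build actually needs.
int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

A device reporting, say, 3.0 here cannot execute machine code built with `code=sm_52`.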

In fact I just changed this:

-gencode arch=compute_52,code=sm_52

to that:

-gencode arch=compute_52,code=compute_52

And it works.

Thank you for the answer.

It seems to me that this is not the optimal solution. It seems that one of your devices is indeed an sm_52 device, the other is some other architecture. Therefore, the second device cannot execute the sm_52 machine code generated with ‘code=sm_52’. When you switch to ‘code=compute_52’ the generated PTX (which is not machine code) can be JIT compiled on the second device, and the program runs fine. However, JIT compilation creates overhead at application startup, and depending on the details, it could be significant overhead.

The better way to deal with the situation is to find out the architecture(s) of all the GPUs you intend to run on, then have the compiler build what is called a “fat” binary that includes machine code for all the architectures that you want to target. Building fat binaries is a best practice of CUDA programming.
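As a sketch, assuming the second machine carries, say, an sm_30 (Kepler) GPU — the actual architecture would have to be confirmed first — a fat-binary build of the example above could look like this:

```shell
# Build a fat binary with machine code for both sm_30 and sm_52,
# plus compute_52 PTX as a JIT fallback for newer architectures.
# (sm_30 is only an example; substitute your real second GPU.)
nvcc -ccbin g++ -m64 \
     -gencode arch=compute_30,code=sm_30 \
     -gencode arch=compute_52,code=sm_52 \
     -gencode arch=compute_52,code=compute_52 \
     -o Convert Convert.cu
```

Each `-gencode` clause adds one entry to the fat binary; at run time the driver picks the best match for the installed GPU, falling back to JIT-compiling the embedded PTX only when no matching machine code is present.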