Invalid device kernel after upgrade to CUDA 10.x

I have a strange problem where none of the advice currently on the forums seems to help. From reading other threads, the consensus is that an invalid device function error code from a kernel launch points to some sort of settings or arch/code/gencode issue. However, I took the exact same code from a 9.1 environment, recompiled it in clean 10.0 and 10.1 environments on the exact same hardware with the same working CMake build files, and observed cudart::cudaApiLaunchKernel (by setting a breakpoint in cuda-gdb and inspecting return codes) returning the status code for an invalid device function.

I did the following:

  • Used cuobjdump to check whether the static library I generate supports the right architecture/PTX versions. It does (per guidance in previous forum posts).
  • Used cuda-gdb to validate that I could disassemble the ‘func’ parameter passed into the kernel launch; the code was present
  • Observed that the Thrust library could not get function attributes even on an empty kernel
  • Verified that the libraries listed in /proc/<pid>/maps correspond to the libraries installed in /usr/local/cuda (symlinked to 10.0 or 10.1)
  • Affixed extern “C” to __global__ functions to rule out any name-mangling-related causes for kernel functions

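For anyone reproducing this, the launch-site check I used is roughly the following. This is a minimal sketch (the kernel and macro names are made up for illustration); it surfaces the error string and code right at the launch instead of at some later API call:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical check macro: report any CUDA error at the exact call site.
#define CUDA_CHECK(call)                                                     \
    do {                                                                     \
        cudaError_t err_ = (call);                                           \
        if (err_ != cudaSuccess) {                                           \
            std::fprintf(stderr, "%s:%d: %s (%d)\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_), (int)err_);               \
        }                                                                    \
    } while (0)

__global__ void empty_kernel() {}  // even an empty kernel shows the failure

int main() {
    empty_kernel<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // catches launch-time errors such as
                                          // cudaErrorInvalidDeviceFunction
    CUDA_CHECK(cudaDeviceSynchronize());  // catches asynchronous failures
    return 0;
}
```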
I have tried gcc 6.x, 7.3, and 7.4, and simply cannot get the kernel to launch under 10.0 or 10.1, with the right drivers running and deviceQuery working. If I purge everything and drop back to 9.1, everything works again. No compile flag changes are required. CMake 3.10 is used to build the CUDA objects, using first-class CMake CUDA support; no find_package(CUDA) involved.

Can someone throw some hints my way on the causes or dependencies involved in getting invalid device kernel error messages? Is there some other dependency interfering with resolving the kernel functions? Any tips at all would be helpful.

(I should add that my environment is Ubuntu 18.04; I purged all packages related to cuda/nvidia/libnvidia and then used the cuda-repo .debs to update and install CUDA.)

The exact error code enum that you are receiving might be of interest.

I have seen certain other particular cases result in an error possibly like this. For example, if the image involves (usually large) static allocations (e.g. __device__ arrays) that can’t be satisfied at load time, you may get an error like this. I’m not saying this fits your general description, however.
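As a concrete (hypothetical) example of that static-allocation case: a module-scope __device__ array far larger than the GPU’s memory cannot be satisfied when the module is loaded, and the first launch from that module then fails. This is only a sketch; the size here (~16 GiB) and the exact error returned will depend on the device and toolkit:

```cuda
#include <cuda_runtime.h>

// Hypothetical: a module-scope device allocation (~16 GiB of floats) that
// most GPUs cannot satisfy when the module is loaded.
__device__ float huge_table[1ull << 32];

__global__ void touch() { huge_table[0] = 1.0f; }

int main() {
    touch<<<1, 1>>>();
    // On a device that cannot satisfy the static allocation, the module
    // load fails and the launch reports an error instead of cudaSuccess.
    return (int)cudaGetLastError();
}
```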

Is there any chance you can test this on a fresh Linux installation, such as an external disk, that has only the 10.x toolkit installed?
Any chance libraries are being mixed when the program runs, such as 9.1 libs having precedence in LD_LIBRARY_PATH over 10.x? Something in this direction, possibly?

I do as you describe: for example, I write code for 9.1, which compiles and runs on Linux and Windows, and just as well on macOS with 10.0. But those OS installations have never seen any other CUDA version in their lives, so there is no chance of a library conflict.

The error code enum was 0x62 (98), cudaErrorInvalidDeviceFunction (“invalid device function”).

Regarding Saulo’s response: are there some common file names or .so files I can hunt for, in case the 9.1 purge didn’t quite get rid of everything? Better yet, is there some symbolic breakpoint for a function that validates whether the device kernel is OK that I could break on, to figure out whether I am landing in the wrong shared binary? I tried to rule this out already, but I can look once again.

I put a breakpoint on my system where the kernel launches. Then, I grabbed a list of the shared libraries in the process space under cuda-gdb. Like this:

Thread 1 "TestGPUClasses" hit Breakpoint 2, 0x000055555561d070 in cudart::cudaApiLaunchKernel(void const*, dim3, dim3, void**, unsigned long, CUstream_st*) ()
(cuda-gdb) sharedlibrary
(cuda-gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007ffff7dd5f10  0x00007ffff7df4b20  Yes         /lib64/
0x00007ffff7bd1830  0x00007ffff7bd2911  Yes         /usr/lib/x86_64-linux-gnu/
0x00007ffff79b6bb0  0x00007ffff79c50f1  Yes         /lib/x86_64-linux-gnu/
0x00007ffff77ab200  0x00007ffff77ae70c  Yes         /lib/x86_64-linux-gnu/
0x00007ffff75a5e50  0x00007ffff75a6bde  Yes         /lib/x86_64-linux-gnu/
0x00007ffff72a83f0  0x00007ffff7357b1e  Yes         /usr/lib/x86_64-linux-gnu/
0x00007ffff7006ac0  0x00007ffff701732d  Yes         /lib/x86_64-linux-gnu/
0x00007ffff6c342d0  0x00007ffff6dacc3c  Yes         /lib/x86_64-linux-gnu/
0x00007ffff6880a80  0x00007ffff693f2f5  Yes         /lib/x86_64-linux-gnu/
0x00007ffff57e8a40  0x00007ffff5add988  Yes         /usr/lib/x86_64-linux-gnu/
0x00007ffff54d6b80  0x00007ffff55066c8  Yes         /usr/lib/x86_64-linux-gnu/

Now, I go inspect the files:

jackson@toad:/usr/lib/x86_64-linux-gnu$ ls -l
lrwxrwxrwx 1 root root 17 Feb 25 17:44 ->
jackson@toad:/usr/lib/x86_64-linux-gnu$ ls -l
-rw-r--r-- 1 root root 298696 Feb  9 20:11

Are these not the right libraries? Is there a chance that some header file is messing with the code? The problem is, if I look for cuda.h or fatbinary.h or some other CUDA 10-related file, I see no stray headers and no possibility for weird inclusions.

Are there any other functions or diagnostics I can apply here to figure out why I am getting an invalid device function? C++ standards differences? I can validate the compile flags and provide them here if they’d help. What I need is some kind of reason why the device kernel is considered invalid; the API is just not giving me anything useful.

Just following my current hypothesis that there could be some mix: please print your PATH and LD_LIBRARY_PATH environment variables and check whether the CUDA version X bin directory and its corresponding library path appear in the same order. For example:

  • PATH: blabla:/cuda-9.1/bin:/cuda-10/bin
  • LD_LIBRARY_PATH: blabla:/lib/cuda-10/libs:/lib/cuda-9.1/libs

Notice that the CUDA 9.1 bin comes before CUDA 10, so whatever program is called (such as nvcc) will be found in CUDA 9.1 first. But the libraries that are found first are those from CUDA 10.
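That ordering check can be scripted. This is a small sketch using made-up example values (substitute your real "$PATH" and "$LD_LIBRARY_PATH"); it prints the first CUDA-related entry of each ':'-separated list so the two can be compared at a glance:

```shell
#!/bin/sh
# Print the first CUDA-related directory in a ':'-separated search path.
first_cuda_entry() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -m1 cuda
}

# Hypothetical values for illustration; pass "$PATH" and "$LD_LIBRARY_PATH".
EXAMPLE_PATH="/usr/bin:/usr/local/cuda-9.1/bin:/usr/local/cuda-10.0/bin"
EXAMPLE_LDPATH="/usr/local/cuda-10.0/lib64:/usr/local/cuda-9.1/lib64"

echo "first nvcc dir:   $(first_cuda_entry "$EXAMPLE_PATH")"
echo "first cudart dir: $(first_cuda_entry "$EXAMPLE_LDPATH")"
```

With the example values above, nvcc would resolve from 9.1 while libcudart would resolve from 10.0, which is exactly the mismatch described.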

I don’t really know if this is your particular case, or even whether the program would compile at all, but in case it does, you can see how cudart and/or other libs that don’t match the version of nvcc might end up linked or referenced.

Maybe there are other ways to find the problem, but I don’t particularly know since I avoid having multiple versions of a toolkit (SDK, compiler, IDE…) in the same OS like the plague.

Saulo: No such path conflicts exist (I double checked), but I appreciate your response.

Also, it does seem like the right libraries are in the debugger. I’m hoping one of the NVIDIA engineers can give me some more surgical advice on diagnosing the problem. I just need to know the precise point of failure here on why the kernel isn’t considered valid.

Edit: See next post.

[s]I think I’ve figured out what is going on, but I need some kind of solution still.

Say I tag the function extern “C” __global__ void Foo( some args here ). When I look inside this function to see what device function is called, for whatever reason, the device stub equivalent is still name-mangled. Is this a bug? It’s possible there is a CMake issue or some sort of host-compiler resolution issue, but I’m not finding a proper device stub being generated for an extern “C” function.

What I know is that, inside the CUDA API, some getEntryFunction call is failing. It is having problems resolving the device stub function, I think; and when I tagged the __global__ function extern “C”, the device stub somehow still requires the mangled version. If I cuobjdump the device stub, the non-mangled version still does not appear.[/s]

Disregard the last post. It’s not device-stub name mangling that is the problem. The issue is with CMake and the nvcc link step when cmake_device_link.o is built. What’s happening is that the .o file doesn’t actually contain the symbols for the device functions when I build using CMake with CUDA 10. When I look at the CMake output, for some unknown reason it is not putting the static library on the link line when it builds cmake_device_link.o. I need to figure out why. The actual CUDA 10 toolchain, from what I can tell, is fine. I just have some weird quirk manifesting in my CMake build process; I think the bug is in CMake’s first-class CUDA support.
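For reference, this is roughly the shape of the setup that exhibits it (target and file names here are made up). With first-class CUDA support, the executable’s device link step is supposed to pick up device code from linked static libraries; one workaround worth trying is setting CUDA_RESOLVE_DEVICE_SYMBOLS on the static library, which forces the device link to happen at the library itself:

```cmake
# Sketch of the build in question (names hypothetical).
cmake_minimum_required(VERSION 3.10)
project(TestGPUClasses LANGUAGES CXX CUDA)

add_library(gpu_kernels STATIC kernels.cu)
set_target_properties(gpu_kernels PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    # Workaround to try: resolve device symbols when the library is built,
    # so the executable's cmake_device_link.o cannot miss these symbols.
    CUDA_RESOLVE_DEVICE_SYMBOLS ON)

add_executable(TestGPUClasses main.cpp)
target_link_libraries(TestGPUClasses PRIVATE gpu_kernels)
```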

My only complaint to NVIDIA is that the error messages along the way should be a lot more descriptive. None of the posts about ‘invalid device function’ helped me diagnose what the API level was having difficulty with. It would have been helpful to know the invalid device function was a symbol-table issue (assuming I’m right; there’s a chance I was wrong, as in the previous post).