Compile error: skipping incompatible /usr/local/cuda/lib64

Hello,

I’m compiling C++ code and am seeing the following errors. The code continues to compile and will run, but I don’t think it is running on the GPU.

/usr/bin/ld: skipping incompatible /usr/local/cuda/lib64/…/lib/libcudart.so when searching for -lcudart
/usr/bin/ld: skipping incompatible /usr/local/cuda/lib/…/lib/libcudart.so when searching for -lcudart

Some stats:
OS: Ubuntu 12.04
CUDA Driver: 4.2
Device: GeForce GTX 260
Using: CMake

CUDA files ARE located in /usr/local/cuda

Any suggestions?

Thank You!

Try using ‘ldd myexe’ to see where the linker eventually found libcudart.

I’m curious why the linker skipped both the 64-bit and 32-bit versions of the library. Those messages usually indicate that it found files with the right name, but the wrong bitness.

Also, are you using the FindCUDA functionality of CMake? It usually puts the full path to the library on the link line instead of relying on the -L/path -llibrary form, which can lead to issues like this.

I am using FindCUDA in CMake.

Output of ldd is:

linux-vdso.so.1 =>  (0x00007fff7ab6f000)
libcudart.so.4 => /usr/local/cuda/lib64/libcudart.so.4 (0x00007fabec443000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fabec125000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fabebe2a000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fabebc14000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fabeb857000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fabeb639000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fabeb435000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fabeb22d000)
/lib64/ld-linux-x86-64.so.2 (0x00007fabec6a3000)

OK, take a look at your link line (make VERBOSE=1). Likely something is adding the -lcudart somewhere it shouldn’t be. Also search for any target_link_libraries(<target> cudart) in your CMakeLists.txt files. You should be using target_link_libraries(<target> ${CUDA_CUDART_LIBRARY}) instead. FindCUDA’s cuda_add_executable and cuda_add_library do this for you.
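For reference, something along these lines is the intended pattern (the target name "myapp" and the source file are just placeholders, not taken from your project):

find_package(CUDA REQUIRED)

# cuda_add_executable compiles the .cu files and links cudart via its full path
cuda_add_executable(myapp main.cu)

# if you do need an explicit cudart link, use the cached full path rather than -lcudart
target_link_libraries(myapp ${CUDA_CUDART_LIBRARY})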

I do see CMake attempting to link to cudart with a test file. The errors in question appear right after that line. However, I don’t have any test files.

/usr/bin/c++ CMakeFiles/test.dir/./test_generated_datasource.cu.o CMakeFiles/test.dir/./test_generated_better_clr.cu.o CMakeFiles/test.dir/./test_generated_CCD.cu.o -o test -rdynamic -lcudart -lcuda
/usr/bin/ld: skipping incompatible /usr/local/cuda/lib64/…/lib/libcudart.so when searching for -lcudart
/usr/bin/ld: skipping incompatible /usr/local/cuda/lib/…/lib/libcudart.so when searching for -lcudart

Another clue might be the warning below. I don’t have any double-precision variables anywhere in my code.

Building NVCC (Device) object better_clr/CMakeFiles/test.dir//./test_generated_CCD.cu.o
ptxas /tmp/tmpxft_00005993_00000000-2_CCD.ptx, line 7286; warning : Double is not supported. Demoting to float

Perhaps there is some issue with CMake attempting to build a test when I don’t specify one.

There is no mention of cudart in my CMakeLists.txt files. (They are very sparse - just a few lines each.) I am not using “cuda_add_library”. Is that necessary?

Thanks!

Somebody somewhere is adding a target called test. You should look for any add_executable(test ...) calls in all your CMakeLists.txt files:

find . -name CMakeLists.txt | xargs grep test

It is not necessary to use cuda_add_library. That is only a convenience function.

The warning about doubles is probably related to the default SM version targeted by NVCC (sm_10), which doesn’t support doubles. You should look at the nvcc documentation to see how to specify different architectures. Something like this should work for you:

list(APPEND CUDA_NVCC_FLAGS
-gencode=arch=compute_20,code=\"sm_20\"
)

Make sure this is called before you do any cuda stuff, but after the find_package(CUDA) call.
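Putting it together, the ordering would look roughly like this (the target and file names here are just placeholders):

find_package(CUDA REQUIRED)

# set NVCC flags after find_package(CUDA) but before any cuda_* commands
list(APPEND CUDA_NVCC_FLAGS
  -gencode=arch=compute_20,code=\"sm_20\"
  )

cuda_add_executable(myapp kernel.cu)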

My executable in this case is called “test”.

I just tried renaming it to something else, and the same errors persist.

However,

set(CUDA_NVCC_FLAGS -arch compute_20)

did eliminate the warning about doubles. So that’s a nice help!

I just noticed something odd.

Look at the path that ld is complaining about. It is in the correct lib64 directory, but then the path goes up and back down into the 32-bit directory:

/usr/bin/ld: skipping incompatible /usr/local/cuda/lib64/…/lib/libcudart.so when searching for -lcudart

This is clearly the problem, as it is trying to link against the 32-bit library on a 64-bit system.

I can’t find any reference to this in my CMakeLists.txt files. So the question is: where is this path derived from?

Looking at the verbose output of make, I don’t see any library path defined when the linking command is made:

/usr/bin/c++ CMakeFiles/clr.dir/./clr_generated_datasource.cu.o CMakeFiles/clr.dir/./clr_generated_better_clr.cu.o CMakeFiles/clr.dir/./clr_generated_CCD.cu.o -o clr -rdynamic -lcudart -lcuda

Thoughts?

That is kind of strange.

If your CMakeLists.txt file isn’t too large, could you attach it or paste it in here?

There’s something subtle going on, because with the CMake-generated makefiles the link line should look similar to this (with slight differences on my Mac). Note the full paths to the CUDA libraries.

/usr/bin/c++ -Wl,-search_paths_first -Wl,-headerpad_max_install_names CMakeFiles/test.dir/main.cc.o CMakeFiles/test.dir/./test_generated_test_bin.cu.o -o test.app/Contents/MacOS/test /usr/local/cuda/lib/libcudart.dylib -Wl,-rpath -Wl,/usr/local/cuda/lib /usr/local/cuda/lib/libcuda.dylib

There are two CMakeLists.txt files (one in the top directory, and one in the “code” directory).

cmake_minimum_required(VERSION 2.6)

project(clr)

INCLUDE(FindCUDA)

list(APPEND CUDA_NVCC_FLAGS -arch compute_11)

add_definitions(-DCUDA)

set(CUDA_64_BIT_DEVICE_CODE ON)

set(CMAKE_MODULE_PATH ${CMAKE_SOURCE_DIR}/CMake ${CMAKE_MODULE_PATH})

find_package(CUDA)

add_subdirectory(${CMAKE_SOURCE_DIR}/better_clr)

Inside the code directory:

include_directories(. ${CMAKE_SOURCE_DIR}/better_clr)

cuda_add_executable(clr datasource.cu better_clr.cu CCD.cu)

Are you using the most recent version of CMake (I think it is 2.8.8)?

Do you have your own version of FindCUDA in your /CMake directory?

If you look in your CMake cache (via the GUI, ccmake or within CMakeCache.txt), are the values of these variables full paths?
CUDA_CUDA_LIBRARY
CUDA_CUDART_LIBRARY

This is what I get with 2.8.8 on Linux:

CUDA_CUDART_LIBRARY /usr/local/cuda/lib64/libcudart.so
CUDA_CUDA_LIBRARY /usr/lib/libcuda.so

/usr/bin/c++ CMakeFiles/test.dir/main.cc.o CMakeFiles/test.dir/./test_generated_test_bin.cu.o -o test -rdynamic /usr/local/cuda/lib64/libcudart.so -lcuda -Wl,-rpath,/usr/local/cuda/lib64

So it looks as though CMake is removing the full path to libcuda.so since it is in the default lib path.

Have you modified any system library paths?

I am using the most recent version of CMake.

I don’t have my own FindCUDA.

The CUDA_CUDA_LIBRARY and CUDA_CUDART_LIBRARY variables match yours exactly. (I’m also running Linux.)

What I did find was that LD_LIBRARY_PATH and LIBRARY_PATH were being set in my .bashrc file. They appear to contain the correct path. However, deleting these lines from my .bashrc resulted in no more compilation errors!

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

export LIBRARY_PATH=/usr/local/cuda/lib64:$LIBRARY_PATH

HOWEVER, the code does not run any faster. It is painfully slow, and I suspect that it is still not running on the GPU. Is there some kind of test or monitoring tool that I can use? nvidia-smi reports that my card isn’t compatible, so that tool won’t work here. (The card is a GeForce GTX 260 with CUDA compute capability 1.3.)

Thanks again for the help. Seems like we’re almost there!

Well, if it’s not running on the GPU I would have expected your app to give you an error. CUDA has no “fallback” path when things go wrong; the app itself has to handle that situation. Typically, if a CUDA runtime app fails to find a device or the proper driver, the runtime calls return an error.

You should take a look at cuda-gdb. This debugger might be able to tell you whether your code is actually running on the device. If you want to use it, you should add -G to the NVCC flags (see nvcc --help).
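In CMake that flag can be appended to the same variable used for the architecture flags, something like this (assuming you are still going through FindCUDA):

# -G generates device debug info (and disables most optimization); -g adds host debug info
list(APPEND CUDA_NVCC_FLAGS -G -g)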

Also, your performance depends largely on what you implemented. All sorts of things can result in poor performance, such as not exposing enough parallelism, inhibiting parallelism through too many synchronizations, or poor use of atomic instructions. Other problems could stem from needless copies of data between the host and device.

I just had a thought.

My second video card recently failed. (It is a low-end card that I use just to drive the monitor.) So my monitor is plugged directly into my main GPU and I am running X on it. Might that be a reason for the performance issues?

Thanks!

That could affect performance a little, but probably not as much as you might be experiencing.

At this point, I would try the CUDA SDK samples and see if you can get them to run properly. If they run but are slow, I would suggest starting a new thread to give that new problem its own focus.