No device code according to cuobj

I have built Gromacs with GPU support and according to the logs, nvcc is used during the build process. For example,

[  7%] Building NVCC (Device) object src/gromacs/gpu_utils/tests/CMakeFiles/gpu_utilstest_cuda.dir/gpu_utilstest_cuda_generated_devicetransfers.cu.o
[ 44%] Building NVCC (Device) object src/gromacs/CMakeFiles/libgromacs.dir/mdlib/nbnxn_cuda/libgromacs_generated_nbnxn_cuda.cu.o
[ 44%] Building NVCC (Device) object src/gromacs/CMakeFiles/libgromacs.dir/mdlib/nbnxn_cuda/libgromacs_generated_nbnxn_cuda_data_mgmt.cu.o
[ 44%] Generating baseversion-gen.cpp
[ 44%] Building NVCC (Device) object src/gromacs/CMakeFiles/libgromacs.dir/mdlib/nbnxn_cuda/libgromacs_generated_nbnxn_cuda_kernel_F_noprune.cu.o

and at the end of make install, the binary is created

-- Installing: /opt/gromacs-2019.2/install/bin/gmx

However, when I use cuobjdump, it says no device code is there in the binary.

# cuobjdump gmx -ptx
cuobjdump info    : File 'gmx' does not contain device code

Is that normal?

That message means the file contains no PTX.

Try -sass instead of -ptx.

Why the two can differ is explained in many places on the web. Whether PTX, SASS, or both end up embedded depends on the architecture switches passed to nvcc when the code is compiled (these are buried in your makefile and not evident in the output you have posted).

You can start here:

https://stackoverflow.com/questions/39981981/how-to-check-what-cuda-compute-compatibility-is-the-library-compiled-with/39983472#39983472

And of course it's possible that a GPU-accelerated binary only makes use of the GPU via dynamic linking to shared libraries (cudart, cublas, cudnn, etc.), in which case no PTX or SASS (i.e. no device code) need be in the executable at all.
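For the -ptx / -sass check above, here is a minimal sketch that lists which architectures a file carries. It assumes cuobjdump is on the PATH and uses its -lelf (list embedded SASS cubins) and -lptx (list embedded PTX) options. The helper name list_archs is made up for illustration, and it assumes the listed entries carry an sm_NN or compute_NN tag in their names.

```shell
# list_archs: hypothetical helper. Reads a cuobjdump listing on stdin and
# prints each distinct sm_NN / compute_NN tag once, sorted.
list_archs() {
    grep -oE '(sm|compute)_[0-9]+' | sort -u
}

# Usage (needs the CUDA toolkit; file name taken from the question):
#   cuobjdump -lelf gmx | list_archs   # architectures with SASS
#   cuobjdump -lptx gmx | list_archs   # architectures with PTX
```

If both commands print nothing, the file itself holds no device code and the shared libraries it loads are the next place to look.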

Using -sass returns the same output.

> and of course it's possible that a GPU-accelerated binary only makes use of the GPU via dynamic linking to shared libraries (cudart, cublas, cudnn, etc.), in which case no ptx or sass (i.e. no device code) need be in the executable at all.

I think that is the case. What I actually want to know is which sm numbers are used in the binary.

If you simply call a routine from CUBLAS, for example, it does not matter what sm numbers you compile with. The CUBLAS library is designed to run on any CUDA-capable GPU that the library is intended to support, plus any future GPU, subject to various limitations.

If the gromacs build is also (in addition to the gmx executable) building libraries of its own that are called by the gmx executable, then those would need to be investigated in a similar fashion.

I haven’t studied carefully how gromacs uses the GPU, but according to the install guide:

http://manual.gromacs.org/documentation/2019/install-guide/index.html

it appears that some of the GPU acceleration may be done by calling the CUFFT library. In that case (if it only uses CUFFT in some places) you’re not going to find any device code in the gmx executable, and the compute/arch switches used to compile it should be essentially irrelevant.

After looking at the above link more carefully, however, my guess is that gromacs does build its own libraries, that these include its own CUDA device code (not just calls to CUFFT), that they are dynamically linked, and that they are where the CUDA device code is likely to be found.

http://manual.gromacs.org/documentation/2019/install-guide/index.html#cuda-gpu-acceleration
http://manual.gromacs.org/documentation/2019/install-guide/index.html#static-linking

It seems that the nvcc configuration is set up in https://github.com/gromacs/gromacs/blob/master/cmake/gmxManageNvccConfig.cmake#L101 which I think can be controlled through GMX_CUDA_TARGET_SM during the cmake step.
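As a rough illustration of setting that variable at configure time: the GROMACS install guide describes GMX_CUDA_TARGET_SM as a semicolon-delimited list of two-digit architecture suffixes. The helper name gmx_target_sm_flag below is hypothetical, and the version numbers in the usage line are examples only, not a recommendation.

```shell
# gmx_target_sm_flag: hypothetical helper. Joins its arguments with ';'
# (the list separator CMake uses) into a GMX_CUDA_TARGET_SM definition.
# The body runs in a subshell so the IFS change does not leak out.
gmx_target_sm_flag() (
    IFS=';'
    printf '%s\n' "-DGMX_CUDA_TARGET_SM=$*"
)

# Usage from a build directory (architecture numbers are examples only):
#   cmake .. -DGMX_GPU=ON "$(gmx_target_sm_flag 60 70)"
```

Quoting the result matters, because an unquoted ';' would be read by the shell as a command separator.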

Do you mean that by creating a static binary, I would be able to get the SASS code of gmx?

Yes, I think that if you statically linked as much as possible into the gmx binary, you’d be able to use that (with cuobjdump) to discover what device code is there.

But it shouldn’t really require that kind of a rebuild. If you’re able to figure out what libraries are built by the gromacs build process, you should be able to run cuobjdump on those libraries in a similar fashion.
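One way to do that without a rebuild, sketched under the assumption that ldd and cuobjdump are both available: take the libraries that ldd resolves for gmx and run cuobjdump -lelf on each one. The helper names resolved_libs and scan_libs are made up for illustration.

```shell
# resolved_libs: hypothetical helper. Reads `ldd` output on stdin and prints
# the resolved path of every shared library ("name => /path (address)").
resolved_libs() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# scan_libs: for each library the given binary links against, print the
# sm_NN targets that cuobjdump finds embedded in it (libraries with no
# device code are skipped). Requires ldd and cuobjdump.
scan_libs() {
    ldd "$1" | resolved_libs | while read -r lib; do
        archs=$(cuobjdump -lelf "$lib" 2>/dev/null \
            | grep -oE 'sm_[0-9]+' | sort -u | tr '\n' ' ')
        if [ -n "$archs" ]; then
            echo "$lib: $archs"
        fi
    done
}

# Usage (path taken from the install log in the question):
#   scan_libs /opt/gromacs-2019.2/install/bin/gmx
```

Libraries loaded at run time rather than linked at build time would not show up in the ldd output, so an empty result here is not conclusive.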

However, depending on what you are looking for exactly, it might be sufficient just to study the cmake configuration as you’ve already indicated/linked.
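If the static-link route is preferred after all, the sequence might look roughly like the sketch below. The two CMake options are the ones named in the static-linking section of the install guide linked earlier; whether they are enough to pull all device code into gmx on a given system is an assumption that has not been verified here. The commands are only printed, not run.

```shell
# Options named in the static-linking section of the GROMACS install guide.
GMX_STATIC_FLAGS="-DBUILD_SHARED_LIBS=OFF -DGMX_PREFER_STATIC_LIBS=ON"

# Print the commands rather than running them, so this is safe to paste.
# The gmx path is taken from the install log in the question.
echo "cmake .. -DGMX_GPU=ON $GMX_STATIC_FLAGS"
echo "make install"
echo "cuobjdump -lelf /opt/gromacs-2019.2/install/bin/gmx"
```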