CUDA separable compilation + shared libraries -> "Invalid function" error

carlosgalvezp · September 5, 2021, 2:57pm

Hi,

I’m running into an issue trying to apply CUDA separable compilation to my project, which uses shared libraries. Note: I am aware that device linking cannot be done across shared library boundaries. That’s not what I’m doing. My shared libraries have a regular C/C++ interface.

After some experimentation, I believe the issue is: I’m performing device-link twice (one for each library), and the same symbol exists in both device linked objects. Why is this a problem?

I have asked the question in StackOverflow, I’ll reproduce it here.

I’m trying to use CUDA separable compilation in my project. The project is composed of a binary that depends on a few shared libraries (all built in the same build system). These shared libraries in turn use common CUDA code. When running the binary, I get a segfault similar to here. When I create a minimal example, I get “invalid device function” error instead. If I turn the shared libraries into static libraries, the error goes away . Unfortunately I don’t have control over this and need to make it work with shared libraries.

I have seen a couple similar posts here in SO, but they use CMake and the solutions usually involve changing libraries from shared to static, which I can’t do in my project. I have double-checked that I’m running the code on the right GPU (and indeed it works if I do some changes, see below), so that’s not the issue.

I believe I’m missing something when doing CUDA separable compilation, device linking or creating shared libraries.

Below is a fully reproducible minimal example of the problem:

// common.h
#ifndef COMMON_H
#define COMMON_H

__device__ int common();

#endif

// common.cu
#include "common.h"

__device__ int common()
{
    return 123;
}

// a.h
#ifndef A_H
#define A_H

__attribute__((__visibility__("default")))
void runA();

#endif

// a.cu
#include "a.h"

#include <cstdio>
#include <iostream>

#include "common.h"

__global__ void kernelA()
{
    printf("Running A: %d\n", 456 + common());
}

void runA()
{
    kernelA<<<1,1>>>();
    std::cout << cudaGetErrorString(cudaPeekAtLastError()) << std::endl;
    cudaDeviceSynchronize();
}

// b.h
#ifndef B_H
#define B_H

__attribute__((__visibility__("default")))
void runB();

#endif

// b.cu
#include "b.h"

#include <cstdio>
#include <iostream>

#include "common.h"

__global__ void kernelB()
{
    printf("Running B: %d\n", 321 + common());
}

void runB()
{
    kernelB<<<1,1>>>();
    std::cout << cudaGetErrorString(cudaPeekAtLastError()) << std::endl;
    cudaDeviceSynchronize();
}

// main.cpp
#include "a.h"
#include "b.h"

int main()
{
    runA();
    runB();
}

So basically a binary depending on 2 shared libraries A and B, both of which utilize the common() device function.

This is my build/test script:

#!/usr/bin/env bash
set -euxo pipefail

CUDA_ROOT=/usr/local/cuda-10.2
NVCC=$CUDA_ROOT/bin/nvcc
CC=/usr/bin/g++
GENCODE="arch=compute_75,code=sm_75"

# Clean previous build
rm -f *.o *.so main

# Compile relocatable CUDA code
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden common.cu -o common.cu.o
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden      a.cu -o      a.cu.o
$NVCC -gencode=$GENCODE -dc -Xcompiler -fPIC,-fvisibility=hidden      b.cu -o      b.cu.o

# Build shared library A
$NVCC -gencode=$GENCODE -dlink common.cu.o a.cu.o -o a.dlink.o
$CC -shared common.cu.o a.cu.o a.dlink.o -L$CUDA_ROOT/lib64 -lcudart -o liba.so

# Build shared library B
$NVCC -gencode=$GENCODE -dlink common.cu.o b.cu.o -o b.dlink.o
$CC -shared common.cu.o b.cu.o b.dlink.o -L$CUDA_ROOT/lib64 -lcudart -o libb.so

# Build final executable
$CC main.cpp -L. -la -lb -o main

# Run it
LD_LIBRARY_PATH=. ./main

Running it I get:

invalid device function
invalid device function

After some trial and error, I notice the problem is solved by:

Not linking common.cu.o in either library when device linking (obviously I need to make either library no longer use the common() function.
Making A and B static libraries.
Combining A and B into one single shared library.

Unfortunately I cannot apply these solutions in my project. Why is it a problem to have 2 shared libraries? I’ve read about the “device linker ignoring shared libraries”, but in this case it’s the host linker creating the shared library, not the device linker, so I’m hoping that’s OK?

Thanks!

Yuki_Ni · September 6, 2021, 7:12am

Could you please file us a ticket following the instruction here Getting Help with CUDA NVCC Compiler . You can add this topic link in the description . We will take a look . Thanks.

carlosgalvezp · September 6, 2021, 8:15am

Absolutely, thanks! Just filed the bug, let me know if I can help further.

carlosgalvezp · September 6, 2021, 12:25pm

Browsing the Forums I believe my issue is closest to this one:

If I create only 1 .so file, which links together both a.dlink.o and b.dlink.o, I get “multiple definitions” error. This cannot be caught by the linker if I build 2 independent .so files.

Another thing that I have observed is that, when I run my program and the runtime linker starts to load to my libraries, they call this function twice:

__cudaRegisterLinkedBinary_14_common_cpp1_ii_e5ca4d49

This doesn’t happen with the other device functions - they get called only once. Is this a problem? Registering the same binary twice?

Thanks!

Yuki_Ni · November 11, 2021, 3:27pm

Your original issue is talked internally and here is the info
The problem is the duplicate common.o in both libraries. If you copy common.cu to common2.cu, and then link common2.o in libb, common.o in liba, then it works.

Also here is my other workaround on trying your case

yni@node1:~/yni/CustomerBug/nvbug3373815$ nvcc -arch=sm_75 -dlink common.cu.o a.cu.o b.cu.o -o ab.dlink.o
yni@node1:~/yni/CustomerBug/nvbug3373815$

yni@node1:~/yni/CustomerBug/nvbug3373815$ g++  -shared common.cu.o a.cu.o b.cu.o ab.dlink.o -L /usr/local/cuda-11.5/lib64 -lcuda -lcudart -o libmy.so
yni@node1:~/yni/CustomerBug/nvbug3373815$ 
yni@node1:~/yni/CustomerBug/nvbug3373815$ g++ main.cpp -L . -lmy -o myMain
yni@node1:~/yni/CustomerBug/nvbug3373815$
yni@node1:~/yni/CustomerBug/nvbug3373815$
yni@node1:~/yni/CustomerBug/nvbug3373815$ LD_LIBRARY_PATH=. ./myMain
no error
Running A: 579
no error
Running B: 444

tomilovanatoliy · July 13, 2022, 5:08pm

Would it be fixed? It seems that initialization code should be arranged as singleton, that is not the case currently.

carlosgalvezp · July 14, 2022, 1:04pm

Found a solution: add -fvisibility=hidden to the dlink step:

-$NVCC -gencode=$GENCODE -dlink common.cu.o a.cu.o -o a.dlink.o
+$NVCC -gencode=$GENCODE -dlink -Xcompiler -fvisibility=hidden common.cu.o a.cu.o -o a.dlink.o

-$NVCC -gencode=$GENCODE -dlink common.cu.o b.cu.o -o b.dlink.o
+$NVCC -gencode=$GENCODE -dlink -Xcompiler -fvisibility=hidden common.cu.o b.cu.o -o b.dlink.o

system · July 28, 2022, 1:04pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
nvcc (nvlink) not linking against device code library CUDA Programming and Performance	7	11537	June 20, 2018
Linking device code CUDA Programming and Performance	13	7543	December 8, 2014
Cmake: kernel fail to launch for the separate compilation of binary and library CUDA Programming and Performance	4	306	August 7, 2023
Issue with cudaMemcpyToSymbol and Separable Compilation. CUDA Programming and Performance	10	1362	February 28, 2019
Link failure with Thrust and separate compilation CUDA Programming and Performance	4	2346	January 21, 2018
Linking multiple static cuda libs CUDA Programming and Performance	1	2155	August 26, 2020
How to link host code with a static CUDA library after separable compilation? CUDA Programming and Performance	2	7755	April 30, 2013
Linking Device functions from static libraries with CMake CUDA NVCC Compiler	3	327	January 7, 2025
pxtas can't find symbol, but it's definitely in the object file? CUDA Programming and Performance	2	1035	February 5, 2016
Multiple definition of __cudaRegisterLinkedBinary_* error with relocatable utilities CUDA NVCC Compiler nvcc	0	956	September 11, 2022

CUDA separable compilation + shared libraries -> "Invalid function" error

Related topics