Error when compiling a torch.utils.cpp_extension operator with cuBLAS

Hi all,

I recently encountered a problem while building a custom PyTorch operator with torch.utils.cpp_extension, following this tutorial. In my kernel, I call cuBLAS (the v2 API) for GEMM operations. The extension compiles without errors, but when I import the operator in Python, it crashes with the following message:

test_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: __cudaRegisterLinkedBinary_50_tmpxft_0000403a_00000000_7_test_cuda_kernel_cpp1_ii_7487cc74

However, when I remove all cuBLAS-related code from my kernel and recompile, it runs without any error.

Here are the options I use in my setup file:

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='rpw',
    ext_modules=[
        CUDAExtension(
            name='rpw_cuda',
            sources=['rpw_cuda.cpp', 'rpw_cuda_kernel.cu'],
            extra_compile_args={
                'cxx': ['-O3'],
                # 'nvcc': ['-lcublas', '-arch=sm_86', '-lcuda', '-lcudart', '-lcudadevrt']
                'nvcc': ['-rdc=true', '-lcublas', '-lcuda', '-lcublas', '-lcublas_device', '-lcudadevrt'],
            },
        )
    ],
    cmdclass={
        'build_ext': BuildExtension,
    },
)
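For reference, here is a sketch of the variant I would expect to link correctly if the `-l` flags belong at link time rather than in the nvcc compile options. This is only a sketch under assumptions I haven't verified for my case: it uses the `libraries` keyword (from `setuptools.Extension`, which `CUDAExtension` inherits) so the linker pulls in cuBLAS, and it drops `-rdc=true`, which is only needed for device-side linking and otherwise requires a separate device-link step:

```python
# Sketch: pass cuBLAS to the *link* step via `libraries`
# instead of giving -l flags as nvcc compile options, which
# the linker never sees. Assumes only host-side cuBLAS calls
# (e.g. cublasSgemm), so -rdc=true is not needed.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='rpw',
    ext_modules=[
        CUDAExtension(
            name='rpw_cuda',
            sources=['rpw_cuda.cpp', 'rpw_cuda_kernel.cu'],
            libraries=['cublas'],  # linked as -lcublas at link time
            extra_compile_args={
                'cxx': ['-O3'],
                'nvcc': ['-O3'],
            },
        )
    ],
    cmdclass={'build_ext': BuildExtension},
)
```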

Thanks!