Nvlink "multiple definition of half2float" error

kh40tika · April 8, 2021, 5:08am

I’m writing an operator extension library for MXNet. Here’s the build error:

nvlink error   : Multiple definition of '_Z17__half2float_warpRVK6__half' in 'lib/givens-matmul.cu.o', first defined in 'lib/example-op.cu.o' (target: sm_61)
nvlink fatal   : merge_elf failed (target: sm_61)

And here’s my build script (with minor censorship):

#!/bin/bash
# TODO.refac replace this with cmake when codebase is large enough
MXNET_HOME=/XXX/incubator-mxnet
ANACONDA_HOME=/XXX/anaconda3
CUDA_HOME=/usr/local/cuda
INCLUDE_DIRS="-I$MXNET_HOME/include -I$ANACONDA_HOME/include -I$CUDA_HOME/include"
LIB_DIRS="-L$MXNET_HOME/build"
LIBS="-lmxnet"
NV_CFLAGS='-std=c++14 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_86,code=sm_86' # 1080Ti & 3090 for now
CFLAGS='-std=c++14 -mf16c -mfma -mavx2 -O3'
LFLAGS='--no-undefined'
OUTPUT_NAME='libmxkhaotik.so'

rm -f lib/*.o
for ifile in src/*.cc; do
  g++ -c $CFLAGS $INCLUDE_DIRS -o lib/$(basename $ifile).o -fPIC $ifile
done
for ifile in src/*.cu; do
  nvcc -dc -c $NV_CFLAGS -Xcompiler "$CFLAGS -fPIC" $INCLUDE_DIRS -o lib/$(basename $ifile).o $ifile
done
nvcc -dlink -shared $NV_CFLAGS -Xlinker "$LFLAGS" $LIB_DIRS -o lib/"$OUTPUT_NAME" lib/*.o $LIBS && rm lib/*.o

I did some research and it appears half2float is a library function. In fact, I’m not even using it directly in my code; it’s likely pulled in from certain headers. So I don’t know how this multiple definition comes about, or how to fix it.

njuffa · April 8, 2021, 5:42am

The demangled prototype for this is __half2float_warp(__half const volatile&), which does not look like a CUDA built-in library function to me. It is probably something from the MXNet code base. It might be instructive to see where and how it is defined.

Have you looked at givens-matmul.cu and example-op.cu? Do both files have to be included in the build? The name of the latter file possibly suggests that it might actually be an example app that got included inadvertently, and should not be part of the library?
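
For reference, the demangling step above can be reproduced programmatically. This is a minimal sketch, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle in <cxxabi.h>; the helper name demangle is made up for illustration. The c++filt command-line tool does the same job.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol
// name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with
  // malloc, so it has to be released with free.
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && out != nullptr) ? out : mangled;
  std::free(out);
  return result;
}
```

Passing the symbol from the nvlink message, "_Z17__half2float_warpRVK6__half", should give back the __half2float_warp prototype quoted above.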

kh40tika · April 8, 2021, 5:49am

Okay, I’ll investigate the MXNet code base more. I wrote example-op as boilerplate code, and it will be included in the test code. Here’s a gist for example-op.

I still can’t find this half2float string in the MXNet code base, other than in the RTC module (where it only appears in strings used for JIT compilation, which are irrelevant here).

njuffa · April 8, 2021, 5:53am

I don’t know how relevant this is, but a quick internet search points me here:

https://mxnet.apache.org/versions/1.6/api/cpp/docs/api/half_8h_source.html

I couldn’t figure out how to get back from that to the actual sub-directory and filename. But __half2float_warp() definitely seems to be part of MXNet somehow.

kh40tika · April 8, 2021, 7:02am

I fixed the problem by adding inline here. I had overlooked the code under the 3rdparty/ directory.

Thanks @njuffa for the help.
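
A note on why adding inline helps: a non-inline function defined in a header gets its own external definition in every source file that includes the header. With relocatable device code (-dc), nvlink then sees the same symbol in several objects and reports a multiple definition. Marking the function inline permits identical definitions in multiple translation units. The sketch below is plain host C++ rather than CUDA, and FakeHalf and fake_half_to_float are hypothetical stand-ins, not the actual MXNet code; it only illustrates the pattern.

```cpp
// half_helpers.h (hypothetical header included by several source files)
#ifndef HALF_HELPERS_H_
#define HALF_HELPERS_H_

#include <cmath>
#include <limits>

// Hypothetical stand-in for a 16-bit IEEE 754 half-precision value.
struct FakeHalf {
  unsigned short bits;
};

// Without `inline`, every source file that includes this header would emit
// its own external definition of this function, and the linker would report
// a multiple-definition error. `inline` allows identical definitions to
// appear in several translation units.
inline float fake_half_to_float(const volatile FakeHalf& h) {
  const unsigned bits = h.bits;  // read the volatile field once
  const unsigned sign = bits >> 15;
  const int exponent = static_cast<int>((bits >> 10) & 0x1Fu);
  const int mantissa = static_cast<int>(bits & 0x3FFu);

  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 31) {
    // Infinity or NaN.
    magnitude = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                                : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal number: (1024 + mantissa) * 2^(exponent - 25).
    magnitude = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
  }
  return sign ? -magnitude : magnitude;
}

#endif  // HALF_HELPERS_H_
```

Note that the argument has to be a named object, since a const volatile reference cannot bind to a temporary. In real CUDA code the function would also carry the appropriate __host__ __device__ qualifiers, which are left out here so the sketch compiles with an ordinary C++ compiler.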