Hi,
I have made a simple MATLAB program and used GPU Coder to convert it into CUDA code so that I can run it on my Jetson TX1. But when compiling, I get this error:
ptxas fatal : Unresolved extern function ‘_Z22mwGetGlobalThreadIndexv’
And if I compile it with -rdc=true, I get this error:
nvlink error : Undefined reference to ‘_Z22mwGetGlobalThreadIndexv’ in ‘/tmp/tmpxft_00001e26_00000000-17_Test.o’
I’m facing the same problem. Did you find a solution?
One possibility is that the code requires the device runtime but the linker cannot find the library. In my case it is /usr/local/cuda/lib64/libcudadevrt.a
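To illustrate the device runtime hint: the device runtime library is needed when a kernel launches another kernel (dynamic parallelism). Whether that applies to the GPU Coder output in the first post is not known, so treat this as a minimal sketch under that assumption. The file name dp.cu, the kernel names, and the -arch=sm_53 value (chosen for a Jetson TX1) are all made up for the example.

```cuda
// dp.cu : hypothetical example, not taken from the original poster's code.
// It needs the device runtime because one kernel launches another.
// Assumed build line (library path as in the post above; adjust -arch for your GPU):
//   nvcc -arch=sm_53 -rdc=true dp.cu -o dp -L/usr/local/cuda/lib64 -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread writes twice its own index.
__global__ void child(int *out) {
    out[threadIdx.x] = threadIdx.x * 2;
}

// Parent kernel: the device-side launch below is what pulls in libcudadevrt.
__global__ void parent(int *out, int n) {
    if (threadIdx.x == 0) {
        child<<<1, n>>>(out);
    }
}

int main() {
    const int n = 4;
    int host[n];
    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));

    parent<<<1, 1>>>(dev, n);
    // Waits for the parent grid and the child grid it launched.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    for (int i = 0; i < n; ++i) printf("%d ", host[i]);
    printf("\n");
    return 0;
}
```

If the link step fails on a file like this without -lcudadevrt but succeeds with it, the missing device runtime is a likely cause. If it makes no difference, the unresolved symbol probably lives in another object file or library.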
Thanks for the hint, drobysh83, but I managed to get past this error by compiling my classes separately and then linking them all together. That let me find the real problem in my ported code and also made compilation much faster (it now takes seconds, compared to 40 minutes before). Before that, the program compiled but produced wrong, random output, and I have no idea what the reason was (it is strange that it compiled at all). Overall, my impression is that the CUDA compiler still has a long way to go in fixing bugs, because references from one class to another are somewhat problematic.
I solved this problem.
I got a hint from the Stack Overflow question "c++ - How to properly link cuda header file with device functions?".
I use Visual Studio 2015. If you use Visual Studio, follow these steps:
Project Properties → Configuration Properties → CUDA C/C++ → Common → Generate Relocatable Device Code → Yes (-rdc=true)
The point is that you need to add the nvcc compile option "-dc" or "-rdc=true".
Read this document https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-specifying-compilation-phase-device-c
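For anyone building from the command line instead of Visual Studio, here is a minimal sketch of the same idea. It is not the original poster's code. The file name demo.cu, the macro HELPER_TU, the function globalThreadIndex and the -arch=sm_53 value (picked for a Jetson TX1) are all assumptions made for the example. The single file is compiled twice with different macros to stand in for two separate .cu files.

```cuda
// demo.cu : hypothetical single-file demo, compiled twice to emulate two translation units.
// Assumed build commands (adjust -arch for your GPU):
//   nvcc -arch=sm_53 -dc -DHELPER_TU demo.cu -o helper.o   // device function only
//   nvcc -arch=sm_53 -dc demo.cu -o main.o                 // kernel and host code
//   nvcc -arch=sm_53 helper.o main.o -o demo               // device link, then host link
#include <cstdio>
#include <cuda_runtime.h>

// Declaration that would normally live in a shared .cuh header.
__device__ int globalThreadIndex();

#ifdef HELPER_TU
// Stands in for "helper.cu": the definition lives in its own translation unit.
__device__ int globalThreadIndex() {
    return blockIdx.x * blockDim.x + threadIdx.x;
}
#else
// Stands in for "main.cu": the kernel calls a device function defined elsewhere.
__global__ void fill(int *out, int n) {
    int i = globalThreadIndex();
    if (i < n) out[i] = i;
}

int main() {
    const int n = 8;
    int host[n];
    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));

    fill<<<1, n>>>(dev, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    for (int i = 0; i < n; ++i) printf("%d ", host[i]);
    printf("\n");
    return 0;
}
#endif
```

Compiling the second step with plain -c instead of -dc should reproduce the "Unresolved extern function" error from ptxas, because without relocatable device code every device function a kernel calls must be defined in the same translation unit.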
Hi.
VS2019 (up to date)
Since my device source code is split across two .cu files, I use this option:
Project Property > Configuration Properties > CUDA C/C++ > Common > Generate Relocatable Device Code > Yes (-rdc=true)
Nevertheless, I still get the error from the initial post (MSB3721, error code 255).
*.cuh and *.cu files are properly included.
Any ideas what could be wrong?
I have encountered a similar issue where using the nvcc option "--relocatable-device-code=true" resolved the problem. However, the resulting program was twice as slow, making it an impractical solution for my needs.