adding kernel printf provokes ptxas fatal : Unresolved extern function '_Z9_bwt_bwt3jP7textureI5ui

I have a nasty feeling this error from nvcc V5.5.0 is caused by the
arch on the command line:
/usr/local/cuda/bin/nvcc --compiler-options -fno-strict-aliasing --compiler-options -fno-inline -gencode arch=compute_35,code="sm_35,compute_35" -m64 -I/usr/local/cuda/include -DUNIX -o linux/release/objs/barracuda.cu.o -c barracuda.cu
ptxas fatal : Unresolved extern function '_Z9_bwt_bwt3jP7textureI5uint4Li1EL19cudaTextureReadMode0EE'

I have tried various permutations and tried stealing options from
/usr/local/cuda-5.5/Samples/NVIDIA_CUDA-5.5_Samples/0_Simple/simplePrintf/Makefile
without success.

It seems ridiculous that adding debug output to the source code should
lead to this sort of confusion.

As always any help or comments would be most welcome.
Thanks in advance
Bill
ps: The problem also seems to be with nvcc version V6.0.1

ps: Error also with nvcc 5.0, V0.2.1221

The error turns out to be very obscure. It has nothing to do with
printf itself; it is caused by the change in -arch needed to allow
printf to be compiled.

If -arch is high enough to allow printf (sm_20 or more) then it
triggers the compiler oddity. The source contained some device code
which compiled with sm_13 but was never used. Under sm_20 and higher
the compiler generates .ptx for that code which refers to a
non-existent function, and that reference triggers the error from the
PTX assembler.
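
For anyone trying to picture the situation, here is a minimal sketch of what I believe the pattern looks like. It is not the barracuda code: missing_helper, never_called, debug_kernel and run_debug_kernel are made-up names, and the REPRO_UNRESOLVED macro is just my way of keeping the default build clean. Going by the toolchains mentioned in this thread, compiling with -DREPRO_UNRESOLVED and an -arch of sm_20 or higher should provoke the ptxas error; I have not checked other nvcc releases.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#ifdef REPRO_UNRESOLVED
// Prototype only: no definition exists anywhere (hypothetical name).
__device__ unsigned int missing_helper(unsigned int k);

// Device code that no kernel ever calls, but which references the prototype.
__device__ unsigned int never_called(unsigned int k)
{
    return missing_helper(k) + 1u;
}
#endif

// The kernel itself only uses printf, which needs sm_20 or newer.
__global__ void debug_kernel(int *out)
{
    printf("hello from thread %d\n", (int)threadIdx.x);
    if (threadIdx.x == 0) out[0] = 7;
}

// Host wrapper: returns 7 on success, -1 if any CUDA call fails
// (for example when no usable device is present).
int run_debug_kernel()
{
    int *d_out = NULL;
    int h_out = -1;
    if (cudaMalloc((void **)&d_out, sizeof(int)) != cudaSuccess) return -1;
    debug_kernel<<<1, 4>>>(d_out);
    if (cudaGetLastError() != cudaSuccess ||
        cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(d_out);
        return -1;
    }
    if (cudaMemcpy(&h_out, d_out, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        h_out = -1;
    cudaFree(d_out);
    return h_out;
}
```

Without the macro the file contains no reference to the undefined function, so it should build with plain -c at any architecture that supports printf.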

I have filed a bug report (number 1522836).

Sorry for the trouble

Bill

I can’t say I have ever seen a problem like that just from inserting a printf(). The error message would indicate that a device function called by the kernel could not be found, maybe because the function in question is missing the __device__ attribute, or because the signature at the call site does not match the signature of the prototype and is thus incompatible. Since in C/C++ the default linkage is ‘extern’, the compiler then assumes that the function is defined in a different compilation unit, but since there seems to be only one compilation unit involved here (no separate compilation with linking), the build fails.

Not knowing anything about the details of the code, I would suggest filing a bug since this is also seen with the CUDA 6.0 tool chain. Please attach a minimal, self-contained repro code. Thank you for your help.

I got a very prompt and helpful reply to bug report 1522836 from nVidia.
Essentially the thing to do is either not to make the error in the first
place (or to comment out the offending code), or, when using
-arch >= sm_20, to use -dc in place of -c on the nvcc command line.

nVidia have only seen the cut down version of the code which shows the
problem. Nevertheless here is their reply verbatim:

  1. The device function _bwt_bwt3() is only declared but NOT defined.
  2. With device whole program compilation, calls to undefined device function results in error : “Unresolved extern function”
  3. With device separate compilation, it is possible to have a un-defined device function which may be defined in a different compilation unit. But during final linking all used device functions must be resolved and their definitions should be found.

The compilation options is as follows -

  • Option -c to nvcc refers to host compile mode. No -c option refers to host whole program compile mode
  • Option -dc to nvcc refers to device compile mode. Without -dc, nvcc compiles for device whole program compile mode. See http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#examples
  • The above program is compiled with- “host compile-only but device whole program compile mode”.
  • Since the device function _bwt_bwt3() is NOT defined and the file is compiled with DEVICE WHOLE PROGRAM compile mode, the error correctly occurs as mentioned in (2).
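
To make the two remedies concrete, here is a rough sketch under my own assumptions. The names lookup, use_lookup and run_lookup are hypothetical, as are the file names a.cu and b.cu in the comments; sm_35 is simply the architecture from my original command line. The code shows the first remedy (give the declared device function a definition), and the trailing comments show how I understand the -dc route from the nvcc documentation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Declaration only: on its own, this is the situation nVidia describes in (1).
__device__ unsigned int lookup(unsigned int k);   // hypothetical name

// Remedy 1: supply the definition in the same compilation unit, so that
// device whole program compilation (plain -c) can resolve the call.
__device__ unsigned int lookup(unsigned int k) { return k * 2u; }

__global__ void use_lookup(unsigned int *out)
{
    out[0] = lookup(21u);
}

// Host wrapper: returns 42 on success, -1 if any CUDA call fails.
int run_lookup()
{
    unsigned int *d_out = NULL;
    unsigned int h_out = 0u;
    if (cudaMalloc((void **)&d_out, sizeof(unsigned int)) != cudaSuccess)
        return -1;
    use_lookup<<<1, 1>>>(d_out);
    if (cudaGetLastError() != cudaSuccess ||
        cudaDeviceSynchronize() != cudaSuccess ||
        cudaMemcpy(&h_out, d_out, sizeof(unsigned int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        cudaFree(d_out);
        return -1;
    }
    cudaFree(d_out);
    return (int)h_out;
}

// Remedy 2: keep only the declaration in a.cu, put the definition of
// lookup() in b.cu, and use device separate compilation:
//   nvcc -arch=sm_35 -dc a.cu -o a.o
//   nvcc -arch=sm_35 -dc b.cu -o b.o
//   nvcc -arch=sm_35 a.o b.o -o app
// The final step performs the device link, where lookup() must be found.
```

Note that -dc only postpones the check: per point (3) of the reply, a device function that is actually used and never defined anywhere will still fail at the final link.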

Bill