Link errors with nvtxRangePop and nvtxRangePushEx

I’ve been having this problem for a while and have now had time to create a small test case that illustrates it.
Here is a source file (testnv.cpp) that will show the problem:

#include <stdio.h> // for printf
#include <unistd.h> // for usleep
#include <nvToolsExt.h>

const uint32_t colors[] = { 0x00008800, 0x00000088 };
const int num_colors = sizeof(colors)/sizeof(uint32_t);

void psPushIt(const char* name, int cid)
{
    int color_id = cid;
    color_id = color_id%num_colors;
    nvtxEventAttributes_t eventAttrib = {0};
    eventAttrib.version = NVTX_VERSION;
    eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE; // NVTX expects size to be set along with version
    eventAttrib.colorType = NVTX_COLOR_ARGB;
    eventAttrib.color = colors[color_id];
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = name;
    nvtxRangePushEx(&eventAttrib);
}

void psPopIt()
{
    nvtxRangePop();
}

int main( int argc, char** argv )
{
    printf("test 1\n");
    // do real work here
    usleep(1000); // pretend work
    return 0;
}

Environment is Dell 64-bit with Ubuntu 14.04.3 with g++ version 4.8.4 and CUDA 7.5.18.
Here is the compile command that fails:
g++ -o testnv -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lnvToolsExt testnv.cpp
/tmp/ccyuqY0W.o: In function ‘psPushIt(char const*, int)’:
testnv.cpp:(.text+0x79): undefined reference to ‘nvtxRangePushEx’
/tmp/ccyuqY0W.o: In function ‘psPopIt()’:
testnv.cpp:(.text+0x84): undefined reference to ‘nvtxRangePop’

If I build this as a .so file with all the appropriate options (but without -Wl,-z,defs), the compile/link succeeds, but the code will not run unless the following env var is set:

This problem does not occur with CUDA 7.0.
I tried this on several different machines and it only fails on CUDA 7.5.
Did something change here? How do I make this work?


Your linking order is incorrect.

g++ -o testnv -I/usr/local/cuda/include testnv.cpp -L/usr/local/cuda/lib64 -lnvToolsExt

Ok, that worked.

But why did it work?
Also, what changed after 7.0 that required this specific order of parameters?


The linking order rules are established by the GNU tools, not by CUDA. Did you change g++ versions?

Your original compile/link command:

g++ -o testnv -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lnvToolsExt testnv.cpp

works just fine for me on CUDA 7.5.18/Fedora20/g++ 4.8.3
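
For anyone still wondering why the order matters: below is a minimal sketch of the left-to-right resolution rule that GNU ld applies to static archives, and to shared libraries when --as-needed is in effect. It is a toy model, not the linker's real algorithm. The names LinkInput, unresolvedAfterLink, testnvObject and nvToolsExtLibrary are hypothetical and exist only for this illustration, and libc symbols such as printf are left out. As far as I know Ubuntu's toolchain turns --as-needed on by default while Fedora's does not, which could explain the difference reported above, but nothing in this thread confirms that.

```cpp
#include <set>
#include <string>
#include <vector>

// One item on the link command line, in a deliberately simplified model.
struct LinkInput {
    bool is_library;                   // true for -lfoo, false for an object file
    std::set<std::string> defines;     // symbols this input provides
    std::set<std::string> references;  // symbols this input needs (objects only)
};

// Walk the inputs left to right and return the symbols still unresolved at the end.
std::set<std::string> unresolvedAfterLink(const std::vector<LinkInput>& inputs) {
    std::set<std::string> defined;
    std::set<std::string> undefined;
    for (const LinkInput& in : inputs) {
        if (in.is_library) {
            // A library is consulted only for symbols that are undefined right now.
            // If nothing is pending yet, it contributes nothing and is not revisited.
            for (auto it = undefined.begin(); it != undefined.end();) {
                if (in.defines.count(*it)) {
                    defined.insert(*it);
                    it = undefined.erase(it);
                } else {
                    ++it;
                }
            }
        } else {
            // An object file is always linked in whole.
            for (const std::string& s : in.defines) {
                defined.insert(s);
                undefined.erase(s);
            }
            for (const std::string& s : in.references) {
                if (!defined.count(s)) undefined.insert(s);
            }
        }
    }
    return undefined;
}

// Hypothetical stand-ins for the two inputs in this thread.
inline LinkInput testnvObject() {
    return {false, {"main", "psPushIt", "psPopIt"},
            {"nvtxRangePushEx", "nvtxRangePop"}};
}
inline LinkInput nvToolsExtLibrary() {
    return {true, {"nvtxRangePushEx", "nvtxRangePop"}, {}};
}
```

In this model, passing { nvToolsExtLibrary(), testnvObject() } (the failing order) leaves both nvtx symbols unresolved, while { testnvObject(), nvToolsExtLibrary() } (the corrected order) leaves none. If --as-needed is indeed the cause on your machine, adding -Wl,--no-as-needed before -lnvToolsExt should also make the original order link, though putting libraries after the objects that use them is the more portable habit.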