Getting thread context in device emulation: finding the return address of a function


I would like to find the return address of a device function (for use only in device emulation), and I was wondering if anyone knows how to do that. I’m using Linux.

The following structure illustrates my point:-

__device__ inline void helperfunc1()
{
      printf("This function was called from address %x\n", [return address function] );
}

__global__ void kernel(arguments)
{
// code

// code
}

I tried using __builtin_return_address(1), which is a gcc function (and works on my system).
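
A minimal host-only sketch of that builtin, assuming GCC (or a compatible compiler) and no CUDA qualifiers. The name caller_address is hypothetical and exists only for illustration. Level 0 is used here because the GCC documentation warns that nonzero levels can have unpredictable effects, and the function is marked noinline so that every call has a real return address of its own.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not part of CUDA or of the code in this post).
// noinline: an inlined call has no return address of its own.
__attribute__((noinline)) void* caller_address() {
    // Level 0 = the address this function will return to,
    // i.e. the instruction right after the call in the caller.
    return __builtin_return_address(0);
}

// Prints the return addresses seen from two different call sites.
inline void show_two_call_sites() {
    void* first  = caller_address();
    void* second = caller_address();
    std::printf("first call returns to  %p\n", first);
    std::printf("second call returns to %p\n", second);
}
```

Calling show_two_call_sites() from host code should print two different addresses, one per call site. Whether the same builtin is usable inside a __device__ function under emulation is exactly the open question here.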

On trying to compile my code (using make emu=1), I get the following error:-

obj/emurelease/genop.cu_o: In function `void gpu_genop<50u, 0u>(float, float*, float*, unsigned int)':
undefined reference to `__cuda_return_address'
undefined reference to `__cuda_return_address'

collect2: ld returned 1 exit status

make: *** [out/emurelease/genop] Error 1

What is the function __cuda_return_address()? Can someone explain how to make it work?

My actual problem is uniquely identifying, out of potentially many locations, where a function was called within a thread. I could do that if I had the return instruction pointer.
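
One possible fallback that avoids the return address entirely is to pass the call site in explicitly. This is a different technique from the one asked about, sketched in plain C++ so it compiles with g++ alone; for device code the helper would presumably also need the __device__ qualifier. The names helperfunc1_at and HELPERFUNC1 are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical variant of helperfunc1: the call site is passed in as
// arguments instead of being recovered from the return address.
inline int helperfunc1_at(const char* file, int line) {
    std::printf("This function was called from %s:%d\n", file, line);
    return line;  // returned so the caller can use it as a call-site ID
}

// Hypothetical wrapper macro. It expands at each call site, so __FILE__
// and __LINE__ refer to the caller, not to the helper.
#define HELPERFUNC1() helperfunc1_at(__FILE__, __LINE__)
```

Each use of HELPERFUNC1() on a different source line yields a different ID. Two calls on the same line would not be distinguished, which is one place where a real return address would still be more precise.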

Thanks for the help.