Which library contains the DEVICE versions of the functions
- cudaGetDevice(int * device)
- cudaGetErrorString ( cudaError_t error )
What is the recommended way of finding this answer myself? (Using cuobjdump, or is there any documentation?)
My second question is a bit more generic, but it would be VERY helpful to know the answer.
For cudaGetDevice to work on the device, we need to compile with -rdc, so I will be generating relocatable device code. In a previous question of mine on this forum, an example using cudaGetDevice(int* device) was given, and the library that was linked was libcudadevrt.a. If this function is defined in that library (I am not sure it is), and it is a static library, why do we specifically need relocatable device code? And why do we not need relocatable device code when we are not using the device runtime?
The library (on Linux) is libcudadevrt.a.
On a standard Linux install, it is in /usr/local/cuda/lib64 along with all the other libraries.
It contains the CUDA device runtime.
If you want to use the CUDA device runtime, studying any of the CUDA sample codes that use it will show you appropriate project builds (i.e. makefiles on Linux, Visual Studio project files on Windows) that link against that library.
You need relocatable device code any time CUDA (device) code in one compilation unit calls CUDA (device) code in another compilation unit. When you use the CUDA device runtime, the call to the library function is in your code, but the library function itself is defined in another compilation unit - the library.
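As a rough illustration of that case, here is a minimal sketch of a kernel that calls the device-side cudaGetDevice and cudaGetErrorString. The file name report.cu and the kernel name report_device are made up for this example, and the build line in the comment assumes a toolkit whose default target architecture supports the device runtime (older toolkits may need an explicit -arch=sm_35 or higher).

```cuda
// report.cu (hypothetical file name)
// Assumed build line:
//   nvcc -rdc=true report.cu -o report -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the calls below resolve to the device runtime,
// whose definitions live in libcudadevrt.a, a separate compilation unit.
__global__ void report_device()
{
    int dev = -1;
    cudaError_t err = cudaGetDevice(&dev);      // device-side call
    printf("device %d, status: %s\n", dev, cudaGetErrorString(err));
}

int main()
{
    report_device<<<1, 1>>>();

    // Host-side check: wait for the kernel and report any failure.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Dropping -rdc=true from the build line should make the build fail, since the device-side calls can then no longer be linked against definitions that live outside this file.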
When you don't need -rdc, it is because all the device source code is available to the device code compiler when it compiles your module/compilation unit. In that case, the compiler can resolve function calls directly (i.e. hardcode them, because they are in the same module) or else inline the code. Neither option is available when the called code is compiled in another compilation unit. The code can't be inlined (obviously), and the runtime loader mechanism may not place the jump target at a fixed offset, so hardcoding the jump/call is not possible.
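For contrast, here is a small sketch of the case that needs no -rdc: the __device__ function and the kernel that calls it sit in the same file, so the compiler sees both. The names scale, apply, and single.cu are invented for the example.

```cuda
// single.cu (hypothetical file name)
// Assumed build line, no -rdc needed:
//   nvcc single.cu -o single
#include <cstdio>
#include <cuda_runtime.h>

// Defined in the same compilation unit as its caller, so the compiler
// can inline it or emit a direct call.
__device__ float scale(float x)
{
    return 2.0f * x;
}

__global__ void apply(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = scale(data[i]);
    }
}

int main()
{
    const int n = 4;
    float host[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *dev = nullptr;

    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) {
        return 1;
    }
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    apply<<<1, n>>>(dev, n);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "%s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; ++i) {
        printf("%g\n", host[i]);
    }
    return 0;
}
```

If the body of scale were moved to a second .cu file, leaving only the declaration `__device__ float scale(float x);` here, both files would have to be compiled with -rdc=true so that the call could be resolved at device link time.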