I found something interesting and I’m curious how it works.
In the SDK MonteCarlo sample there are two modules:
MonteCarlo_SM10.cu_10.o
MonteCarlo_SM13.cu_13.o
Both are built from the same source code, but they are compiled for different architectures and work at different precision: MonteCarlo_SM10 uses single precision and MonteCarlo_SM13 uses double precision.
For example, the host code selects one at run time:
if(useDoublePrecision)
MonteCarlo_SM13(&plan);
else
MonteCarlo_SM10(&plan);
Both object files are linked into a single executable called MonteCarlo, which contains these symbols:
000000000040605d l F .text 0000000000000025 _ZL16inverseCNDKernelPfS_j
000000000040620e l F .text 0000000000000025 _ZL16MonteCarloKernelP14__TOptionValuePfi
0000000000405fd5 l F .text 0000000000000088 _ZL39__device_stub__Z16inverseCNDKernelPfS_jPfS_j
00000000007cf460 l O .bss 0000000000000008 _ZZL39__device_stub__Z16inverseCNDKernelPfS_jPfS_jE3__f
0000000000406186 l F .text 0000000000000088 _ZL54__device_stub__Z16MonteCarloKernelP14__TOptionValuePfiP14__TOptionValuePfi
00000000007cf478 l O .bss 0000000000000008 _ZZL54__device_stub__Z16MonteCarloKernelP14__TOptionValuePfiP14__TOptionValuePfiE3__f
0000000000406eed l F .text 0000000000000025 _ZL16inverseCNDKernelPfS_j
000000000040709e l F .text 0000000000000025 _ZL16MonteCarloKernelP14__TOptionValuePfi
0000000000406e65 l F .text 0000000000000088 _ZL39__device_stub__Z16inverseCNDKernelPfS_jPfS_j
00000000007e74a0 l O .bss 0000000000000008 _ZZL39__device_stub__Z16inverseCNDKernelPfS_jPfS_jE3__f
0000000000407016 l F .text 0000000000000088 _ZL54__device_stub__Z16MonteCarloKernelP14__TOptionValuePfiP14__TOptionValuePfi
00000000007e74b8 l O .bss 0000000000000008 _ZZL54__device_stub__Z16MonteCarloKernelP14__TOptionValuePfiP14__TOptionValuePfiE3__f
So the executable contains the same function names from two different object files, at different addresses (all of them are local symbols, flagged l).
During execution, the following functions are called implicitly:
__cudaRegisterFatBinary
…
__cudaRegisterFunction
…
cudaConfigureCall
cudaSetupArgument
…
cudaLaunch
And here is my point: according to the NVIDIA CUDA Library Documentation 4.1, for cudaLaunch “the parameter entry must be a device function symbol”.
So the question is: given only the cudaLaunch argument, how does the runtime API know which function to call, when entry is just a device function symbol and two kernels with that same name exist in the executable?
I hope someone will satisfy my curiosity.
Thanks