Launching functions with the same name

I found something interesting and I’m curious how it works.

In the SDK MonteCarlo sample there are two modules:

MonteCarlo_SM10.cu_10.o
MonteCarlo_SM13.cu_13.o

Both come from the same piece of code, but each is compiled for a different architecture and works with a different precision (double or single).
The host code picks one at run time, for example:

    if(useDoublePrecision)
        MonteCarlo_SM13(&plan);
    else
        MonteCarlo_SM10(&plan);

Both object files are linked into a single executable called MonteCarlo, which contains these symbols:

000000000040605d l F .text 0000000000000025 _ZL16inverseCNDKernelPfS_j
000000000040620e l F .text 0000000000000025 _ZL16MonteCarloKernelP14__TOptionValuePfi
0000000000405fd5 l F .text 0000000000000088 _ZL39__device_stub__Z16inverseCNDKernelPfS_jPfS_j
00000000007cf460 l O .bss 0000000000000008 _ZZL39__device_stub__Z16inverseCNDKernelPfS_jPfS_jE3__f
0000000000406186 l F .text 0000000000000088 _ZL54__device_stub__Z16MonteCarloKernelP14__TOptionValuePfiP14__TOptionValuePfi
00000000007cf478 l O .bss 0000000000000008 _ZZL54__device_stub__Z16MonteCarloKernelP14__TOptionValuePfiP14__TOptionValuePfiE3__f
0000000000406eed l F .text 0000000000000025 _ZL16inverseCNDKernelPfS_j
000000000040709e l F .text 0000000000000025 _ZL16MonteCarloKernelP14__TOptionValuePfi
0000000000406e65 l F .text 0000000000000088 _ZL39__device_stub__Z16inverseCNDKernelPfS_jPfS_j
00000000007e74a0 l O .bss 0000000000000008 _ZZL39__device_stub__Z16inverseCNDKernelPfS_jPfS_jE3__f
0000000000407016 l F .text 0000000000000088 _ZL54__device_stub__Z16MonteCarloKernelP14__TOptionValuePfiP14__TOptionValuePfi
00000000007e74b8 l O .bss 0000000000000008 _ZZL54__device_stub__Z16MonteCarloKernelP14__TOptionValuePfiP14__TOptionValuePfiE3__f

So the executable contains the same function names twice, once from each object file.

During execution, this series of functions is called implicitly:

__cudaRegisterFatBinary

__cudaRegisterFunction

cudaConfigureCall
cudaSetupArgument

cudaLaunch
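
To make the series above concrete, here is a toy model in plain C++ (not the real runtime) of one way the same-named kernels could be told apart. It assumes that registration records the address of each host-side function together with the device image it belongs to, and that the launch call looks its argument up by that address rather than by name. Everything prefixed with toy, the hostAddress helper, and the sm10/sm13 namespaces (standing in for the two object files) are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the device code embedded in one object file.
struct ToyFatBinary {
    std::string module;  // "SM10" or "SM13" in this sketch
};

// What the toy runtime remembers about one registered kernel.
struct ToyKernelEntry {
    const ToyFatBinary* image;  // device image that holds the kernel
    std::string deviceName;     // mangled device-side name
};

// Registry keyed by the ADDRESS of the host-side function, not by its name.
static std::map<const void*, ToyKernelEntry>& toyRegistry() {
    static std::map<const void*, ToyKernelEntry> registry;
    return registry;
}

// Toy analogue of the registration step: ties a host address to one image.
void toyRegisterFunction(const ToyFatBinary* image, const void* hostFun,
                         const char* deviceName) {
    bool inserted =
        toyRegistry().emplace(hostFun, ToyKernelEntry{image, deviceName}).second;
    assert(inserted);  // each host address is expected to be registered once
    (void)inserted;
}

// Toy analogue of the launch step: returns the module whose kernel would run,
// or an empty string if the address was never registered.
std::string toyLaunch(const void* entry) {
    auto it = toyRegistry().find(entry);
    if (it == toyRegistry().end()) return "";
    return it->second.image->module;
}

// Two "object files", emulated with namespaces. Each has its own local
// function with the same name, so the two functions have different addresses.
namespace sm10 {
static ToyFatBinary image{"SM10"};
static int MonteCarloKernel() { return 10; }
}  // namespace sm10

namespace sm13 {
static ToyFatBinary image{"SM13"};
static int MonteCarloKernel() { return 13; }
}  // namespace sm13

// Helper: view a function pointer as an opaque key for the registry.
template <typename F>
const void* hostAddress(F* f) {
    return reinterpret_cast<const void*>(f);
}
```

In this model, registering sm10::MonteCarloKernel and sm13::MonteCarloKernel under the same device name still gives two separate registry entries, because the keys are the two different host addresses. That matches the symbol listing, where each local name appears twice at different addresses.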

Here is my point: according to the NVIDIA CUDA Library Documentation 4.1, for cudaLaunch “the parameter entry must be a device function symbol”.

My question: if entry is only a device function symbol, how does the runtime API know, from the cudaLaunch argument alone, which of the two functions to call?

I hope someone will satisfy my curiosity.

Thanks