Some questions about cuGetProcAddress

I have read the article "Exploring the New Features of CUDA 11.3", and I am very interested in one of the new features, cuGetProcAddress. This is the article's description of the API:

CUDA 11.3 also introduces a new driver and runtime API to query memory addresses for driver API functions. Previously, there was no direct way to obtain function pointers to the CUDA driver symbols. To do so, you had to call into dlopen, dlsym, or GetProcAddress. This feature implements a new driver API, cuGetProcAddress, and the corresponding new runtime API cudaGetDriverEntryPoint.

After reading this description, I have some questions.

  • First, before the introduction of cuGetProcAddress, we could only find CUDA driver symbols using dlopen, dlsym, or GetProcAddress. Does that mean the CUDA driver library is dynamically linked to the executable?

  • I found that when running an executable compiled with CUDA 11.3 (or 11.4), the program looks up cuGetProcAddress. Why does the program call cuGetProcAddress? What is the reason for this behavior? I know about the lookup because I used LD_DEBUG=symbols to trace symbol resolution, and it shows:

1652407:	symbol=cuGetProcAddress;  lookup in file=/usr/lib/ [0]

Those are my questions.


Not necessarily. If the CUDA driver library is dynamically linked to an application, then you can call functions from that library directly, just as you would call functions from any other library named on the link line.


It’s also possible to do something called runtime linking. In this case, you would typically use a separate library function on Linux called dlopen (please just take a look at the Linux man page for that, I don’t wish to write a tutorial on runtime linking), as well as (probably) the other calls mentioned in that description. These allow you to “manually” associate a function pointer with an entry point in a library. In this fashion, an application can load and use a .so library that it is not “dynamically” linked to (in the sense that when the application was built, that library was not provided as part of the linker specification).
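To make the runtime linking idea above a bit more concrete, here is a minimal sketch in C. It deliberately uses the math library instead of anything CUDA-specific so that it can be tried without a GPU. The library name libm.so.6, the helper name call_runtime_linked, and the error codes are my own choices for illustration, not anything taken from CUDA.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Function pointer type for a math routine such as cos or sin. */
typedef double (*unary_math_fn)(double);

/*
 * Hypothetical helper: load `libname` at run time, look up `symbol`,
 * and call it with `x`.
 * Returns 0 on success (result stored in *out),
 *        -1 if the library could not be loaded,
 *        -2 if the symbol was not found in it.
 */
int call_runtime_linked(const char *libname, const char *symbol,
                        double x, double *out)
{
    assert(libname != NULL && symbol != NULL && out != NULL);

    /* The library is opened here, not named on the linker command line. */
    void *lib = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) {
        return -1;
    }

    /* "Manually" associate a function pointer with an entry point. */
    unary_math_fn fn = (unary_math_fn)dlsym(lib, symbol);
    if (fn == NULL) {
        dlclose(lib);
        return -2;
    }

    *out = fn(x);
    dlclose(lib);
    return 0;
}
```

With glibc, a call such as call_runtime_linked("libm.so.6", "cos", 0.0, &result) should return 0 and leave 1.0 in result. On older glibc versions you may need to add -ldl when linking.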

I suspect, for example, that the CUDA runtime library (whether the shared version or libcudart_static.a) uses the runtime linking method to access the CUDA driver library. Therefore, I would generally not be surprised by any report that an application that uses the CUDA runtime API also calls entry points in the driver library.

The exact method of runtime linking may have changed. Originally, it may have used the GetProcAddress method. In later versions of CUDA, it may well use the cuGetProcAddress method.

Your reference appears to indicate that the application was looking for the cuGetProcAddress symbol in the entry point table of the driver library. This doesn’t strike me as surprising at all. The application could first load the driver library using the dlopen method, then look for the entry point for cuGetProcAddress, and once it had that entry point, switch to runtime linking using the “new” method based on cuGetProcAddress.
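The two-step sequence conjectured above could look roughly like the sketch below. To be clear about the assumptions: I have not verified that the CUDA runtime does this, the library name is whatever the caller passes in (on a typical Linux install one might try libcuda.so.1), the function pointer types mirror the four-argument cuGetProcAddress signature documented for CUDA 11.3 rather than coming from cuda.h, and the helper name, the version number 11030, and the error codes are made up for illustration.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real types in cuda.h (assumption: CUDA_SUCCESS == 0). */
typedef int cu_result_t;
typedef cu_result_t (*cuGetProcAddress_fn)(const char *symbol, void **pfn,
                                           int cudaVersion, uint64_t flags);
typedef cu_result_t (*cuDriverGetVersion_fn)(int *driverVersion);

/*
 * Hypothetical helper: bootstrap through dlopen/dlsym once, then use
 * cuGetProcAddress for further lookups.
 * Returns 0 on success, -1 if the library cannot be loaded,
 * -2 if cuGetProcAddress is not exported, -3 if the driver lookup fails,
 * -4 if the looked-up function itself reports an error.
 */
int query_driver_version(const char *driver_lib, int *version_out)
{
    assert(driver_lib != NULL && version_out != NULL);

    /* Step 1: classic runtime linking to get hold of the driver library. */
    void *lib = dlopen(driver_lib, RTLD_NOW | RTLD_LOCAL);
    if (lib == NULL) {
        return -1;
    }

    /* Step 2: the only symbol resolved via dlsym is cuGetProcAddress. */
    cuGetProcAddress_fn get_proc =
        (cuGetProcAddress_fn)dlsym(lib, "cuGetProcAddress");
    if (get_proc == NULL) {
        dlclose(lib);
        return -2;
    }

    /* Step 3: every other entry point comes from the driver itself.
       11030 requests the CUDA 11.3 version of the symbol; flags 0 = default. */
    void *fn = NULL;
    if (get_proc("cuDriverGetVersion", &fn, 11030, 0) != 0 || fn == NULL) {
        dlclose(lib);
        return -3;
    }

    cu_result_t rc = ((cuDriverGetVersion_fn)fn)(version_out);
    dlclose(lib);
    return rc == 0 ? 0 : -4;
}
```

If something like this is what happens inside the runtime, a symbol trace would show exactly one dlsym-style lookup for cuGetProcAddress and nothing for the other driver functions, which would be consistent with the LD_DEBUG output in the question.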

I haven’t actually verified any of this; it is mostly just conjecture, but none of your reports are surprising to me.
