When exactly is kernel code transferred to the GPU?


I’m a little confused about host-to-device transfers of kernel code and haven’t been able to find a clear answer. When exactly, during the execution of a CUDA program, is kernel code (as a cubin) transferred to the GPU? At the time of its first call in the host code? I would be very grateful for some clarification…


At cuModuleLoad in the driver API (or, with the runtime API, at the runtime call that lazily initializes the CUDA context).
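
To show where that call sits in a driver API program, here is a minimal sketch. The file name "kernel.cubin" and the kernel name "my_kernel" are hypothetical placeholders (the kernel is assumed to take no parameters and to be declared extern "C" so its name is not mangled). It is only an illustration of the call order, not a definitive statement of what the driver does internally.

```cuda
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Abort with a readable message if a driver API call fails.
static void check(CUresult r, const char *what) {
    if (r != CUDA_SUCCESS) {
        const char *msg = nullptr;
        cuGetErrorString(r, &msg);
        std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;

    check(cuInit(0), "cuInit");
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");

    // Use the device's primary context and make it current for this thread.
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    // Load the cubin into the current context. According to the answer above,
    // this is the point at which the kernel code goes to the device.
    // "kernel.cubin" is a hypothetical file name.
    check(cuModuleLoad(&mod, "kernel.cubin"), "cuModuleLoad");

    // Look up the kernel by name; "my_kernel" is a hypothetical name.
    check(cuModuleGetFunction(&fn, mod, "my_kernel"), "cuModuleGetFunction");

    // Launch twice with one block of one thread. The module is loaded only
    // once above; neither launch calls cuModuleLoad again.
    for (int i = 0; i < 2; ++i) {
        check(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, nullptr, nullptr),
              "cuLaunchKernel");
    }
    check(cuCtxSynchronize(), "cuCtxSynchronize");

    // Explicitly release the module, then the primary context.
    check(cuModuleUnload(mod), "cuModuleUnload");
    check(cuDevicePrimaryCtxRelease(dev), "cuDevicePrimaryCtxRelease");
    return 0;
}
```

This should build with something like `g++ load_module.cpp -lcuda`, assuming the CUDA headers and the driver library are on the default search paths.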



I had a similar question. Does anybody know what the situation is when the runtime API is used?
I’m also interested in how long the cubin code remains on the device. I could not find
an unload command in the driver API, so I assume there must be some kind of garbage collection for the
device code. I would assume that the cubin remains on the device until the context is destroyed or
no more device memory is available for new cubin files, i.e. a new kernel launch should not
result in an additional transfer of cubin files. Is this assumption correct?
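
One way to probe this empirically with the runtime API is to time the context initialization and a few repeated launches of the same kernel. The sketch below assumes a hypothetical empty kernel named noop and uses cudaFree(0) as a common idiom for forcing the lazy context initialization. It does not prove where the cubin lives. If the first launch is much slower than the later ones, that would be consistent with a one-time load, though other one-time setup costs could contribute as well.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only to measure launch overhead.
__global__ void noop() {}

// Return the elapsed time since t0 in microseconds.
static double micros_since(std::chrono::steady_clock::time_point t0) {
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

int main() {
    // Force the lazy context initialization so its cost is measured separately.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "context init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("context init: %.1f us\n", micros_since(t0));

    // Launch the same kernel several times and time each launch,
    // including the wait for completion.
    for (int i = 0; i < 3; ++i) {
        t0 = std::chrono::steady_clock::now();
        noop<<<1, 1>>>();
        err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            std::fprintf(stderr, "launch %d failed: %s\n", i, cudaGetErrorString(err));
            return 1;
        }
        std::printf("launch %d: %.1f us\n", i, micros_since(t0));
    }
    return 0;
}
```

This should compile with `nvcc`, and the timings will vary with the GPU, driver, and CUDA version.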