Loading Kernel Code to Device: Point in Time When Kernel Code Is Loaded to the Device

Hi there,
I'm working on an evaluation of CUDA, so it's important for me to know at which point in time the kernel code is loaded to the device, with both the runtime API and the driver API. With the runtime API, is the kernel code loaded when I launch the kernel with the angle-bracket syntax (<<<...>>>), or at the start of the compiled program? And with the driver API, is it loaded when I load a cubin or PTX file with the function cuModuleLoad?
Thanks in advance.