CUDA kernel code memory management

How is CUDA kernel code managed on an NVIDIA GPU?
On the CPU, code is managed by the memory paging mechanism, which lets the code pages of a process be swapped out to disk and back in. When we launch a CUDA process, the kernel code also has to be transferred to the GPU. My questions are:

  1. Can kernels be swapped in and out of global memory? With “lazy loading”, kernels are swapped into global memory only when needed. Is there any chance that a kernel can be swapped out, e.g. when it has been unused for a long time?
  2. Is there any technique that lets us control the transfer of kernel code? For example, we may want to free a kernel’s space on the GPU if we have not used it in a long time.
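For question 2: with the CUDA driver API, the application decides when compiled kernel code occupies device memory. A minimal sketch (untested here, since it needs a GPU and `-lcuda`; the file name `kernel.ptx` and the kernel name `my_kernel` are placeholders):

```c
#include <cuda.h>    /* CUDA driver API */
#include <stdio.h>

int main(void) {
    CUdevice dev; CUcontext ctx; CUmodule mod;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* cuModuleLoad copies the compiled code (PTX or cubin) into the
       GPU's global memory; "kernel.ptx" is a hypothetical file. */
    if (cuModuleLoad(&mod, "kernel.ptx") == CUDA_SUCCESS) {
        CUfunction fn;
        cuModuleGetFunction(&fn, mod, "my_kernel");  /* hypothetical name */
        /* ... launch fn with cuLaunchKernel(...) as needed ... */

        /* cuModuleUnload releases the module, freeing the space its
           code occupied in device memory. */
        cuModuleUnload(mod);
    }
    cuCtxDestroy(ctx);
    return 0;
}
```

So kernel code managed through driver-API modules can be explicitly loaded and unloaded under program control, rather than living for the lifetime of the process.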

Kernel instructions are typically small compared to the data, so kernels resident in global memory should not be an issue.

When executed, the instructions are cached in the instruction caches (whose sizes are in the small-KB range).

Otherwise, the kernels stay in global memory. Why would you want to swap them out?

Do you have a very large number of kernels (>> 10,000?) or very large kernels (which are usually not performant anyway, due to the small instruction caches)? And do you actually need that space in global memory?

I am trying to implement a serving system that accepts DNN computation graphs, generates CUDA code, compiles it, and loads it. The system should be able to load CUDA kernels in a dynamic-library format and unload them again. If I cannot free the space taken by kernel code, it will grow larger and larger. Are there any solutions?

A simple, brute-force way to solve this is to restart the serving system, which cleans up everything on the GPU. But I think there may be more elegant ways?