I’m new to CUDA and I’m using the nvcc environment to develop a particle system.
One thing I’m unsure about is how the GPU loads and stores kernel code. What triggers the upload of the code, and is it cached on the GPU?
In particular, if I launch several kernels in a loop, does the device code persist on the GPU?
For example, in the loop below, when do Fn0, Fn1 and Fn2 get loaded onto the GPU? Are Fn1 and Fn2 reloaded on each iteration?
Fn0<<<Dg, Db, Ns>>>(a, b, c);
for (int i = 0; i < 10000; i++)
{
    Fn1<<<Dg, Db, Ns>>>(d, e, f);
    Fn2<<<Dg, Db, Ns>>>(h, j, k);
}
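
To make the question concrete, here is a minimal sketch of how I imagine this could be probed empirically: time the first loop iteration on the host and compare it with the average of the later ones. The kernel bodies, the single-buffer arguments, and the helper timedIteration are hypothetical stand-ins for illustration, not my real particle code. A slower first iteration would be consistent with some one-time setup, but it would not by itself show where the kernel code lives, since it can also include other one-time costs such as context creation.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for Fn1 and Fn2; the bodies are made up.
__global__ void Fn1(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

__global__ void Fn2(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 0.5f;
}

// Wall-clock milliseconds for one iteration: both launches plus a sync,
// so that host-side launch overhead is included in the measurement.
static double timedIteration(float *d, int n, dim3 Dg, dim3 Db) {
    auto t0 = std::chrono::steady_clock::now();
    Fn1<<<Dg, Db>>>(d, n);
    Fn2<<<Dg, Db>>>(d, n);
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int n = 1 << 20;
    const int iters = 10000;
    float *d = nullptr;

    // Note: this first runtime call also pays for context creation.
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(float));

    dim3 Db(256);
    dim3 Dg((n + Db.x - 1) / Db.x);

    // First iteration timed separately from the remaining ones.
    double first = timedIteration(d, n, Dg, Db);
    double rest = 0.0;
    for (int i = 1; i < iters; i++) {
        rest += timedIteration(d, n, Dg, Db);
    }

    printf("first iteration:      %.4f ms\n", first);
    printf("mean of later iters:  %.4f ms\n", rest / (iters - 1));

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

This should build with something like nvcc probe.cu -o probe (the file name is arbitrary). Host-side timing with a synchronize is used here rather than CUDA events because events only measure time on the GPU timeline and would hide any host-side loading work.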