Hello,
I have written two versions of an accelerated function, myfunc_jit(), in Numba: one using @jit to target CPUs and the other using @cuda.jit to target the GPU. When I run myfunc_jit optimized with @jit, the first run takes far longer to complete than later runs, which I guess is due to just-in-time compilation happening during that first call. However, when I run myfunc_jit under @cuda.jit, there is no such discrepancy between the initial and later runs. Both versions produce the correct result, and the @cuda.jit version is consistently faster than the CPU version by orders of magnitude.

My question: since the first-time startup cost of the @cuda.jit function appears negligible, does that mean some compilation has already happened in the background before the function is called, in contrast to the @jit version for CPUs?