I have read that CUDA libraries need time to initialize the first time they are called.
I have experienced this firsthand, both with NPP and with my own CUDA kernels. What I have found is that whatever initializes one method or library does not initialize another. For instance, I have recently been playing with Canny filter code. I wrote my own Canny filter with my own kernels, managing the data myself with cudaMalloc, cudaMemcpy, etc. I found that if, before running my main code, I create a small array (10 ints), cudaMalloc space for it, and copy it up to and back from the GPU, then all of my subsequent filter code runs fast on every iteration. If I don’t, the first iteration runs on the order of 50x slower than the following ones (~120ms vs ~3ms).
Then I tried using nppiFilterCannyBorder_8u_C1R() instead. On my first attempt it took so long that I didn’t pursue it, but once I realized that the “pump might need priming” I went back and tried again. Unfortunately, what worked for my kernel code didn’t work for the NPP code. I had to run nppiFilterCannyBorder_8u_C1R() a second time to get the performance I expected from an Nvidia library, and the improvement was dramatic. The first pass through the code took ~130ms; the second pass on the same image took ~30us (~4000x faster!).
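For anyone who wants to reproduce this kind of first-call versus later-call comparison, here is a minimal host-side sketch in plain C++. It does not call CUDA or NPP itself: `LazyPath` is a hypothetical stand-in for any code path with a one-time setup cost, and `time_call_us` and `warm_up` are illustrative helper names, not part of any Nvidia API. The 20 ms sleep is an arbitrary placeholder, not a measured value.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a code path with a one-time setup cost
// (for example, a library function that does its setup on first use).
struct LazyPath {
    bool initialized = false;

    void operator()() {
        if (!initialized) {
            // Placeholder for the one-time cost; 20 ms is arbitrary.
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            initialized = true;
        }
        // The steady-state work would go here.
    }
};

// Time a single invocation of f and return the elapsed microseconds.
template <typename F>
double time_call_us(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}

// Call every path once, so that each one pays its own first-call cost
// up front. Warming one path does nothing for the others.
void warm_up(const std::vector<std::function<void()>>& paths) {
    for (const auto& path : paths) {
        path();
    }
}
```

In a real test, each entry passed to `warm_up` would wrap one code path (my own kernels, or the NPP Canny call) and should end with cudaDeviceSynchronize(), since kernel launches are asynchronous and the timing would otherwise miss the GPU work. Calling `time_call_us` on the same path twice then gives the first-call and steady-state numbers separately.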
So, what does one have to do to “initialize” an Nvidia library? Does each one need to be initialized separately, or is there a way to initialize the whole “system”? And what is the best way to initialize things? I found that calling nppiFilterCannyBorderGetBufferSize() was not enough for the Canny function; I had to call nppiFilterCannyBorder_8u_C1R() itself to get things going.
Also, why is this behavior not more visible? I could be looking in the wrong places or searching for the wrong terms, but I haven’t seen anything “official” about this, just entries in the forums. It seems to me that Nvidia should document this more prominently and, if one doesn’t exist already, provide an “official” way to prime things during an initialization step, when speed doesn’t yet matter.