I was wondering whether the nppiResize_8u_C1R function (part of the NPP library) can somehow be called within a specific CUDA stream.
I need to resize images from different host threads. I would like to use as much of the GPU as possible, ideally pushing utilization to 100%. However, I cannot get above 20% GPU usage with, say, 16 host threads.
My assumption is that the nppiResize_8u_C1R function somehow serializes the calls, i.e. it waits for one host thread to finish before handling the next one, and so on. Is this correct? If so, how would one get concurrent execution of this function?
Thanks a lot