nppiResize_8u_C1R function

toni574 · May 18, 2015, 11:58am

Hi,

I was wondering whether the nppiResize_8u_C1R function (part of the NPP library) can somehow be called within a specific CUDA stream.

I need to resize images from different host threads. Ideally I would like to use as much of the GPU as possible, pushing utilization toward 100%. However, I cannot get above 20% GPU usage with, say, 16 host threads.

My assumption is that the nppiResize_8u_C1R function somehow serializes, i.e. it waits for one host thread to finish, then handles the next thread, and so on. Is this correct? If it is, how would one get concurrent execution of this function?

Thanks a lot

Robert_Crovella · May 18, 2015, 3:50pm

npp has nppGetStream() and nppSetStream() functions to handle streams. You can refer to the documentation for these functions.

http://docs.nvidia.com/cuda/pdf/NPP_Library.pdf

(e.g. p33)

If you issue nppSetStream(…) to some stream you have created, subsequent npp calls (within a given CPU thread) should be issued to that stream.

Otherwise, the npp functions will be issued to the default stream and they will serialize.
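As a rough illustration of the stream setup described above, here is a minimal sketch. It only shows creating a stream, handing it to NPP with nppSetStream(), and reading it back with nppGetStream(). The resize call itself is left as a comment because the nppiResize_8u_C1R argument list differs between NPP releases; check the documentation for your version. Error checking is omitted for brevity, and the variable names are illustrative only.

```cuda
#include <cuda_runtime.h>
#include <npp.h>
#include <cstdio>

int main() {
    // Create a non-default stream for the NPP work.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Tell NPP to issue subsequent calls into this stream.
    nppSetStream(stream);

    // NPP calls made from here on (for example nppiResize_8u_C1R, with the
    // argument list documented for your NPP version) would be launched
    // into `stream` instead of the default stream.

    // nppGetStream() reports the stream NPP is currently using.
    cudaStream_t current = nppGetStream();
    printf("NPP is using the created stream: %s\n",
           current == stream ? "yes" : "no");

    // Wait for any work queued in the stream before tearing it down.
    cudaStreamSynchronize(stream);

    // Point NPP back at the default stream before destroying ours.
    nppSetStream(0);
    cudaStreamDestroy(stream);
    return 0;
}
```

Building this needs the NPP libraries on the link line; the exact library names depend on the CUDA toolkit version.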

Whether or not this affects GPU utilization I can’t say. Not sure what GPU utilization monitor you are using, and it may not be indicating what you think it is. If an individual kernel is fully utilizing the GPU (which would probably be the case for a reasonably large image) then launching additional kernels may not improve utilization much. You may still be able to improve overall efficiency by making effective use of overlap of copy and compute, which is also facilitated by streams (although more than just stream usage is required.)
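The copy/compute overlap mentioned above can be sketched roughly as follows. This is not NPP code: the `invert` kernel is a hypothetical stand-in for real per-image work such as a resize, and the chunk and stream counts are arbitrary. The parts that matter for overlap are the pinned host buffer (cudaMallocHost), the asynchronous copies (cudaMemcpyAsync), and issuing each chunk's copy-in, kernel, and copy-out into its own stream. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-in for real image work: inverts each 8-bit pixel.
__global__ void invert(unsigned char* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 255 - data[i];
}

int main() {
    const int kStreams = 4;          // arbitrary number of streams
    const int kChunk = 1 << 20;      // bytes handled per stream
    const int kTotal = kStreams * kChunk;

    unsigned char* host = nullptr;
    unsigned char* dev = nullptr;
    // Pinned (page-locked) host memory is needed for async copies to overlap.
    cudaMallocHost((void**)&host, kTotal);
    cudaMalloc((void**)&dev, kTotal);
    for (int i = 0; i < kTotal; ++i) host[i] = (unsigned char)(i % 256);

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream copies its chunk in, processes it, and copies it back.
    // Work in different streams is allowed to overlap on the device.
    for (int s = 0; s < kStreams; ++s) {
        size_t off = (size_t)s * kChunk;
        cudaMemcpyAsync(dev + off, host + off, kChunk,
                        cudaMemcpyHostToDevice, streams[s]);
        invert<<<(kChunk + 255) / 256, 256, 0, streams[s]>>>(dev + off, kChunk);
        cudaMemcpyAsync(host + off, dev + off, kChunk,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    // Check the result on the host.
    int mismatches = 0;
    for (int i = 0; i < kTotal; ++i)
        if (host[i] != (unsigned char)(255 - (i % 256))) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```

Whether the copies and kernels actually overlap depends on the device and on how much of the GPU each kernel occupies; a profiler timeline is the usual way to confirm it.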

For a multithreaded application, you’ll also want to be sure your GPU is in the correct compute mode, i.e. Default, or Exclusive Process.

toni574 · May 19, 2015, 7:37am

Thanks txbob.

I think overlap of copy and compute would certainly improve the speed. Do you perhaps know some good resources where I can learn about that?
