NPP JPEG Routines Concurrency

Hi,

I am investigating GPU-accelerated JPEG processing via NPP, using the IDCT, DCT, and resize functions. We’re looking at deploying this as a service, so concurrency is essential for driving throughput, and we’re running into some issues.
As I understand, there are two ways of driving concurrency in the CUDA world:
1. Go multi-threaded and use different CUDA streams.
The problem here is that the IDCT and DCT functions are marked as “not thread safe” in the NPP manual. Is there any news on these being made thread safe in newer versions of NPP?
2. Use the MPS server and go multi-process.
This is the only practical option for us if multi-threading is out.
The problem we are facing here is that the service is to be deployed on Amazon EC2, and the GPU instances there use an NVIDIA GRID K520, which appears to support only SM 3.0, whereas the MPS server requires SM 3.5 or higher (a quick compute-capability check is sketched below).
Anyone faced this issue on Amazon EC2 instances?
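Not EC2-specific, but for reference, here is a minimal check (plain CUDA runtime API, nothing NPP-related; device 0 assumed) that could be run on the instance to confirm the compute capability before relying on MPS:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Print the compute capability of device 0; MPS needs SM 3.5 or higher.
int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("%s: SM %d.%d\n", prop.name, prop.major, prop.minor);
    return 0;
}
```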

Would greatly appreciate some info here.

You can have a multi-threaded app as long as only a single thread drives NPP.

You’re not likely to get much concurrency benefit from trying to drive multiple NPP routines concurrently on a single GPU for images of any reasonable size: the grids launched by the NPP kernels will generally be large enough to fill the GPU, preventing any kernel concurrency.

Therefore, one possible approach to consider might be a multi-threaded app that hands all of its NPP work to a single thread, which performs the NPP calls and then returns the results to the various requesting threads. Other aspects of concurrency, such as overlap of copy and compute, can still be managed by that single NPP thread.
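A minimal sketch of that pattern, using names of my own invention (the body of each submitted task is where the actual NPP routine, e.g. a DCT/IDCT or resize call, would go):

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// All NPP work is funneled through one worker thread; other threads enqueue
// tasks (each capturing its own NPP call) and wait on the returned future.
class NppDispatcher {
public:
    NppDispatcher() : worker_([this] { run(); }) {}
    ~NppDispatcher() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    std::future<void> submit(std::function<void()> task) {
        std::packaged_task<void()> pt(std::move(task));
        std::future<void> f = pt.get_future();
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(pt)); }
        cv_.notify_one();
        return f;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<void()> pt;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                if (done_ && q_.empty()) return;
                pt = std::move(q_.front());
                q_.pop();
            }
            pt();  // runs the captured NPP call(s) on this single thread
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<void()>> q_;
    bool done_ = false;
    std::thread worker_;
};

// Usage from a request-handling thread:
// NppDispatcher dispatcher;
// auto done = dispatcher.submit([&] { /* nppi... call on this image */ });
// done.wait();
```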

You won’t be able to deploy MPS on an SM 3.0 device.

NPP has undergone some changes in the CUDA 8.0 RC. If you haven’t taken a look at it yet, you may wish to, although a couple of functions are still marked as not thread safe:

nppiDCTQuantFwd8x8LS_JPEG_8u16s_C1R
nppiDCTQuantInv8x8LS_JPEG_16s8u_C1R