I am currently writing a DLL for encoding/decoding images (in BGR) to JPEG, with the goal of replacing my current DLL, which is built on Intel IPP, in the hope of getting even better performance.
So, starting from the “jpegNPP” sample provided with the CUDA SDK 7.5, I wrote my new DLL, and overall it works. But when I measured the performance... what a disappointment. Converting a 1280x592 BGR image and compressing it to JPEG takes 7 msec with my Intel IPP DLL, while with my NPP DLL it takes 10 msec. These 10 msec break down into 2-3 msec to allocate and copy memory on the device (which is long), ~0.2 msec to convert from BGR to YCbCr 4:2:2 planes (which is fast), and the rest of the time to produce the JPEG data.
After reading the page https://developer.nvidia.com/npp, I expected better performance. OK, my device is not a Tesla K40M, it’s only a Quadro K620 (but then my processor is not a 12-core E5-2697 either, only a quad-core i7-4790 running Windows 8.1).
As I’m new to CUDA, and before investing in a more powerful device (and which one?), can someone tell me whether there is a way to speed up the NPP functions, the memory allocation and copies, etc.?
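One thing that usually helps with the allocation cost is to allocate the buffers once and reuse them for every frame, instead of allocating and freeing per image. Below is a minimal host-side sketch of that pattern. `ReusableBuffer` and `packedBgrBytes` are hypothetical names, not part of NPP or of the jpegNPP sample, and `std::malloc`/`std::free` stand in for the allocator: in a CUDA build they would presumably be swapped for `cudaMalloc`/`cudaFree` (device buffers) or `cudaMallocHost`/`cudaFreeHost` (pinned host staging buffers).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: a buffer that is allocated once and reused across frames.
// std::malloc/std::free are host stand-ins (assumption); a CUDA build would use
// cudaMalloc/cudaFree, or cudaMallocHost/cudaFreeHost for pinned staging memory.
class ReusableBuffer {
public:
    ReusableBuffer() = default;
    ReusableBuffer(const ReusableBuffer&) = delete;
    ReusableBuffer& operator=(const ReusableBuffer&) = delete;
    ~ReusableBuffer() { std::free(data_); }

    // Returns storage of at least `bytes` bytes; reallocates only when growing.
    void* reserve(std::size_t bytes) {
        assert(bytes > 0);
        if (bytes > capacity_) {
            std::free(data_);
            data_ = std::malloc(bytes);
            capacity_ = (data_ != nullptr) ? bytes : 0;
            ++allocations_;
        }
        return data_;
    }

    std::size_t capacity() const { return capacity_; }
    int allocations() const { return allocations_; }  // number of real allocations

private:
    void* data_ = nullptr;
    std::size_t capacity_ = 0;
    int allocations_ = 0;
};

// Size in bytes of a packed 8-bit, 3-channel image such as BGR.
std::size_t packedBgrBytes(std::size_t width, std::size_t height) {
    return width * height * 3;
}
```

With one long-lived `ReusableBuffer` per buffer the encoder needs, calling `reserve(packedBgrBytes(1280, 592))` for every frame allocates only on the first call, so the steady-state per-frame cost is the copy alone. Whether this recovers most of the 2-3 msec depends on how that time splits between allocation and transfer, which would have to be measured.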
The Quadro K620 is a low-end GPU. The performance difference between that and the fastest GPUs is around a factor of ten, generally speaking. An i7-4790, on the other hand, is a high-end CPU, particularly in terms of single-thread performance. You may want to compare the FLOPS ratings of your CPU and GPU, along with their respective memory bandwidth. I suspect they are quite similar, but I don’t have the time right now to dig out the numbers.
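A rough version of that comparison can be sketched as below. The spec values are assumptions taken from public spec sheets, not measurements: 384 CUDA cores at about 1.058 GHz and a 128-bit DDR3 bus at 1.8 GT/s for the K620; 4 cores at 3.6 GHz with two 8-wide single-precision FMA units each and dual-channel DDR3-1600 for the i7-4790. These are theoretical peaks only and say nothing about what a JPEG encoder actually achieves.

```cpp
#include <cassert>

// Theoretical single-precision peak in GFLOPS:
// SIMD/CUDA lanes * floating-point ops per lane per cycle * clock in GHz.
double peakGflops(int lanes, int flopsPerLanePerCycle, double clockGhz) {
    assert(lanes > 0 && flopsPerLanePerCycle > 0 && clockGhz > 0.0);
    return lanes * flopsPerLanePerCycle * clockGhz;
}

// Theoretical memory bandwidth in GB/s:
// bus width in bytes * transfer rate in giga-transfers per second.
double peakBandwidthGBs(int busWidthBits, double gigaTransfersPerSec) {
    assert(busWidthBits > 0 && gigaTransfersPerSec > 0.0);
    return (busWidthBits / 8.0) * gigaTransfersPerSec;
}

struct Peak {
    double gflops;    // theoretical single-precision GFLOPS
    double gbPerSec;  // theoretical memory bandwidth in GB/s
};

// Assumed spec-sheet values: 384 cores, 1 FMA (2 flops) per cycle, 1.058 GHz,
// 128-bit DDR3 at 1.8 GT/s.
Peak quadroK620() {
    return { peakGflops(384, 2, 1.058), peakBandwidthGBs(128, 1.8) };
}

// Assumed spec-sheet values: 4 cores * 2 FMA units * 8 lanes = 64 lanes,
// 2 flops per FMA, 3.6 GHz base clock, 2 x 64-bit DDR3-1600 channels.
Peak coreI7_4790() {
    return { peakGflops(4 * 2 * 8, 2, 3.6), peakBandwidthGBs(128, 1.6) };
}
```

Under these assumptions the K620 comes out at under twice the peak FLOPS of the CPU, with nearly the same memory bandwidth, which is in line with the "quite similar" guess above.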
The only way to know for sure how fast your particular application will run on a faster GPU is to try it. Maybe you can borrow a faster device, or ask someone who has one to run your application on their system.
Thanks for the reply.
I agree with you, the K620 is not a high-performance GPU (especially in terms of memory bandwidth).
As a replacement GPU, I am looking at a Quadro K4200 or M4000 (because they require only one slot), or a GeForce GTX 960, 970 or 980.
But before buying another GPU, I’d like to know whether there is any way to optimize how I use the NPP functions.
All advice is welcome.
In the end, I invested in a GeForce GTX 1060. With more than 3 times as many cores (Pascal rather than Maxwell) and 3 times as much memory, but also 3 times the power consumption and twice the price, I expected a serious performance improvement.
What a disappointment! Compared to the execution time I got with the K620, the reduction is only 30%.
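One possible reading of that 30% figure is that part of the per-image time does not scale with the GPU at all. The sketch below is only an estimate under stated assumptions: it takes the 10 msec total from the first post as the K620 baseline, 7 msec as the new total (30% less), and assumes about 2.5 msec of allocation and copy time that is the same on both cards. The function names are hypothetical and none of these splits were measured on the GTX 1060.

```cpp
#include <cassert>

// Speedup of the part of the work that scales with the GPU, given the total
// time before and after the upgrade and a fixed overhead that does not scale.
double scalingPartSpeedup(double oldTotalMs, double newTotalMs, double fixedMs) {
    assert(fixedMs >= 0.0 && oldTotalMs > fixedMs && newTotalMs > fixedMs);
    return (oldTotalMs - fixedMs) / (newTotalMs - fixedMs);
}

// Upper bound on the overall speedup if the scaling part took no time at all
// (Amdahl's law with an infinitely fast GPU).
double maxOverallSpeedup(double oldTotalMs, double fixedMs) {
    assert(fixedMs > 0.0 && oldTotalMs >= fixedMs);
    return oldTotalMs / fixedMs;
}
```

With those assumed inputs, `scalingPartSpeedup(10.0, 7.0, 2.5)` is 7.5 / 4.5, about 1.67, and `maxOverallSpeedup(10.0, 2.5)` is 4. So even an arbitrarily fast GPU would be capped at about 4x overall as long as the allocation and copy overhead stays in the per-image path.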