Runtime improvement when running several AI algorithms together on a single GPU

Hi All,

We are running a few AI algorithms on a single NVIDIA GPU (a PC with a GeForce card, or an NVIDIA Jetson AGX) and we are interested in improving runtime performance.
We have a few questions regarding this.
Q1. Does the GPU keep the memory of all the AI programs resident in GPU memory at the same time, or does it reload it each time we switch from one AI algorithm to another?
Q2. How can we tell which takes more time: running the AI algorithm itself, or transferring data to/from GPU memory?
Q3. Are there known strategies for improving runtime performance when running several TensorRT-based AI algorithms together?



This depends on the framework you use.
TensorFlow by default occupies all available GPU memory.
With TensorRT, however, you can control the usage via a parameter.

For TensorRT, there is a parameter called workspace that specifies the maximum scratch memory TensorRT is allowed to use when building and running an engine.
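As a rough sketch (assuming the TensorRT C++ API; the surrounding engine-building boilerplate is omitted, and the function name is illustrative), the workspace limit is set on the builder config:

```cpp
#include "NvInfer.h"

// Illustrative helper: cap TensorRT's scratch workspace at 256 MiB.
// Note that a limit that is too small may exclude some kernel tactics
// and slow inference down, so tune it per model.
void limitWorkspace(nvinfer1::IBuilder* builder)
{
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();

    // TensorRT 8.4 and later:
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE,
                               256ULL << 20);  // 256 MiB

    // Older TensorRT releases used the (now deprecated) equivalent:
    // config->setMaxWorkspaceSize(256ULL << 20);
}
```

Limiting the workspace of each engine is one way to let several TensorRT engines coexist in GPU memory at the same time.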

For Xavier, you can try INT8 mode and the DLA to leverage the Tensor Cores and the dedicated inference hardware.
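A minimal sketch of enabling both (TensorRT C++ API assumed; INT8 also requires a calibrator or explicit per-tensor dynamic ranges, which are omitted here):

```cpp
#include "NvInfer.h"

// Illustrative helper: build the engine in INT8 precision and place
// supported layers on DLA core 0, with GPU fallback for the rest.
void enableInt8AndDla(nvinfer1::IBuilderConfig* config)
{
    // INT8 precision makes use of the Tensor Cores.
    config->setFlag(nvinfer1::BuilderFlag::kINT8);

    // Offload supported layers to the Deep Learning Accelerator.
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA);
    config->setDLACore(0);

    // Layers the DLA cannot run fall back to the GPU.
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
}
```

Running some networks on the DLA also frees the GPU to run the other algorithms concurrently, which can help when several engines share one Xavier.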


On Jetson, if you use CUDA mapped memory or CUDA managed memory, you don’t need to perform CPU<->GPU transfers, because the CPU and GPU share the same physical memory on Jetson. Here is a simple wrapper function that allocates CUDA mapped memory (aka zero-copy):
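For example (a minimal sketch; the helper name is illustrative and error handling is kept simple):

```cpp
#include <cuda_runtime.h>
#include <cstring>

// Allocate zero-copy (CUDA mapped) memory. On Jetson the CPU and GPU
// share the same physical pages, so no cudaMemcpy is needed:
// cpuPtr is used by host code, gpuPtr is passed to CUDA kernels.
bool allocMapped(void** cpuPtr, void** gpuPtr, size_t size)
{
    if (!cpuPtr || !gpuPtr || size == 0)
        return false;

    // Pinned host allocation that is mapped into the GPU address space.
    if (cudaHostAlloc(cpuPtr, size, cudaHostAllocMapped) != cudaSuccess)
        return false;

    // Retrieve the device-side pointer for the same memory.
    if (cudaHostGetDevicePointer(gpuPtr, *cpuPtr, 0) != cudaSuccess)
        return false;

    memset(*cpuPtr, 0, size);
    return true;
}
```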

Then you don’t need to perform any cudaMemcpy() when using it.