The Xavier/Orin in our system runs ~10 separate processes that launch CUDA kernels. We want to know whether a GPU operation (a CUDA kernel execution or a memory copy) is blocked by GPU operations issued from other processes, since Jetsons do not support MPS or MIG.
I intend to use nvprof (on Xavier running CUDA 10) for this purpose. Is there a more efficient way to measure GPU availability?
To clarify: I am not looking to measure GPU utilization (as reported by tegrastats or the GPU load sysfs file).