Gk20a_stall interrupt

Hello,
I am trying to implement a real-time video processing and streaming program on a Jetson Nano, so I need to process video data on the GPU and then send it over Ethernet.

I am seeing periodic outliers in GPU processing time, where processing on the GPU takes about 10x longer than normal. The outliers start at about 4x the normal time and grow over time.

I am watching the interrupt counters and can see that the occurrence of the “gk20a_stall” interrupt is increasing over time. I would like to know what that interrupt means and what I can try to get rid of this problem.
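For anyone who wants to reproduce the observation, here is a minimal sketch of how the counter can be sampled from /proc/interrupts. The helper names irq_count and watch_irq are hypothetical, and the sketch assumes the interrupt name is the last field on its line, as it is for “gk20a_stall” here.

```shell
# Sum the per-CPU counts of one named interrupt.
# $1 = interrupt name, $2 = file to read (defaults to /proc/interrupts).
irq_count() {
    awk -v name="$1" '
        $NF == name {
            # Field 1 is the IRQ number; the per-CPU counters follow it
            # and end at the first non-numeric field.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    ' "${2:-/proc/interrupts}"
}

# Print how many new interrupts arrived in each interval (seconds).
# Runs until interrupted with Ctrl-C.
watch_irq() {
    name=$1
    interval=${2:-1}
    prev=$(irq_count "$name")
    while sleep "$interval"; do
        cur=$(irq_count "$name")
        echo "$(date +%T) $name: +$((cur - prev)) in ${interval}s"
        prev=$cur
    done
}
```

Running `watch_irq gk20a_stall 1` in a second terminal while the application runs should make it visible whether the increments line up with the slow frames.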

I am using cudaMallocManaged with a set of cycling buffers. Each buffer and its processing kernel are assigned to a separate stream, so that the stream is not active when the CPU reads that buffer to send it over Ethernet.
I have also seen this behavior without separate streams, when using cudaStreamAttachMemAsync to attach the buffer to the GPU before processing and back to the CPU after GPU processing. So it does not seem to be related to my current method, but it may be related to managed memory.

I am not seeing this behavior on a Xavier NX, for example.

Any idea what could be the reason behind this?

Best,
jb

Hi,

Have you maximized the device’s performance?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Please also check the CPU/GPU utilization when the delay occurs.
Since Nano has limited resources, the performance might drop if all the resources are occupied.
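One way to do this is with tegrastats, which ships with JetPack. The sketch below is only an illustration: the helper name gpu_load is hypothetical, and it assumes the tegrastats output contains a field of the form "GR3D_FREQ 15%@921", which is how GPU load is reported on Nano.

```shell
# Extract the GPU load percentage from tegrastats output lines on stdin.
# Lines without a GR3D_FREQ field are skipped.
gpu_load() {
    sed -n 's/.*GR3D_FREQ \([0-9][0-9]*\)%.*/\1/p'
}
```

For example, `sudo tegrastats --interval 200 | gpu_load` prints one GPU load value every 200 ms, which can be compared against the times at which the delay occurs.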

Thanks.

Performance is maximized. GPU utilization is very low.

Hi,

A common cause of low GPU utilization is that the GPU is waiting for input data.
Does your application require a lot of data transfer?

Could you use a dummy input to check whether the GPU processing time is stable?
Thanks.

Hello,
We were able to solve this issue. Another thread in the program was periodically calling the Unix system(“one bash command”) function. For some reason, that caused the GPU stalls.

I cannot explain how that can happen, but once we removed the call the issue was gone.
It was my mistake to forget about that thread; otherwise I would have disabled it earlier.

Thanks for the effort anyway!

Best,
jb
