Per your recommendation, I ran sudo jetson_clocks (CPU 2.3 GHz, GPU 1.4 GHz) and monitored with sudo tegrastats, but the situation doesn't improve much (I still see frame hiccups).
From the CPU and GPU observations in tegrastats:
CPU: avg 2765 mW, mem 2.9 GB
GPU: avg 2000 mW, mem 868 MB
Interestingly, if I comment out the code change and go back to not performing the Gaussian filter in CUDA (i.e., changing #if 1 … #endif to #if 0 … #endif in your code), the frame rate is much smoother:
CPU: avg 2910 mW, mem 2.7 GB
GPU: avg 2028 mW, mem 808 MB
I then put the same Gaussian filter into an nvivafilter implementation, like here.
With that, I don't need to run sudo jetson_clocks at all; using only the 30W ALL mode (CPU 1.2 GHz, GPU 905 MHz), the resulting pipeline is much, much smoother (this is the performance I would expect from running on the GPU):
CPU: avg 1275 mW, mem 2.2 GB
GPU: avg 1155 mW, mem 605 MB
Question: does gstdsexample.cpp have a lot of overhead (e.g., unnecessary buffer copies back and forth between CPU memory and GPU memory) that causes such slow performance compared with the nvivafilter implementation?
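For a sense of scale, here is a rough back-of-envelope sketch of how much data such copies would move. It assumes a hypothetical 1920x1080 RGBA stream (4 bytes per pixel) at 30 fps, which is not necessarily what my pipeline uses, and assumes exactly one GPU-to-CPU copy plus one CPU-to-GPU copy per frame. The helper names frame_bytes and round_trip_bytes_per_second are hypothetical and are not part of gstdsexample.cpp or any DeepStream API.

```cpp
#include <cassert>
#include <cstdint>

// Size of one frame in bytes (e.g. bytes_per_pixel = 4 for an assumed RGBA format).
constexpr std::uint64_t frame_bytes(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Bytes moved per second if every frame makes one hypothetical round trip:
// one copy GPU -> CPU and one copy CPU -> GPU.
constexpr std::uint64_t round_trip_bytes_per_second(std::uint64_t width,
                                                    std::uint64_t height,
                                                    std::uint64_t bytes_per_pixel,
                                                    std::uint64_t fps) {
    return 2 * frame_bytes(width, height, bytes_per_pixel) * fps;
}

// Worked example under the assumptions above (1080p RGBA at 30 fps).
static_assert(frame_bytes(1920, 1080, 4) == 8294400ULL, "one 1080p RGBA frame");
static_assert(round_trip_bytes_per_second(1920, 1080, 4, 30) == 497664000ULL,
              "about 0.5 GB/s of copy traffic");
```

Under those assumptions that is 8,294,400 bytes per frame and roughly 0.5 GB/s of extra memory traffic just for the round trips, before any actual filtering work.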
Thank you for your insights in advance.