We have a Caffe network that we run with TensorRT on a TX2.
If we run it without first running jetson_clocks.sh, it works correctly but slowly, at 180 ms per VGA frame.
After running jetson_clocks.sh, it runs at 30 ms per VGA frame, but the results are nonsense.
Has anyone encountered this? What is the best way to debug it or find a fix?
head -n 1 /etc/nv_tegra_release
R28 (release), REVISION: 2.0, GCID: 10136452, BOARD: t186ref, EABI: aarch64, DATE: Fri Dec 1 14:20:33 UTC 2017
Hi twerdster, I have some questions to help debug the issue. Which network are you running?
Are you rebuilding the network's TensorRT CUDA engine and bitstream optimizations between runs with and without ~/jetson_clocks.sh? Before building the network, it is recommended to call IBuilder::setMinFindIterations() with a value of 3 or more on the TX1/TX2 GPU.
Is a different set of layers produced, or are they the same?
Does the issue happen with and without FP16 enabled?
What happens if you use the nvpmodel tool instead of ~/jetson_clocks.sh?
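To make "nonsense" measurable while working through the questions above, it may help to dump the network's output blob from a known-good run and from a bad run, then compare them numerically. The following is only a rough sketch: it assumes you write each output buffer to disk yourself as raw float32 values in the machine's native byte order, and the file names in the usage note are hypothetical.

```python
import math
from array import array


def load_f32(path):
    """Read a raw float32 dump (native byte order) into a list of floats."""
    values = array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return list(values)


def compare(ref, test):
    """Summarise how far `test` deviates from the reference output `ref`."""
    if len(ref) != len(test):
        raise ValueError("length mismatch: %d vs %d" % (len(ref), len(test)))
    # NaN/Inf values in the suspect output are counted separately,
    # because they usually point to a numerical problem (e.g. FP16 overflow).
    non_finite = sum(1 for v in test if not math.isfinite(v))
    diffs = [
        abs(a - b)
        for a, b in zip(ref, test)
        if math.isfinite(a) and math.isfinite(b)
    ]
    return {
        "count": len(ref),
        "non_finite": non_finite,
        "max_abs_diff": max(diffs) if diffs else float("nan"),
        "mean_abs_diff": sum(diffs) / len(diffs) if diffs else float("nan"),
    }
```

For example, `compare(load_f32("out_slow.bin"), load_f32("out_fast.bin"))` (hypothetical file names) returns a small summary dict. Many non-finite values would suggest a precision or overflow problem, while large but finite differences would point more towards a different kernel or layer selection at build time.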