Pose Estimation using TensorFlow on TX2

Hi,

I am trying to run a pose estimation algorithm (https://github.com/eldar/pose-tensorflow) using TensorFlow on NVIDIA's Jetson TX2 platform.

When running on the CPU (by explicitly setting CUDA_VISIBLE_DEVICES=-1), both single-person and multi-person pose estimation work and give the expected results, consistently across runs.
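The CPU/GPU switch above can also be made from inside the script. This is a minimal sketch: `select_device` is a hypothetical helper, and it assumes the variable is set before TensorFlow initializes CUDA (the safest place is before `import tensorflow`), since CUDA reads CUDA_VISIBLE_DEVICES only at initialization.

```python
import os


def select_device(use_gpu, gpu_index=0):
    """Set CUDA_VISIBLE_DEVICES so TensorFlow sees either one GPU or none.

    Call this before TensorFlow initializes CUDA; the safest place is
    before `import tensorflow`. The value "-1" hides every GPU, so
    TensorFlow falls back to the CPU.
    """
    value = str(gpu_index) if use_gpu else "-1"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value
```

For example, calling select_device(use_gpu=False) at the top of the script and only then importing tensorflow should reproduce the CPU run.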

But when I run the same code on the GPU (CUDA_VISIBLE_DEVICES=0), it runs without any errors, yet the results are wrong. When I print the output (outputs_np['part_prob']), it is different every time I run.
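One way to quantify "different every time" is to collect outputs_np['part_prob'] from several runs on the same image (for example, saved per run with np.save) and measure the largest element-wise deviation from the first run. This is a sketch with hypothetical helper names; the tolerance of 1e-5 is an arbitrary assumption, since small floating-point differences between CPU and GPU are normal while large run-to-run changes on one device are not.

```python
import numpy as np


def max_run_to_run_diff(outputs):
    """Largest absolute element-wise difference between the first output and each later one."""
    reference = np.asarray(outputs[0], dtype=np.float64)
    diffs = [np.max(np.abs(np.asarray(o, dtype=np.float64) - reference))
             for o in outputs[1:]]
    return float(max(diffs)) if diffs else 0.0


def is_consistent(outputs, atol=1e-5):
    """True if every run stays within `atol` of the first run."""
    return max_run_to_run_diff(outputs) <= atol
```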

On a side note, when I run the same code on my desktop GPU (GeForce GTX 1050 Ti), the results are correct and consistent.

Is there anything I can do to correct this behavior?

Hi,

Could you check whether you are hitting the same issue as in this topic:
https://devtalk.nvidia.com/default/topic/1044647/jetson-tx2/very-odd-results-when-inferencing-digits-tf-model-on-jetson-tx2-with-jetpack-3-3/

Thanks.

I am using neither DIGITS nor TensorRT; I am using the TensorFlow framework directly on the TX2. In the topic you linked, the person who asked the question says he saw the same behavior on both x86 and Jetson.

In my case, as already stated, the issue occurs only on the Jetson TX2. On my x86 PC, which has an NVIDIA graphics card, I get the expected results.

Hi,

Could you check whether there is a batch_to_space_nd operation in your model?
TensorFlow's batch_to_space_nd implementation produces incorrect results on TX2 for certain dimensions:
https://devtalk.nvidia.com/default/topic/1037898/jetson-tx2/tensorflow-batch_to_space_nd-not-working-for-large-channel-sizes-on-tx2/
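To check for that op, one can list the op types in the built graph. This is a sketch with a hypothetical helper; the executable part is plain Python over (name, type) pairs, and the TensorFlow 1.x call used to collect those pairs is shown only in the docstring. "BatchToSpaceND" is the registered TensorFlow op type; note that dilated (atrous) convolutions in TensorFlow 1.x are often implemented internally with SpaceToBatchND/BatchToSpaceND, so the op may appear even if the model code never calls batch_to_space_nd directly.

```python
def find_ops(op_pairs, op_types=("BatchToSpaceND",)):
    """Return the names of ops whose type is in `op_types`.

    `op_pairs` is an iterable of (name, type) tuples. In TensorFlow 1.x,
    after the network is built, they can be collected with:
        [(op.name, op.type) for op in tf.get_default_graph().get_operations()]
    """
    wanted = set(op_types)
    return [name for name, op_type in op_pairs if op_type in wanted]
```

Passing op_types=("BatchToSpaceND", "SpaceToBatchND") would report both halves of a dilated convolution.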

You can test it with cuda-memcheck:

cuda-memcheck python [app].py

This issue has already been fixed on the Xavier platform.
Thanks.

Thanks, I am getting a similar debug trace when running cuda-memcheck. I am using CUDA 9.0 with JetPack 3.3. Is there an update I can use to solve this problem, or should I go back to CUDA 8.0 as you suggested in that answer?

LOG:
========= CUDA-MEMCHECK

========= Program hit cudaErrorLaunchOutOfResources (error 7) due to “too many resources requested for launch” on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaLaunch + 0x128) [0x414cc]

========= Program hit cudaErrorLaunchOutOfResources (error 7) due to “too many resources requested for launch” on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaLaunch + 0x128) [0x414cc]

========= Program hit cudaErrorLaunchOutOfResources (error 7) due to “too many resources requested for launch” on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaLaunch + 0x128) [0x414cc]

========= Program hit cudaErrorLaunchOutOfResources (error 7) due to “too many resources requested for launch” on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaLaunch + 0x128) [0x414cc]

========= Program hit cudaErrorLaunchOutOfResources (error 7) due to “too many resources requested for launch” on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaLaunch + 0x128) [0x414cc]

========= Program hit cudaErrorLaunchOutOfResources (error 7) due to “too many resources requested for launch” on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaLaunch + 0x128) [0x414cc]

========= ERROR SUMMARY: 6 errors
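A log like the one above can be summarized programmatically. Error 7, cudaErrorLaunchOutOfResources ("too many resources requested for launch"), means a kernel launch failed, so that kernel never ran; this would be consistent with output buffers holding leftover or uninitialized values that change from run to run. This is a sketch with a hypothetical helper; the regular expressions assume the message format shown in the log above.

```python
import re
from collections import Counter

# Patterns based on the cuda-memcheck lines shown in the log above.
_HIT = re.compile(r"Program hit (\w+) \(error \d+\)")
_SUMMARY = re.compile(r"ERROR SUMMARY: (\d+) errors?")


def summarize_memcheck(log_text):
    """Count each CUDA error name reported by cuda-memcheck and read the reported total."""
    counts = Counter(m.group(1) for m in _HIT.finditer(log_text))
    summary = _SUMMARY.search(log_text)
    total = int(summary.group(1)) if summary else None
    return counts, total
```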

Hi,

From your log, this is the same issue as topic 1037898.
It has been fixed on Jetson Xavier, but we don't have a concrete plan to fix it on TX2.

You can try CUDA 8.0, since this issue was introduced in CUDA 9.0.
For CUDA 8.0 (JetPack 3.1), you can follow JetsonHacks' tutorial on building TensorFlow:
https://www.jetsonhacks.com/2017/09/14/build-tensorflow-on-nvidia-jetson-tx2-development-kit/
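To confirm which CUDA release a board is actually running, the version string can be parsed and compared against the threshold stated above (issue present from CUDA 9.0). This is a sketch with hypothetical helper names; it assumes a version string of the form "CUDA Version 9.0.252", as found in /usr/local/cuda/version.txt on CUDA 8/9 installs (the file location is an assumption).

```python
import re


def parse_cuda_version(text):
    """Extract (major, minor) from text such as 'CUDA Version 9.0.252'."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no CUDA version found in: %r" % text)
    return int(match.group(1)), int(match.group(2))


def may_hit_tx2_issue(version):
    """Per this thread, the TX2 problem appears from CUDA 9.0 onwards."""
    return version >= (9, 0)
```

Usage, assuming that file exists: open /usr/local/cuda/version.txt, pass its contents to parse_cuda_version, and check the result with may_hit_tx2_issue.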

Thanks.