When I run on the CPU by explicitly setting CUDA_VISIBLE_DEVICES=-1, both single-person and multi-person pose estimation work and give the expected results, and the results are consistent across runs.
But when I run the same code on the GPU (CUDA_VISIBLE_DEVICES=0), it runs without any errors, yet the results are wrong. When I print the output (outputs_np['part_prob']), it gives me different values every time I run it.
As a side note, when I run the same code on my desktop GPU (GeForce GTX 1050 Ti), the results are correct and consistent.
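To make the comparison concrete, here is a minimal sketch of the kind of run-to-run consistency check I am describing. The helper name `outputs_match`, the tolerance, and the sample values are my own illustration, not part of the actual pose-estimation code; the env-var line shows how I pin execution to the CPU (it must be set before TensorFlow initializes CUDA):

```python
import os

# Hide all GPUs so TensorFlow falls back to CPU; "-1" hides every device,
# "0" would select the first GPU. Must be set before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

def outputs_match(run_a, run_b, tol=1e-5):
    """Compare two flat lists of output values element-wise within a tolerance."""
    return len(run_a) == len(run_b) and all(
        abs(a - b) <= tol for a, b in zip(run_a, run_b)
    )

# Hypothetical 'part_prob' values from two identical inference runs.
# On the CPU (or my desktop GTX 1050 Ti) the runs agree; on the TX2 GPU
# they differ every time, which is the symptom reported above.
run1 = [0.91, 0.07, 0.02]
run2 = [0.91, 0.07, 0.02]
print(outputs_match(run1, run2))  # → True
```

On a correct setup this check passes across repeated runs of the same input; on the TX2 GPU path it fails.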
Is there anything I can do to correct this behavior?
I am using neither DIGITS nor TensorRT; I am using the TensorFlow framework directly on the TX2. In the link you shared, the person who asked the question says he was seeing similar behavior on both x86 and Jetson.
But in my case, as already stated, I am facing this issue only on the Jetson TX2. On my x86 PC, which has an NVIDIA graphics card, I get the expected results.
AastaLL: Thanks, I get a similar debug trace when running cuda-memcheck. I am using CUDA 9.0 and JetPack 3.3. Is there an update I can apply to solve this problem, or should I go back to CUDA 8.0 as you mentioned in that answer?
LOG:
========= CUDA-MEMCHECK
========= Program hit cudaErrorLaunchOutOfResources (error 7) due to "too many resources requested for launch" on CUDA API call to cudaLaunch.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 [0x2e69e8]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcudart.so.9.0 (cudaLaunch + 0x128) [0x414cc]
========= (the same cudaErrorLaunchOutOfResources error and backtrace repeat for each subsequent cudaLaunch call)