pb:tensorflow-gpu with cuda 7.5 and cudnn 4 is faster then tensorflow-gpu cuda8 and cudnn 6

I tested tensorflow 0.8 with cuda 7.5 and cudnn 4 (Maxwell Quadro K620), execution time=0.075s when i test the same code with cuda 8 and cudnn 5 , execution time=0.75 s
I tested the some code with tensorflow 1.3 , cuda 8 ,cudnn 6 ,jetpack 3.1 (jetson tx2) , execution time =0.6s when i try to get the timeline with chromium::/tracing i added tensorflow.python.client import timeline that use libcputi , execution time become =0.3s ,the code run on gpu , because when i run it in cpu execution time become 0.9 s.

the code uses face detection with cnn.
i don’t understand why the code is faster when i use tensorflow with cuda 7.5 and cudnn4 then cuda 8 and cudnn 5 or 6 . any help please?

Hi,

  1. Please remember the maximize your device.
sudo nvpmodel -m 0
sudo ~/jetson_clocks.sh
  1. Make sure you are not using the swap space.