CUDNN_STATUS_EXECUTION_FAILED only on P100 when trying to run TensorFlow benchmarks

Hello,

I am trying to run the same TensorFlow and cuDNN benchmarks on our three NVIDIA cards:
M40,
K80 and P100.

We have cuDNN 5, CUDA 8 and TensorFlow 1.0.0 for Python 3.

The benchmarks I am trying to run can be found here:
https://github.com/soumith/convnet-benchmarks

While all the benchmarks run fine on the M40 and K80 cards, they all fail when I try to run them on the P100.

Here is the error message I get:

$ python3 benchmark_alexnet.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn’t compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn’t compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 0.405
pciBusID 0000:04:00.0
Total memory: 15.89GiB
Free memory: 15.61GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0)
F tensorflow/stream_executor/cuda/cuda_dnn.cc:2001] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED

I have tested the cuDNN5 installation with the code samples provided here:
https://developer.nvidia.com/rdp/cudnn-archive

The test ran successfully.

Both my LD_LIBRARY_PATH and my PATH variable are set correctly.
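
One way to double-check the library path is a small script along these lines. This is only a sketch: the helper name `find_libs` is hypothetical (not part of TensorFlow or CUDA), and the library prefixes it searches for are an assumption based on the libraries named in the log above. It lists which matching files are visible in each LD_LIBRARY_PATH directory, which can help spot a second, unexpected cuDNN or CUDA copy on the P100 machine.

```python
import os


def find_libs(search_path, prefix):
    """Return full paths of files whose name starts with `prefix`,
    looking in every directory of a path-separator-delimited string."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries and directories that do not exist.
        if not directory or not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if name.startswith(prefix):
                hits.append(os.path.join(directory, name))
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    # Prefixes are assumptions taken from the libraries listed in the log.
    for prefix in ("libcudnn", "libcublas", "libcufft", "libcurand"):
        for hit in find_libs(path, prefix):
            print(hit)
```

Running the same script on the M40, K80 and P100 hosts and comparing the output would show whether the three machines really load the libraries from equivalent locations.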

What do you think could be the cause of this?
It may be something very basic…

I thank you in advance for any hint,

Regards,

Véronique