nvidia-docker + cudnn (Lua/Torch) + CUDA 9 + Volta V100 GPU takes a very long time to execute "require cudnn"

Hi all, I’ve been trying to make Lua + cudnn7 + CUDA 9 cooperate with the Volta V100 GPU (hosted on the Amazon Deep Learning AMI on a p3.2xlarge instance) but have run into a difficult bug. The “require cudnn” line takes a very long time to execute (~10 minutes). Comparable code running on a p2.xlarge (Tesla K80) loads almost instantaneously. The rest of the code executes quickly on the V100; it’s only the “require cudnn” line that struggles.

Does anybody have any insight? Is the Amazon Deep Learning AMI missing a required driver?

I used the procedure mentioned here: Using a GPU workload AMI - AWS Batch

A GitHub issue with the Docker build and other info is posted here: Extremely slow cudnn import with cuda9 and cudnn7 on Volta · Issue #1193 · torch/torch7 · GitHub