Keras with TensorFlow backend - NN training on GPU is almost 10 times slower than on CPU

Hi,

There is something going wrong and I need the team’s help here…

My desktop config is as follows:

  • Core i9-9900K (9th Gen)
  • RTX 2070 Super

I have successfully installed tensorflow-gpu and all the necessary CUDA and cuDNN libraries, and the GPU is being detected successfully…

When I import Keras (which uses the TensorFlow backend) in my Jupyter notebook, the following logs appear in the Anaconda prompt:

name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:01:00.0
2019-10-29 20:47:00.840937: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-29 20:47:00.843754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-29 20:47:01.380298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-29 20:47:01.383117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-10-29 20:47:01.385096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-10-29 20:47:01.387454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6283 MB memory) → physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-10-29 20:47:02.041798: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll

Now neural-net training functions like cross_val_score and GridSearchCV are taking an awful amount of time to complete… In fact, what the i9 finishes in 7.5 minutes, the 2070 takes 65 minutes to do…

I have no idea what’s going wrong…

Any indications? I can start putting in the code, if needed…

Regards,
Suvo

Which NVIDIA driver did you try in this case? Is it 440.0 or above? If yes, please take a look at the similar issue I posted in the same topic (container-tensorflow), and my resolution.

I am on 441.08… I went through it, but couldn’t understand the resolution…

All my training, including CNNs, is running very slowly… I am using

config.gpu_options.allow_growth = True, which is working, since I can see that only the needed memory is being allocated, but my GPU utilization is not going above 5-6%…
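
For reference, this is roughly how that option is set (a minimal sketch, assuming TF 1.x with standalone Keras; the model code itself is omitted):

  import tensorflow as tf
  from keras import backend as K

  # Let TensorFlow allocate GPU memory on demand instead of reserving it all upfront
  config = tf.ConfigProto()
  config.gpu_options.allow_growth = True
  K.set_session(tf.Session(config=config))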

How do I know that all the cores of the GPU are being utilized?

Regards,
Suvo

What model are you training? If your input pipeline contains many tiny operations, the GPU can become latency bound. In that case, assigning the input pipeline to the CPU can help. You can test with the ResNet50 example from github.com/NVIDIA/DeepLearningExamples to make sure there is nothing wrong with your setup in general.
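
For example, something along these lines keeps the input pipeline on the CPU (a rough TF 1.x sketch; the arrays are dummy data purely for illustration):

  import numpy as np
  import tensorflow as tf

  # Dummy data, just to make the example self-contained
  features = np.random.rand(1000, 32).astype('float32')
  labels = np.random.randint(0, 2, size=(1000,))

  # Build the input pipeline on the CPU so many tiny preprocessing ops
  # do not leave the GPU waiting on kernel launches
  with tf.device('/cpu:0'):
      dataset = tf.data.Dataset.from_tensor_slices((features, labels))
      dataset = dataset.shuffle(buffer_size=1000).batch(256).prefetch(1)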

I’m having a similar issue and will post it shortly. One thing I can say already is that GridSearchCV is part of the scikit-learn library (not TensorFlow-GPU), which means it cannot use the GPU functionality anyway. Configuring the CPU cores is done via the n_jobs parameter.
Look through this for further details:
https://joblib.readthedocs.io/en/latest/parallel.html#joblib.parallel_backend
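
As an illustrative sketch only (the model and grid here are made up; the point is just where n_jobs lives):

  from keras.models import Sequential
  from keras.layers import Dense
  from keras.wrappers.scikit_learn import KerasClassifier
  from sklearn.model_selection import GridSearchCV

  def build_model():
      # Trivial model purely for illustration
      model = Sequential()
      model.add(Dense(16, activation='relu', input_shape=(32,)))
      model.add(Dense(1, activation='sigmoid'))
      model.compile(optimizer='adam', loss='binary_crossentropy')
      return model

  clf = KerasClassifier(build_fn=build_model, epochs=10, verbose=0)
  param_grid = {'batch_size': [32, 64]}

  # n_jobs controls scikit-learn's CPU parallelism; with a single GPU,
  # n_jobs=1 keeps several worker processes from competing for the same device
  grid = GridSearchCV(estimator=clf, param_grid=param_grid, n_jobs=1, cv=3)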