How to accelerate DNN training on specific CUDA versions

After setting up nvidia-persistenced on each of two GPUs, I get very different training speeds on oxford_flowers102. For instance, I used each GPU to train AlexNet for 100 epochs with TensorFlow 2.1.

Machine One: Nvidia RTX 2070 Super
Env: Driver v450.57, CUDA 11.0, cuDNN 8.0.1, Ubuntu 18.04

1. Power & GPU Memory:

Initial Power: 3W/215W
Power during Training: 187W/215W

Initial GPU Memory: 300MiB
GPU Memory during Training: 5343MiB

Effect: training completes in 45 minutes.

Power draw increases from 3W to 187W, and the allocated GPU memory increases from 300MiB to 5343MiB. Both grow quickly, and training is correspondingly much faster.

2. Processes:

Besides GID, nvidia-smi also shows GI ID and CI ID columns, but both are listed as N/A. I learned that GI stands for GPU Instance and CI for Compute Instance in Multi-Instance GPU (MIG). From the following link, it seems that the GPU uses this kind of concurrency to accelerate training.

https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html

Machine Two: Nvidia RTX 2060
Env: Driver v440.100, CUDA Toolkit 10.2, cuDNN 7.3.1, Ubuntu 18.04

1. Power & GPU Memory:

Initial Power: 9W/160W
Power during Training: 16W/160W

Initial GPU Memory: 300MiB
GPU Memory during Training: 500MiB

Power stays fixed at 16W, and GPU memory only increases from 300MiB to 500MiB. It grows quite slowly.

Effect: training takes about 5 hours, much slower than on the RTX 2070 Super.

2. Processes:

nvidia-smi shows GID but no GI ID or CI ID.

It appears that GI (GPU Instance) and CI (Compute Instance) accelerate the training. Would the combination of driver 450.57 + CUDA 11.0 + cuDNN 8.0.1 enable this concurrency on the RTX 2060 and greatly improve its training speed? Can the RTX 2060 support MIG?
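Whether a GPU supports MIG at all can be checked directly with nvidia-smi. As far as I know, at the CUDA 11.0 release MIG was available only on data-center GPUs such as the A100, so N/A values on GeForce cards like the RTX 2060 and RTX 2070 Super would be expected. The sketch below is hypothetical (helper names are mine) and assumes an R450-series or newer driver whose nvidia-smi understands the `mig.mode.current` query field; it also assumes MIG-capable GPUs report `Enabled` or `Disabled` and others report something like `[N/A]`.

```python
import subprocess

# Hypothetical helpers, not from the original post. Assumes an R450+ driver
# whose nvidia-smi supports the mig.mode.current query field.
MIG_QUERY = "index,name,mig.mode.current"


def parse_mig_modes(csv_text):
    """Parse `nvidia-smi --query-gpu=index,name,mig.mode.current
    --format=csv,noheader` output into (index, name, mode) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split the index off the left and the mode off the right, so a
        # comma inside the GPU name cannot shift the columns.
        index, rest = line.split(",", 1)
        name, mode = rest.rsplit(",", 1)
        rows.append((int(index), name.strip(), mode.strip()))
    return rows


def is_mig_capable(mode):
    """Assumption: MIG-capable GPUs report Enabled/Disabled, others e.g. [N/A]."""
    return mode in ("Enabled", "Disabled")


def query_mig_modes():
    """Run nvidia-smi (needs the NVIDIA driver installed) and parse the output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={MIG_QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_mig_modes(result.stdout)
```

Usage: `for index, name, mode in query_mig_modes(): print(index, name, mode, is_mig_capable(mode))`. Parsing is kept separate from the subprocess call so it can be tested without a GPU.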

Note:

I raised the GPU fan setting from 45% to 74% and the PowerMizer level from Level 0 to Level 4, but neither had any effect on the training speed apart from somewhat louder noise.


Maybe the model wasn’t loaded onto the RTX 2060. It is worth checking whether the model is actually on the GPU: look at the memory usage, and at the same time check the GPU utilization, because the model may be running on the CPU instead of the GPU.
When I loaded VGG onto the GPU, the memory usage was 9 GB, but the utilization was 0% or below 1%; the model was barely running on the GPU and was running on the CPU instead.
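The check described in this reply (memory usage together with utilization) can be scripted with nvidia-smi's CSV query mode. A minimal sketch, assuming nvidia-smi is on PATH; the helper names and the 5% "idle" threshold are hypothetical choices, not part of the original posts. The query fields `utilization.gpu`, `memory.used` and `power.draw` also cover the power numbers compared earlier in the thread.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional

# Hypothetical helper: nvidia-smi query fields for utilization, memory, power.
GPU_QUERY = "index,utilization.gpu,memory.used,power.draw"


def _to_float(text: str) -> Optional[float]:
    """Convert a CSV field to float; '[N/A]'-style values become None."""
    try:
        return float(text)
    except ValueError:
        return None


@dataclass
class GpuSample:
    index: int
    utilization_pct: Optional[float]  # utilization.gpu, percent
    memory_used_mib: Optional[float]  # memory.used, MiB
    power_draw_w: Optional[float]     # power.draw, watts


def parse_gpu_samples(csv_text: str) -> list:
    """Parse `--format=csv,noheader,nounits` output, one sample per GPU."""
    samples = []
    for line in csv_text.strip().splitlines():
        index, util, mem, power = (field.strip() for field in line.split(","))
        samples.append(GpuSample(int(index), _to_float(util),
                                 _to_float(mem), _to_float(power)))
    return samples


def query_gpu_samples() -> list:
    """One snapshot per GPU; call it repeatedly while training runs."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={GPU_QUERY}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_samples(result.stdout)


def looks_idle(sample: GpuSample, util_threshold: float = 5.0) -> bool:
    """Memory allocated but utilization near zero, as in the VGG case above.
    The 5% threshold is an arbitrary assumption."""
    return (sample.utilization_pct is not None
            and sample.memory_used_mib is not None
            and sample.memory_used_mib > 0
            and sample.utilization_pct < util_threshold)
```

Sampling `query_gpu_samples()` every few seconds during training shows whether utilization and power actually rise; a sample for which `looks_idle` returns True suggests the work is not running on that GPU.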

Thanks for your suggestion. I have solved the issues on the RTX 2060 with the following methods.

1. Upgrade from CUDA 10.2/cuDNN 7.3.1 to CUDA 11.0/cuDNN 8.0.1

In my case, only CUDA 11.0/cuDNN 8.0.1 let the application grow its GPU memory usage up to the full capacity; the older versions were quite conservative in assigning GPU memory to an application. CUDA 11.0 also adds support for MIG (Multi-Instance GPU).

2. Set up GPU memory growth for the case where the allocation exceeds system memory

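import tensorflow as tf

# Allocate GPU memory on demand instead of mapping most of it at startup.
# This must run before TensorFlow initializes the GPUs (e.g. in the first cell).
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)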

3. Release the GPU Memory for the Next Application in Jupyter Notebook

Jupyter Notebook does not automatically release GPU memory, because the kernel process keeps running after the cells finish. So I add the following numba code to the last cell of the notebook to release it.

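from numba import cuda

# Select GPU 0 and close its CUDA context, which frees the GPU memory held by this kernel.
# Keep this in the last cell: GPU work in the same kernel may fail afterward.
cuda.select_device(0)
cuda.close()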
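The CUDA 11.0 upgrade in step 1 also depends on the driver: Machine Two's driver 440.100 is from the R440 series, while CUDA 11.0 needs an R450-series driver such as Machine One's 450.57. A small version check can catch this before installing. This is a hypothetical sketch; the minimum version below is an assumption (my recollection of the Linux minimum for the CUDA 11.0 GA release) and should be confirmed in NVIDIA's CUDA Toolkit release notes.

```python
import subprocess

# Assumption: minimum Linux x86_64 driver listed for the CUDA 11.0 GA
# release; confirm against NVIDIA's CUDA Toolkit release notes.
CUDA_11_0_MIN_DRIVER = "450.51.05"


def parse_version(text):
    """'450.51.05' -> (450, 51, 5), so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version, minimum=CUDA_11_0_MIN_DRIVER):
    """True if driver_version is at least the (assumed) minimum."""
    return parse_version(driver_version) >= parse_version(minimum)


def query_driver_version():
    """Driver version reported by nvidia-smi for the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip().splitlines()[0]
```

Comparing tuples of integers avoids the string-comparison trap where "440.100" would sort after "450.57".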

The only drawback is CUPTI (the CUDA Profiling Tools Interface), which is included with CUDA 11.0. CUPTI interferes with my existing TensorFlow 2.1/Keras 2.3.1 setup, which shows an error during the training iterations. So I had to install the newer TensorFlow 2.2/Keras 2.4.3 to close the gap, and run the script outside Jupyter Notebook with the following command, which includes the CUPTI-related parameter:

$ python abc.py --cap-add=CAP_SYS_ADMIN

It still prints a warning about the privilege issue, but the application runs more or less normally. I think it is a problem on the NVIDIA CUDA side.
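As far as I know, a common source of this privilege message is that NVIDIA drivers by default restrict the GPU performance counters used by CUPTI to administrator users; NVIDIA's documentation for the ERR_NVGPUCTRPERM error describes this and the `NVreg_RestrictProfilingToAdminUsers=0` kernel-module option that lifts the restriction. Note also that `--cap-add=CAP_SYS_ADMIN` is a `docker run` flag: given to `python abc.py`, it is only passed to the script in `sys.argv` and grants no privileges. The sketch below (hypothetical helper names) reads the current setting on Linux, assuming the driver exposes it as `RmProfilingAdminOnly` in `/proc/driver/nvidia/params`.

```python
from pathlib import Path
from typing import Optional

# Linux only; assumed to exist when the NVIDIA kernel module is loaded.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """True if RmProfilingAdminOnly is 1 (counters restricted to admins),
    False if it is 0, None if the key is missing."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def read_profiling_admin_only(path: Path = PARAMS_PATH) -> Optional[bool]:
    """Read the driver parameter file and report the profiling restriction."""
    return profiling_admin_only(path.read_text())
```

If `read_profiling_admin_only()` returns True, CUPTI-based profiling from a non-root user would be expected to hit the privilege warning.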

Together, these methods greatly improved the training speed on the RTX 2060.

Cheers.
