How to accelerate DNN training speed on specific CUDAs

mikechen6688 · July 19, 2020, 6:14am

After setting up nvidia-persistenced on each of two GPUs, I see very different training speeds on oxford_flowers102. For example, I trained AlexNet for 100 epochs on each GPU with TensorFlow 2.1.

Machine One: Nvidia RTX 2070 Super
Env: Driver v450.57, CUDA 11.0, cuDNN 8.0.1, Ubuntu 18.04

1. Power & GPU Memory:

Initial Power: 3W/215W
Power during Training: 187W/215W

Initial GPU Memory: 300MiB
GPU Memory during Training: 5343MiB

Effect: training completes in 45 minutes.

Power draw rises from 3W to 187W, and the allocated GPU memory grows quickly from 300MiB to 5343MiB. Accordingly, training is much faster.

2. Processes:

Besides GID, nvidia-smi also shows GI ID and CI ID columns, but both read N/A. I learned that GI stands for GPU Instance and CI for Compute Instance in Multi-Instance GPU (MIG). From the following link, it seems that the GPU uses this concurrency to accelerate training.

Machine Two: Nvidia RTX 2060
Env: Driver v440.100, CUDA Toolkit 10.2, cuDNN 7.3.1, Ubuntu 18.04

1. Power & GPU Memory:

Initial Power: 9W/160W
Power during Training: 16W/160W

Initial GPU Memory: 300MiB
GPU Memory during Training: 500MiB

Power draw stays fixed at 16W, and GPU memory grows only from 300MiB to 500MiB, quite slowly.

Effect: training takes about 5 hours, much slower than on the RTX 2070 Super.

2. Processes:

nvidia-smi shows GID but no GI ID or CI ID.
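The gap between the two machines can be made concrete from the figures reported above. The following sketch only reuses those numbers. The dictionary layout and the power_fraction helper are illustrative, not part of any tool.

```python
# Figures reported above for 100 epochs of AlexNet on each machine.
runs = {
    "RTX 2070 Super": {"minutes": 45, "power_w": 187, "power_limit_w": 215},
    "RTX 2060": {"minutes": 5 * 60, "power_w": 16, "power_limit_w": 160},  # "about 5 hours"
}


def power_fraction(run):
    """Share of the board power limit drawn during training."""
    return run["power_w"] / run["power_limit_w"]


for name, run in runs.items():
    print(f"{name}: {run['minutes']} min, {power_fraction(run):.0%} of power limit")

# How many times longer the RTX 2060 run took.
slowdown = runs["RTX 2060"]["minutes"] / runs["RTX 2070 Super"]["minutes"]
print(f"The RTX 2060 run took {slowdown:.1f}x as long.")
```

The RTX 2060 run takes roughly 6.7x as long while drawing about 10% of its power limit, compared with about 87% on the RTX 2070 Super. A board running that far below its limit during training is most likely idle for much of the run.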

It looks as though GI (GPU Instance) and CI (Compute Instance) are what accelerate training. Would installing driver 450.57 + CUDA 11.0 + cuDNN 8.0.1 on the RTX 2060 enable this concurrency and greatly improve its training speed? Does the RTX 2060 support MIG?
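The MIG question can be checked directly by asking nvidia-smi for each GPU's MIG mode. As far as I know, MIG is a data-center feature that was introduced with the A100 and is not available on GeForce cards such as the RTX 2060 or RTX 2070 Super. On those cards, GI/CI columns that read N/A would therefore be expected.

The sketch below rests on a few assumptions:
- The driver is recent enough (R450 or later) to recognize the mig.mode.current query field.
- Any value other than "Enabled" or "Disabled" means the GPU has no MIG support.
- The helper names are hypothetical.

```python
import subprocess


def parse_mig_modes(csv_text):
    """Parse output of
    `nvidia-smi --query-gpu=index,name,mig.mode.current --format=csv,noheader`.

    Returns {gpu_index: (gpu_name, mode)}, where mode is "Enabled" or
    "Disabled", or None when the field is reported as not applicable.
    """
    modes = {}
    for line in csv_text.strip().splitlines():
        index, *name_parts, mode = [field.strip() for field in line.split(",")]
        # Assumption: anything other than Enabled/Disabled (e.g. "[N/A]") means no MIG.
        mode = mode if mode in ("Enabled", "Disabled") else None
        modes[int(index)] = (", ".join(name_parts), mode)
    return modes


def query_mig_modes():
    """Run nvidia-smi on this machine (needs an NVIDIA driver, R450 or later)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name,mig.mode.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_mig_modes(out)


def supports_mig(mode):
    """A GPU without MIG support reports its mode as not applicable."""
    return mode is not None
```

On a machine with the driver installed, `query_mig_modes()` returns one entry per GPU. Only `parse_mig_modes` and `supports_mig` can be exercised without a GPU.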

Note:

I raised the GPU fan setting from 45% to 74% and the PowerMizer level from 0 to 4, but neither had any effect on training speed apart from making a bit more noise.

eason-long · July 19, 2020, 10:22am

Maybe the model was not actually loaded onto the RTX 2060. Check whether it is running on the GPU: look at the memory usage and, at the same time, check the GPU utilization. The model may be running on the CPU instead of the GPU.
When I loaded VGG onto the GPU, memory usage was 9 GB, but utilization stayed at 0% or below 1%; it was running almost entirely on the CPU rather than the GPU.
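The check described above, where memory is allocated but utilization stays near 0%, can be scripted with nvidia-smi's CSV query mode. This is a minimal sketch that assumes nvidia-smi is on the PATH. The thresholds in looks_idle are arbitrary placeholders, not values from this thread, and the helper names are hypothetical.

```python
import subprocess

FIELDS = "index,utilization.gpu,memory.used,memory.total,power.draw"


def _number(field):
    """nvidia-smi prints e.g. "[N/A]" when a value is unavailable."""
    field = field.strip()
    return None if field.startswith("[") else float(field)


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --query-gpu=<FIELDS> --format=csv,noheader,nounits`."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total, power = line.split(",")
        stats.append({
            "index": int(index),
            "util_pct": _number(util),
            "mem_used_mib": _number(used),
            "mem_total_mib": _number(total),
            "power_w": _number(power),
        })
    return stats


def looks_idle(gpu, util_threshold_pct=5.0, mem_threshold_mib=1000.0):
    """Flag a GPU that holds memory but is barely computing."""
    if gpu["util_pct"] is None or gpu["mem_used_mib"] is None:
        return False  # cannot judge without both readings
    return (gpu["mem_used_mib"] >= mem_threshold_mib
            and gpu["util_pct"] < util_threshold_pct)


def query_gpu_stats():
    """Take one sample on this machine; call it while training is running."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)
```

Sampling `query_gpu_stats()` a few times during an epoch and applying `looks_idle` to each entry flags the situation described in the VGG example: a large memory footprint with near-zero utilization.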

mikechen6688 · August 6, 2020, 4:02am

Thanks for your suggestion. I solved the issues on the RTX 2060 with the following methods.

1. Upgrade from CUDA 10.2/cuDNN 7.3.1 to CUDA 11.0/cuDNN 8.0.1

In my setup, only CUDA 11.0/cuDNN 8.0.1 let the application grow into full use of the GPU memory; the older versions were quite conservative about assigning enough GPU memory to an application. CUDA 11.0/cuDNN 8.0.1 also supports MIG (Multi-Instance GPU).

2. Configure the GPU for cases where the allocation exceeds system memory

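import tensorflow as tf

# Let TensorFlow allocate GPU memory on demand instead of reserving it all
# up front. This must run before any GPU has been initialized.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)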
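TensorFlow also honors the TF_FORCE_GPU_ALLOW_GROWTH environment variable, which enables the same memory-growth behavior without editing the training code. This is a minimal sketch that assumes the training code is a standalone script, such as abc.py below. run_with_memory_growth is a hypothetical helper name. Setting the variable in the child's environment guarantees it is in place before TensorFlow initializes the GPU.

```python
import os
import subprocess
import sys


def run_with_memory_growth(script_args):
    """Run `python <script_args...>` with TensorFlow memory growth enabled.

    The variable is placed in the child's environment, so it is already set
    when the child imports TensorFlow and initializes the GPU.
    """
    env = dict(os.environ, TF_FORCE_GPU_ALLOW_GROWTH="true")
    result = subprocess.run([sys.executable, *script_args], env=env)
    return result.returncode
```

For example, `run_with_memory_growth(["abc.py"])` runs the script with growth enabled. Because training then runs in its own process, its GPU memory is also returned to the driver when that process exits.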

3. Release the GPU Memory for the Next Application in Jupyter Notebook

Jupyter Notebook does not release GPU memory automatically, because the kernel process keeps running after a cell finishes. So I add the following numba code to the last cell of the notebook to release the GPU memory.

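from numba import cuda

# Select GPU 0 and close the CUDA context in this process, freeing the memory
# it held. TensorFlow in this kernel will likely not be able to use the GPU
# again until the kernel is restarted.
cuda.select_device(0)
cuda.close()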
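To confirm that the last cell actually freed the memory, the notebook can ask nvidia-smi which processes still hold GPU memory and look for its own PID. This sketch assumes nvidia-smi is on the PATH and reports per-process usage; on some platforms used_memory shows as N/A. The helper names are hypothetical.

```python
import os
import subprocess


def parse_compute_apps(csv_text):
    """Parse `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`.

    Returns {pid: used_mib}; used_mib is None when nvidia-smi prints "[N/A]".
    """
    usage = {}
    for line in csv_text.strip().splitlines():
        if "," not in line:
            continue  # skip blank or informational lines
        pid, used = (field.strip() for field in line.split(",", 1))
        usage[int(pid)] = None if used.startswith("[") else float(used)
    return usage


def query_compute_apps():
    """List the compute processes nvidia-smi currently sees on this machine."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)


def holds_gpu_memory(pid=None, usage=None):
    """True if `pid` (default: this process) still appears as a GPU compute process."""
    pid = os.getpid() if pid is None else pid
    usage = query_compute_apps() if usage is None else usage
    return pid in usage
```

Run `holds_gpu_memory()` in a cell after the release cell; it should return False once the context is closed. Inside a container, the PIDs nvidia-smi reports may not match `os.getpid()`.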

The only drawback is CUPTI (the CUDA Profiling Tools Interface), which ships with CUDA 11.0. It breaks my existing TensorFlow 2.1/Keras 2.3.1 setup, which hits a bug during the training iterations. So I had to install the newer TensorFlow 2.2/Keras 2.4.3 to close the gap, and run the script outside Jupyter Notebook with the following command and the CAP_SYS_ADMIN parameter.

$ python abc.py --cap-add=CAP_SYS_ADMIN

It still warns about the privilege issue, but the application runs more or less normally. I think this is a problem on NVIDIA's CUDA side.
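Two details may explain why the privilege warning persists. First, `--cap-add=CAP_SYS_ADMIN` is an option of `docker run`. Given to `python` after the script name, it is simply passed to abc.py as an ordinary argument. Second, NVIDIA drivers on Linux can restrict GPU performance counters, which CUPTI uses for profiling, to admin users. The current setting is listed in /proc/driver/nvidia/params.

This sketch reads that file. The parameter name RmProfilingAdminOnly is an assumption based on NVIDIA's documentation for the ERR_NVGPUCTRPERM error, and the helper names are hypothetical.

```python
from pathlib import Path

PARAMS_FILE = Path("/proc/driver/nvidia/params")
# Assumed key name for the "profiling restricted to admin users" setting.
KEY = "RmProfilingAdminOnly"


def parse_driver_params(text):
    """Parse the `Name: value` lines of /proc/driver/nvidia/params."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip()
    return params


def profiling_admin_only(text=None):
    """True if GPU performance counters (used by CUPTI) are admin-only.

    Returns None when the file or the key is missing, e.g. without the driver.
    """
    if text is None:
        if not PARAMS_FILE.exists():
            return None
        text = PARAMS_FILE.read_text()
    value = parse_driver_params(text).get(KEY)
    return None if value is None else value == "1"
```

If `profiling_admin_only()` returns True, CUPTI profiling needs the script to run as root, or the driver setting must be changed as described in NVIDIA's ERR_NVGPUCTRPERM documentation.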

These methods greatly improved the training speed on the RTX 2060.

Cheers.
