After setting up nvidia.persistenced on two GPUs respectively, it brings totally different speed effects on the training of oxford_flowers 102. For instance, I use each of the GPUs to train 100 epochs on the AlexNet for TensorFlow 2.1.
Machine One: Nvidia RTX 2070 Super
Env: Driver v450.57, CUDA 11.0, cuDNN 8.0.1, Ubuntu 18.04
1. Power & GPU Memory:
Initial Power: 3W/215W
Power during Training: 187W/215W
Initial GPU Memory: 300MiB
GPU Memory during Training: 5343MiB
Effect: 45 minutes for completing the training.
The power increase form 3W to 187W. In addition, the allocated GPU Memory increases from 300MiB to 5343MiB. It grows quite faster. Therefore, its training speed is much more fast.
2. Processes:
Beside of GID, it shows GI ID and CI ID with the command of nvidia-smi. But both GI ID and CI ID are written as N/A. I get to know that GI is GPU Instance and CI is Computer Instance in the Multiple Instance GPU(shortly MIG). It seems that the GPU adopts the concurrency for accelerating the speed from the following link.
Machine Two: Nvidia RTX 2060
Env: Driver v440.100, CUDA, Toolkit 10.2, cuDNN 7.3.1, Ubuntu 18.04
1. Power & GPU Memory
Initial Power: 9W/160W
Power during Training: 16W/160W
Initial GPU Memory: 300MiB
GPU Memory during Training: 500MiB
The power constantly fixes on 16W and GPU memory increases from 300MiB to 500MiB. It grows quite slow.
Effect: Its training speed is about 5 hours, also more slower than RTX 2070.
2. Processes:
It shows GID but no GI ID and CI ID.
It is quite apparent that both GI(GPU Instance) and CI(Compute Instance) accelerate the training speed. Is the combination of GPU Driver 450.57 + CUDA 11.0 + cuDNN 8.0.1 on RTX 2060 to make the concurrency and greatly improve the training speed? Can RTX 2060 support MIG?
Note:
I enable GPU Fan Settings from 45% to 74% and PowerMixer Level 0 to Level 4. But both of them has no effect on the training speed except the a little big noise.