Hi All,
I am seeing a dramatic performance slowdown on a GPU node. When I submit a job to the node through the queue, it initially runs well, and the output of nvidia-smi shows high GPU utilization:
+------------------------------------------------------+
| NVIDIA-SMI 5.319.37   Driver Version: 319.37         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2090         Off  | 0000:0A:00.0     Off |                    0 |
| N/A   N/A    P0    81W /  N/A |      159MB /  5375MB |     65%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2090         Off  | 0000:0D:00.0     Off |                    0 |
| N/A   N/A    P0   106W /  N/A |      159MB /  5375MB |     70%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2090         Off  | 0000:2B:00.0     Off |                    0 |
| N/A   N/A    P0    90W /  N/A |      158MB /  5375MB |     65%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2090         Off  | 0000:30:00.0     Off |                    0 |
| N/A   N/A    P0   112W /  N/A |      158MB /  5375MB |     67%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0      5531  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    0      5527  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    1      5530  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      72MB  |
|    1      5526  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    2      5532  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    2      5528  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    3      5533  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      72MB  |
|    3      5529  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
+-----------------------------------------------------------------------------+
But after 10-15 minutes the job slows down dramatically, and nvidia-smi shows 0% GPU utilization on all four GPUs, even though the same namd2 processes are still listed:
+------------------------------------------------------+
| NVIDIA-SMI 5.319.37   Driver Version: 319.37         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2090         Off  | 0000:0A:00.0     Off |                    0 |
| N/A   N/A    P0    78W /  N/A |      159MB /  5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2090         Off  | 0000:0D:00.0     Off |                    0 |
| N/A   N/A    P0    78W /  N/A |      159MB /  5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2090         Off  | 0000:2B:00.0     Off |                    0 |
| N/A   N/A    P0    81W /  N/A |      158MB /  5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2090         Off  | 0000:30:00.0     Off |                    0 |
| N/A   N/A    P0    81W /  N/A |      158MB /  5375MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0      5531  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    0      5527  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    1      5530  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      72MB  |
|    1      5526  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    2      5532  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    2      5528  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
|    3      5533  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      72MB  |
|    3      5529  …/NAMD_2.8_Source/Linux-x86_64-g++.cudanet/namd2      71MB  |
+-----------------------------------------------------------------------------+
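To pin down exactly when the utilization drops, something like the rough sketch below could log the GPU-Util column at a fixed interval. It only assumes the plain `nvidia-smi` table layout pasted above (driver 319.37); the function names `parse_gpu_util`, `all_idle` and `log_util` are made up for illustration, and the regular expression may need adjusting for other driver versions.

```python
import re
import subprocess
import time

# Assumption: the GPU-Util value is the "NN%" at the start of the LAST cell
# of a table row, as in the output pasted above.
UTIL_RE = re.compile(r"\|\s*(\d+)%\s+[^|]*\|\s*$", re.MULTILINE)


def parse_gpu_util(smi_text):
    """Return the GPU-Util percentages in the order the GPUs are listed."""
    return [int(m.group(1)) for m in UTIL_RE.finditer(smi_text)]


def all_idle(utils):
    """True when at least one GPU was parsed and every GPU reports 0%."""
    return bool(utils) and all(u == 0 for u in utils)


def log_util(interval_s=30, samples=60):
    """Print a timestamped utilization line every interval_s seconds."""
    for _ in range(samples):
        # Plain nvidia-smi with no arguments prints the summary table.
        text = subprocess.check_output(["nvidia-smi"], universal_newlines=True)
        utils = parse_gpu_util(text)
        flag = "  <-- all GPUs idle" if all_idle(utils) else ""
        print(time.strftime("%H:%M:%S"), utils, flag)
        time.sleep(interval_s)
```

Calling `log_util()` on the node while the job runs would give a timestamp for the drop that can be compared against the NAMD log and the system logs.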
Any suggestions?