Tesla K20c GPU card utilization 99% with no compute process running

Dear All,

I use Tesla K20c GPU cards in an SGI cluster. The OS is SUSE Linux Enterprise Server 11 SP2. On one of the two cards, the reported GPU utilization shoots up to 99% even though no compute process is running on it.

I posted this question in the card forums and was advised to post it in the Linux forums as well.

Any input on this would be appreciated.

n003:~ # uname -m && cat /etc/*release
x86_64
LSB_VERSION="core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64"
SGI Accelerate 1.4, Build 706r14.sles11sp2-1204092008
SGI Foundation Software 2.6, Build 706r14.sles11sp2-1204092008
SGI MPI 1.4, Build 706r14.sles11sp2-1204092008
SGI UPC 1.4, Build 706r14.sles11sp2-1204092008
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2

n003:~ # nvidia-smi
Tue Oct 14 12:44:14 2014
+------------------------------------------------------+
| NVIDIA-SMI 5.325.15   Driver Version: 325.15         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          Off  | 0000:04:00.0     Off |                    0 |
| 30%   33C    P0    46W / 225W |     11MB /  4799MB   |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20c          Off  | 0000:83:00.0     Off |                    0 |
| 30%   33C    P0    42W / 225W |     11MB /  4799MB   |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+
n003:~ # nvidia-smi -q

==============NVSMI LOG==============

Timestamp : Wed Oct 15 15:11:29 2014
Driver Version : 325.15

Attached GPUs : 2
GPU 0000:04:00.0
    Product Name : Tesla K20c
    Display Mode : Disabled
    Display Active : Disabled
    Persistence Mode : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 128
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0325112010002
    GPU UUID : GPU-9be4e169-a202-25d7-c9a5-8343eb335802
    VBIOS Version : 80.10.14.00.02
    Inforom Version
        Image Version : 2081.0204.00.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    PCI
        Bus : 0x04
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102210DE
        Bus Id : 0000:04:00.0
        Sub System Id : 0x098210DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 2
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : 30 %
    Performance State : P0
    Clocks Throttle Reasons
        Idle : Not Active
        Applications Clocks Setting : Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 11 MB
        Free : 4788 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Retired Pages
        Single Bit ECC : 0
        Double Bit ECC : 0
        Pending : No
    Temperature
        Gpu : 34 C
    Power Readings
        Power Management : Supported
        Power Draw : 47.34 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Enforced Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 705 MHz
        SM : 705 MHz
        Memory : 2600 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Default Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes : None

GPU 0000:83:00.0
    Product Name : Tesla K20c
    Display Mode : Disabled
    Display Active : Disabled
    Persistence Mode : Disabled
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 128
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0325112010064
    GPU UUID : GPU-e132fdb5-3a2b-53e5-ffb4-a5957b901ed0
    VBIOS Version : 80.10.14.00.02
    Inforom Version
        Image Version : 2081.0204.00.07
        OEM Object : 1.1
        ECC Object : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    PCI
        Bus : 0x83
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x102210DE
        Bus Id : 0000:83:00.0
        Sub System Id : 0x098210DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 2
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : 30 %
    Performance State : P0
    Clocks Throttle Reasons
        Idle : Not Active
        Applications Clocks Setting : Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        Unknown : Not Active
    Memory Usage
        Total : 4799 MB
        Used : 11 MB
        Free : 4788 MB
    Compute Mode : Default
    Utilization
        Gpu : 97 %
        Memory : 6 %
    Ecc Mode
        Current : Enabled
        Pending : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
        Aggregate
            Single Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
            Double Bit
                Device Memory : 0
                Register File : 0
                L1 Cache : 0
                L2 Cache : 0
                Texture Memory : 0
                Total : 0
    Retired Pages
        Single Bit ECC : 0
        Double Bit ECC : 0
        Pending : No
    Temperature
        Gpu : 34 C
    Power Readings
        Power Management : Supported
        Power Draw : 43.32 W
        Power Limit : 225.00 W
        Default Power Limit : 225.00 W
        Enforced Power Limit : 225.00 W
        Min Power Limit : 150.00 W
        Max Power Limit : 225.00 W
    Clocks
        Graphics : 705 MHz
        SM : 705 MHz
        Memory : 2600 MHz
    Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Default Applications Clocks
        Graphics : 705 MHz
        Memory : 2600 MHz
    Max Clocks
        Graphics : 758 MHz
        SM : 758 MHz
        Memory : 2600 MHz
    Compute Processes : None

n003:~ #
n003:~ #
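
For anyone who wants to check other nodes for the same symptom, here is a rough Python sketch that scans the text of an `nvidia-smi -q` report and lists GPUs that show high utilization while "Compute Processes" is "None". It is only an illustration: the function name `find_busy_idle_gpus`, the 90% threshold, and the trimmed `SAMPLE` text are my own assumptions, and the parsing relies on the field labels appearing exactly as in the log above (other driver versions may label things differently).

```python
import re


def find_busy_idle_gpus(report, threshold=90):
    """Return [(bus_id, gpu_util_percent)] for GPUs whose reported
    utilization is >= threshold while no compute process is listed.

    `report` is the plain-text output of `nvidia-smi -q`. Leading
    whitespace is ignored, so indented and flattened pastes both work.
    """
    flagged = []
    gpu = None      # bus id of the GPU block being read
    util = None     # "Gpu" value from that block's Utilization section
    section = None  # most recent header line (a line without " : ")
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " : " not in line:
            # Header line: either the start of a new GPU block or a section name.
            m = re.fullmatch(r"GPU ([0-9A-Fa-f:.]+)", line)
            if m:
                gpu, util = m.group(1), None
            section = line
            continue
        key, value = (part.strip() for part in line.split(" : ", 1))
        if section == "Utilization" and key == "Gpu":
            # "Gpu" also appears under Temperature, hence the section check.
            m = re.match(r"(\d+)\s*%", value)
            util = int(m.group(1)) if m else None
        elif key == "Compute Processes":
            if gpu is not None and util is not None \
                    and value == "None" and util >= threshold:
                flagged.append((gpu, util))
    return flagged


# Trimmed excerpt of the report above (assumption: only the fields the
# parser looks at are kept; the values are copied from the log).
SAMPLE = """\
Attached GPUs : 2
GPU 0000:04:00.0
Utilization
Gpu : 0 %
Memory : 0 %
Temperature
Gpu : 34 C
Compute Processes : None

GPU 0000:83:00.0
Utilization
Gpu : 97 %
Memory : 6 %
Temperature
Gpu : 34 C
Compute Processes : None
"""

if __name__ == "__main__":
    print(find_busy_idle_gpus(SAMPLE))
```

On a node, the report text could be captured with something like `subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout` and passed to the function in place of `SAMPLE`.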