Internal Memcheck Error: Device not supported

Hi,

I am running cuda-memcheck on my executable and am getting the error “Internal Memcheck Error: Device not supported”.

Please see the deviceQuery output below:
./deviceQuery
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla V100-PCIE-16GB"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 7.0
Total amount of global memory: 16152 MBytes (16936861696 bytes)
(80) Multiprocessors, ( 64) CUDA Cores/MP: 5120 CUDA Cores
GPU Max Clock rate: 1380 MHz (1.38 GHz)
Memory Clock rate: 877 Mhz
Memory Bus Width: 4096-bit
L2 Cache Size: 6291456 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 7 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 8
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

Bala.

Are there any other GPUs in your system? (such as an older Fermi GPU?)
If so, try running your cuda-memcheck with:

CUDA_VISIBLE_DEVICES="0" cuda-memcheck …

If not, I think the most likely explanation for this is a corrupted install of some sort. The cuda-memcheck binary or some other component you are running does not work with the GPU driver or CUDA version you have installed.
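
One way to act on the first suggestion is to count the GPUs the driver enumerates before pinning cuda-memcheck to a single device. The sketch below assumes `nvidia-smi -L` is available and prints one `GPU <index>: ...` line per device; `count_gpus` and `./my_app` are hypothetical names, not taken from the thread.

```shell
#!/bin/sh
# Count the GPUs listed by `nvidia-smi -L` (one "GPU <n>: ..." line per device).
# Reads the listing on stdin so it can be exercised without a GPU present.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Typical use on the affected machine (./my_app is a placeholder):
#   nvidia-smi -L | count_gpus
#   CUDA_VISIBLE_DEVICES=0 cuda-memcheck ./my_app
```

If the count is greater than one, restricting the run to a single supported device is the cheaper thing to try before suspecting the install.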

txbob,

Thank you for the reply. There are no other devices; please see the deviceQuery output in my post. I also tried running with CUDA_VISIBLE_DEVICES=0, and the behavior is the same.

Thank you,

Bala.

My best guess would be a corrupted install, then.

What is the output of:

lspci |grep -i nv

and:

sudo find / -name cuda-memcheck

?

lspci | grep -i nv
00:08.0 3D controller: NVIDIA Corporation Device 1db4 (rev a1)

I don’t have sudo access. Please see if this is good:

find / -name cuda-memcheck |& grep -v denied
/usr/local/cuda-9.0/doc/html/cuda-memcheck
/usr/local/cuda-9.0/bin/cuda-memcheck
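
Even with a single toolkit under /usr/local/cuda-9.0, it may be worth confirming that the cuda-memcheck the shell resolves first on PATH is that copy. A minimal sketch, assuming the toolkit's tools sit side by side in one `bin` directory; `same_toolkit_dir` is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds when two non-empty tool paths live in the same directory, e.g. the
# cuda-memcheck and nvcc binaries of a single toolkit install.
same_toolkit_dir() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# Typical use (warns if the shell resolves the two tools differently,
# or if either one is not on PATH at all):
#   same_toolkit_dir "$(command -v cuda-memcheck)" "$(command -v nvcc)" \
#       || echo "cuda-memcheck and nvcc do not come from the same directory"
```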

What is the output of:

cuda-memcheck --version

?

CUDA-MEMCHECK version 9.0.176 ID:(44)
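
The reported 9.0.176 can be compared mechanically against the toolkit release. The sketch below pulls the major.minor pair out of the `cuda-memcheck --version` line shown above and out of the `release X.Y` field that `nvcc --version` prints. Both helper names are hypothetical, and the parsing assumes those two output formats.

```shell
#!/bin/sh
# Extract "9.0" from a line such as "CUDA-MEMCHECK version 9.0.176 ID:(44)".
memcheck_major_minor() {
    sed -n 's/.*version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Extract "9.0" from a line such as
# "Cuda compilation tools, release 9.0, V9.0.176".
nvcc_major_minor() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Typical use:
#   [ "$(cuda-memcheck --version | memcheck_major_minor)" = \
#     "$(nvcc --version | nvcc_major_minor)" ] || echo "version mismatch"
```

A match here only shows that the two tools belong to the same release; it does not rule out a damaged install.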

I don’t think I’ll be able to identify what is happening.

It may possibly help me if you provide the output of:

nvidia-smi -l

but I’m not optimistic.

You are welcome to file a bug report at developer.nvidia.com if you wish. You’ll need to be a registered developer. Log in as a registered developer, click on your name in the upper right-hand corner, then my account, then my bugs.

They may at some point ask for additional details.

nvidia-smi -l
Wed Aug 15 14:56:17 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:00:08.0 Off |                    0 |
| N/A   42C    P0    35W / 250W |      0MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Wed Aug 15 14:56:22 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:00:08.0 Off |                    0 |
| N/A   42C    P0    35W / 250W |      0MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I’m sorry, I specified the wrong switch. I meant:

nvidia-smi -a

However, I don’t believe I will be able to help you.

Here is the output from smi. Let me know if you have any clues. Otherwise, I will go ahead and file a bug, like you suggested. Thank you for responding.

Bala.

nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Wed Aug 15 16:19:04 2018
Driver Version : 384.111

Attached GPUs : 1
GPU 00000000:00:08.0
Product Name : Tesla V100-PCIE-16GB
Product Brand : Tesla
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 1920
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0323617021491
GPU UUID : GPU-c9f86f10-80fa-9488-5fff-192036bcdf96
Minor Number : 0
VBIOS Version : 88.00.1A.00.03
MultiGPU Board : No
Board ID : 0x8
GPU Part Number : 900-2G500-0000-000
Inforom Version
Image Version : G500.0200.00.03
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : Pass-Through
PCI
Bus : 0x00
Device : 0x08
Domain : 0x0000
Device Id : 0x1DB410DE
Bus Id : 00000000:00:08.0
Sub System Id : 0x121410DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
FB Memory Usage
Total : 16152 MiB
Used : 0 MiB
Free : 16152 MiB
BAR1 Memory Usage
Total : 16384 MiB
Used : 2 MiB
Free : 16382 MiB
Compute Mode : Default
Utilization
Gpu : 4 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
GPU Current Temp : 42 C
GPU Shutdown Temp : 90 C
GPU Slowdown Temp : 87 C
GPU Max Operating Temp : 83 C
Memory Current Temp : 38 C
Memory Max Operating Temp : 85 C
Power Readings
Power Management : Supported
Power Draw : 35.77 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 250.00 W
Clocks
Graphics : 1245 MHz
SM : 1245 MHz
Memory : 877 MHz
Video : 1125 MHz
Applications Clocks
Graphics : 1245 MHz
Memory : 877 MHz
Default Applications Clocks
Graphics : 1245 MHz
Memory : 877 MHz
Max Clocks
Graphics : 1380 MHz
SM : 1380 MHz
Memory : 877 MHz
Video : 1237 MHz
Max Customer Boost Clocks
Graphics : 1380 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
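
The log reports driver 384.111. One possibility raised earlier in the thread is a component that does not match the installed driver or CUDA version, and the driver side of that can be checked with a dotted-version comparison. The sketch below assumes GNU `sort -V` is available. `version_ge` is a hypothetical helper, and the 384.81 minimum Linux driver for CUDA 9.0 is an assumption to verify against the toolkit release notes.

```shell
#!/bin/sh
# Succeeds when dotted version $1 is greater than or equal to $2.
# Relies on the version-sort ordering of GNU sort (-V).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum Linux driver for CUDA 9.0; confirm in the release notes.
min_driver=384.81

# Typical use on the affected machine:
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$drv" "$min_driver" || echo "driver $drv is older than $min_driver"
```

Under that assumption 384.111 passes, which would point away from a driver that is simply too old.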