Hi team,
I hope you are doing well.
I have a Linux VM running with an A40 vGPU:
Mon Jun 2 23:40:12 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40-48Q                 On  |   00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8            N/A  /  N/A  |      24MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1633      G   /usr/lib/xorg/Xorg                       23MiB |
+-----------------------------------------------------------------------------------------+
In this VM, I was trying to run the latest vLLM with NCCL 2.26.2, but I ran into the errors below:
ERROR 06-02 05:11:29 [worker_base.py:620] raise RuntimeError(f"NCCL error: {error_str}")
ERROR 06-02 05:11:29 [worker_base.py:620] RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
70acd25d92c0:99505:99505 [0] NCCL INFO Bootstrap: Using eth0:10.89.0.2<0>
70acd25d92c0:99505:99505 [0] NCCL INFO cudaDriverVersion 12080
70acd25d92c0:99505:99505 [0] NCCL INFO NCCL version 2.26.2+cuda12.2
70acd25d92c0:99505:99505 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal net plugin.
70acd25d92c0:99505:99505 [0] NCCL INFO Failed to open libibverbs.so[.1]
70acd25d92c0:99505:99505 [0] NCCL INFO NET/Socket : Using [0]eth0:10.89.0.2<0>
70acd25d92c0:99505:99505 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
70acd25d92c0:99505:99505 [0] NCCL INFO Using network Socket
[2025-06-02 05:11:29] 70acd25d92c0:99505:99505 [0] init.cc:416 NCCL WARN Cuda failure 'operation not supported'
70acd25d92c0:99505:99505 [0] NCCL INFO init.cc:1397 -> 1
70acd25d92c0:99505:99505 [0] NCCL INFO init.cc:1704 -> 1
70acd25d92c0:99505:99505 [0] NCCL INFO init.cc:1730 -> 1
The failing call is at init.cc:416 in NCCL.
It seems that cudaMemPoolCreate is not supported in my environment.
I asked GPT to write a test script to check whether the Stream-Ordered Memory Allocator is supported in my environment:
#include <cuda_runtime_api.h>
#include <iostream>

void check(int dev) {
    int val = 0;
    // Memory pool supported?
    cudaDeviceGetAttribute(&val, cudaDevAttrMemoryPoolsSupported, dev);
    std::cout << "GPU " << dev << ":\n";
    std::cout << " cudaDevAttrMemoryPoolsSupported: " << val << std::endl;

    // Compute capability
    int major = 0, minor = 0;
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev);
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev);
    std::cout << " Compute Capability: " << major << "." << minor << std::endl;

    // Device memory pool test: cudaMemAllocationTypePinned means the memory
    // cannot migrate away from the location given below (this device).
    cudaMemPool_t pool;
    cudaMemPoolProps props{};
    props.allocType = cudaMemAllocationTypePinned;
    props.handleTypes = cudaMemHandleTypeNone;
    props.location.type = cudaMemLocationTypeDevice;
    props.location.id = dev;
    cudaError_t err = cudaMemPoolCreate(&pool, &props);
    std::cout << " cudaMemPoolCreate → " << cudaGetErrorString(err) << "\n";
    if (err == cudaSuccess) cudaMemPoolDestroy(pool);
}

int main() {
    // CUDA versions (encoded as 1000 * major + 10 * minor)
    int runtime_version = 0;
    cudaRuntimeGetVersion(&runtime_version);
    std::cout << "CUDA Runtime Version: " << runtime_version / 1000
              << "." << (runtime_version % 1000) / 10 << std::endl;
    int driver_version = 0;
    cudaDriverGetVersion(&driver_version);
    std::cout << "CUDA Driver Version: " << driver_version / 1000
              << "." << (driver_version % 1000) / 10 << std::endl;

    // Check each device
    int n = 0;
    cudaGetDeviceCount(&n);
    std::cout << "Number of CUDA devices: " << n << std::endl;
    for (int d = 0; d < n; ++d) check(d);
    return 0;
}
The output is:
CUDA Runtime Version: 12.8
CUDA Driver Version: 12.8
Number of CUDA devices: 1
GPU 0:
cudaDevAttrMemoryPoolsSupported: 0
Compute Capability: 8.6
cudaMemPoolCreate → operation not supported
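For anyone comparing the "12.8" printed here with the "cudaDriverVersion 12080" line in the NCCL log: both are the same value, since CUDA encodes its version as 1000 * major + 10 * minor. A minimal sketch of that decoding in plain C++ (no CUDA headers needed) is below. The names CudaVersion, decode_cuda_version and format_cuda_version are hypothetical helpers made up for illustration, not part of CUDA or NCCL.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type (not a CUDA or NCCL API): holds the two parts of a
// CUDA version number after decoding.
struct CudaVersion {
    int major_number;
    int minor_number;
};

// Decodes the integer reported by cudaRuntimeGetVersion / cudaDriverGetVersion,
// which is encoded as 1000 * major + 10 * minor.
CudaVersion decode_cuda_version(int encoded) {
    assert(encoded >= 0);  // a negative version number would be a caller bug
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Formats the encoded version as "major.minor".
std::string format_cuda_version(int encoded) {
    const CudaVersion v = decode_cuda_version(encoded);
    return std::to_string(v.major_number) + "." + std::to_string(v.minor_number);
}
```

With this, format_cuda_version(12080) gives "12.8", which matches both the driver version in the NCCL log and the script output above, so the runtime and driver versions agree and a version mismatch does not look like the cause.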
I have browsed the forum, and it seems that only Windows runs into this issue; an A40 on Linux should work fine…
I also tested this script on another VM from my cloud provider with an L40S attached, and it produced the same "operation not supported" result. I guess it could be something related to my VM environment.
Below is some extra information about my environment:
nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Mon Jun 2 23:33:42 2025
Driver Version : 570.124.06
CUDA Version : 12.8
Attached GPUs : 1
GPU 00000000:00:05.0
Product Name : NVIDIA A40-48Q
Product Brand : NVIDIA RTX Virtual Workstation
Product Architecture : Ampere
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-03e27f3c-3f5a-11f0-8c28-9e09393afd88
Minor Number : 0
VBIOS Version : 00.00.00.00.00
MultiGPU Board : No
Board ID : 0x5
Board Part Number : N/A
GPU Part Number : 2235-895-A1
FRU Part Number : N/A
Platform Info
Chassis Serial Number : N/A
Slot Number : N/A
Tray Index : N/A
Host ID : N/A
Peer Type : N/A
Module Id : N/A
GPU Fabric GUID : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU C2C Mode : N/A
GPU Virtualization Mode
Virtualization Mode : VGPU
Host VGPU Mode : N/A
vGPU Heterogeneous Mode : N/A
vGPU Software Licensed Product
Product Name : NVIDIA RTX Virtual Workstation
License Status : Licensed (Expiry: 2025-6-2 20:57:32 GMT)
GPU Reset Status
Reset Required : Requested functionality has been deprecated
Drain and Reset Recommended : Requested functionality has been deprecated
GPU Recovery Action : None
GSP Firmware Version : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x00
Device : 0x05
Domain : 0x0000
Base Classcode : 0x3
Sub Classcode : 0x0
Device Id : 0x223510DE
Bus Id : 00000000:00:05.0
Sub System Id : 0x14E010DE
GPU Link Info
PCIe Generation
Max : N/A
Current : N/A
Device Current : N/A
Device Max : N/A
Host Max : N/A
Link Width
Max : N/A
Current : N/A
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : N/A
Replay Number Rollovers : N/A
Tx Throughput : N/A
Rx Throughput : N/A
Atomic Caps Outbound : N/A
Atomic Caps Inbound : N/A
Fan Speed : N/A
Performance State : P8
Clocks Event Reasons : N/A
Sparse Operation Mode : N/A
FB Memory Usage
Total : 49152 MiB
Reserved : 3984 MiB
Used : 24 MiB
Free : 45145 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 0 MiB
Free : 256 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
GPU : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : N/A
OFA : N/A
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
DRAM Encryption Mode
Current : Disabled
Pending : Disabled
ECC Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable Parity : N/A
SRAM Uncorrectable SEC-DED : N/A
DRAM Correctable : 0
DRAM Uncorrectable : 0
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable Parity : N/A
SRAM Uncorrectable SEC-DED : N/A
DRAM Correctable : 0
DRAM Uncorrectable : 0
SRAM Threshold Exceeded : N/A
Aggregate Uncorrectable SRAM Sources
SRAM L2 : N/A
SRAM SM : N/A
SRAM Microcontroller : N/A
SRAM PCIE : N/A
SRAM Other : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : N/A
GPU T.Limit Temp : N/A
GPU Shutdown Temp : N/A
GPU Slowdown Temp : N/A
GPU Max Operating Temp : N/A
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
GPU Power Readings
Average Power Draw : N/A
Instantaneous Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
GPU Memory Power Readings
Average Power Draw : N/A
Instantaneous Power Draw : N/A
Module Power Readings
Average Power Draw : N/A
Instantaneous Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Power Smoothing : N/A
Workload Power Profiles
Requested Profiles : N/A
Enforced Profiles : N/A
Clocks
Graphics : 210 MHz
SM : 210 MHz
Memory : 405 MHz
Video : 555 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : N/A
SM : N/A
Memory : N/A
Video : N/A
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Fabric
State : N/A
Status : N/A
CliqueId : N/A
ClusterUUID : N/A
Health
Bandwidth : N/A
Route Recovery in progress : N/A
Route Unhealthy : N/A
Access Timeout Recovery : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 1633
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 23 MiB
Capabilities
EGM : disabled
Thank you so much for the assistance.
