Stream-Ordered Memory Allocator is not supported on Linux A40 VGPU

Hi team,

I hope you are doing well.

I have a Linux VM running with an A40 vGPU:

Mon Jun  2 23:40:12 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40-48Q                 On  |   00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8            N/A  /  N/A  |      24MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1633      G   /usr/lib/xorg/Xorg                       23MiB |
+-----------------------------------------------------------------------------------------+

In this VM, I was trying to run the latest vLLM with NCCL 2.26.2 (the version reported in the log below), but I ran into the following errors:

ERROR 06-02 05:11:29 [worker_base.py:620]     raise RuntimeError(f"NCCL error: {error_str}")
ERROR 06-02 05:11:29 [worker_base.py:620] RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
70acd25d92c0:99505:99505 [0] NCCL INFO Bootstrap: Using eth0:10.89.0.2<0>
70acd25d92c0:99505:99505 [0] NCCL INFO cudaDriverVersion 12080
70acd25d92c0:99505:99505 [0] NCCL INFO NCCL version 2.26.2+cuda12.2
70acd25d92c0:99505:99505 [0] NCCL INFO NET/Plugin: Could not find: libnccl-net.so. Using internal net plugin.
70acd25d92c0:99505:99505 [0] NCCL INFO Failed to open libibverbs.so[.1]
70acd25d92c0:99505:99505 [0] NCCL INFO NET/Socket : Using [0]eth0:10.89.0.2<0>
70acd25d92c0:99505:99505 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so. 
70acd25d92c0:99505:99505 [0] NCCL INFO Using network Socket

[2025-06-02 05:11:29] 70acd25d92c0:99505:99505 [0] init.cc:416 NCCL WARN Cuda failure 'operation not supported'
70acd25d92c0:99505:99505 [0] NCCL INFO init.cc:1397 -> 1
70acd25d92c0:99505:99505 [0] NCCL INFO init.cc:1704 -> 1
70acd25d92c0:99505:99505 [0] NCCL INFO init.cc:1730 -> 1
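
With NCCL_DEBUG=INFO the one line that matters is easy to lose among the INFO lines. As a rough sketch, the warning text can be pulled out of captured log output like this. The helper name `extract_nccl_warnings` is hypothetical and not part of NCCL; it only assumes that warning lines contain the literal marker "NCCL WARN ", as in the log above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an NCCL API): scans captured NCCL debug output
// line by line and returns the text that follows "NCCL WARN " on each
// warning line. Lines without the marker (e.g. NCCL INFO) are skipped.
std::vector<std::string> extract_nccl_warnings(const std::string& log) {
    static const std::string marker = "NCCL WARN ";
    std::vector<std::string> warnings;
    std::istringstream lines(log);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t pos = line.find(marker);
        if (pos != std::string::npos) {
            warnings.push_back(line.substr(pos + marker.size()));
        }
    }
    return warnings;
}
```

Applied to the log above, this should leave only the `Cuda failure 'operation not supported'` message, which is the part worth searching for.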

The related code in nccl init.cc:416 is:

It seems that cudaMemPoolCreate is not supported in my environment.

I asked GPT to write a test script to check whether the Stream-Ordered Memory Allocator is supported in my environment:

#include <cuda_runtime_api.h>
#include <iostream>

void check(int dev) {
    int val = 0;

    // Memory pool supported?
    cudaDeviceGetAttribute(&val, cudaDevAttrMemoryPoolsSupported, dev);
    std::cout << "GPU " << dev << ":\n";
    std::cout << "  cudaDevAttrMemoryPoolsSupported: " << val << std::endl;

    // Compute capability
    int major=0, minor=0;
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev);
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev);
    std::cout << "  Compute Capability: " << major << "." << minor << std::endl;

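    // Device memory pool test: cudaMemAllocationTypePinned means non-migratable
    // memory at props.location (the device here), not host-pinned memory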
    cudaMemPool_t pool;
    cudaMemPoolProps props{};
    props.allocType       = cudaMemAllocationTypePinned;
    props.handleTypes     = cudaMemHandleTypeNone;
    props.location.type   = cudaMemLocationTypeDevice;
    props.location.id     = dev;

    cudaError_t err = cudaMemPoolCreate(&pool, &props);
    std::cout << "  cudaMemPoolCreate → " << cudaGetErrorString(err) << "\n";
    if (err == cudaSuccess) cudaMemPoolDestroy(pool);
}

int main() {
    // CUDA versions
    int runtime_version = 0;
    cudaRuntimeGetVersion(&runtime_version);
    std::cout << "CUDA Runtime Version: " << runtime_version / 1000
              << "." << (runtime_version % 1000) / 10 << std::endl;

    int driver_version = 0;
    cudaDriverGetVersion(&driver_version);
    std::cout << "CUDA Driver Version: " << driver_version / 1000
              << "." << (driver_version % 1000) / 10 << std::endl;

    // Check each device
    int n = 0;
    cudaGetDeviceCount(&n);
    std::cout << "Number of CUDA devices: " << n << std::endl;

    for (int d = 0; d < n; ++d) check(d);

    return 0;
}

The output is:

CUDA Runtime Version: 12.8
CUDA Driver Version: 12.8
Number of CUDA devices: 1
GPU 0:
  cudaDevAttrMemoryPoolsSupported: 0
  Compute Capability: 8.6
  cudaMemPoolCreate → operation not supported
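
For reference, the NCCL log prints `cudaDriverVersion 12080`, while nvidia-smi and the script print 12.8. Both are the same value: CUDA encodes versions as 1000 * major + 10 * minor. A minimal sketch of the decoding the script performs, using a hypothetical helper name `format_cuda_version` (not a CUDA API):

```cpp
#include <string>

// Hypothetical helper (not part of the CUDA API): converts the integer
// returned by cudaRuntimeGetVersion / cudaDriverGetVersion, which is encoded
// as 1000 * major + 10 * minor, into a "major.minor" string.
std::string format_cuda_version(int version) {
    const int major = version / 1000;
    const int minor = (version % 1000) / 10;
    return std::to_string(major) + "." + std::to_string(minor);
}
```

So `format_cuda_version(12080)` gives "12.8". By the same encoding, the `+cuda12.2` suffix in the NCCL version string would correspond to 12020.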

I have browsed through the forum, and it seems that only Windows runs into this issue; an A40 on Linux should work fine…

I also tested this script on another VM from my cloud provider with L40s attached, and it produced the same “unsupported” result. I guess it could be something related to my VM environment.

Below is some extra information about my environment:
nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Mon Jun  2 23:33:42 2025
Driver Version                            : 570.124.06
CUDA Version                              : 12.8

Attached GPUs                             : 1
GPU 00000000:00:05.0
    Product Name                          : NVIDIA A40-48Q
    Product Brand                         : NVIDIA RTX Virtual Workstation
    Product Architecture                  : Ampere
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    Addressing Mode                       : None
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : N/A
    GPU UUID                              : GPU-03e27f3c-3f5a-11f0-8c28-9e09393afd88
    Minor Number                          : 0
    VBIOS Version                         : 00.00.00.00.00
    MultiGPU Board                        : No
    Board ID                              : 0x5
    Board Part Number                     : N/A
    GPU Part Number                       : 2235-895-A1
    FRU Part Number                       : N/A
    Platform Info
        Chassis Serial Number             : N/A
        Slot Number                       : N/A
        Tray Index                        : N/A
        Host ID                           : N/A
        Peer Type                         : N/A
        Module Id                         : N/A
        GPU Fabric GUID                   : N/A
    Inforom Version
        Image Version                     : N/A
        OEM Object                        : N/A
        ECC Object                        : N/A
        Power Management Object           : N/A
    Inforom BBX Object Flush
        Latest Timestamp                  : N/A
        Latest Duration                   : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU C2C Mode                          : N/A
    GPU Virtualization Mode
        Virtualization Mode               : VGPU
        Host VGPU Mode                    : N/A
        vGPU Heterogeneous Mode           : N/A
    vGPU Software Licensed Product
        Product Name                      : NVIDIA RTX Virtual Workstation
        License Status                    : Licensed (Expiry: 2025-6-2 20:57:32 GMT)
    GPU Reset Status
        Reset Required                    : Requested functionality has been deprecated
        Drain and Reset Recommended       : Requested functionality has been deprecated
    GPU Recovery Action                   : None
    GSP Firmware Version                  : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x00
        Device                            : 0x05
        Domain                            : 0x0000
        Base Classcode                    : 0x3
        Sub Classcode                     : 0x0
        Device Id                         : 0x223510DE
        Bus Id                            : 00000000:00:05.0
        Sub System Id                     : 0x14E010DE
        GPU Link Info
            PCIe Generation
                Max                       : N/A
                Current                   : N/A
                Device Current            : N/A
                Device Max                : N/A
                Host Max                  : N/A
            Link Width
                Max                       : N/A
                Current                   : N/A
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : N/A
        Replay Number Rollovers           : N/A
        Tx Throughput                     : N/A
        Rx Throughput                     : N/A
        Atomic Caps Outbound              : N/A
        Atomic Caps Inbound               : N/A
    Fan Speed                             : N/A
    Performance State                     : P8
    Clocks Event Reasons                  : N/A
    Sparse Operation Mode                 : N/A
    FB Memory Usage
        Total                             : 49152 MiB
        Reserved                          : 3984 MiB
        Used                              : 24 MiB
        Free                              : 45145 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 0 MiB
        Free                              : 256 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        GPU                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : N/A
        OFA                               : N/A
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    DRAM Encryption Mode
        Current                           : Disabled
        Pending                           : Disabled
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable Parity     : N/A
            SRAM Uncorrectable SEC-DED    : N/A
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable Parity     : N/A
            SRAM Uncorrectable SEC-DED    : N/A
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
            SRAM Threshold Exceeded       : N/A
        Aggregate Uncorrectable SRAM Sources
            SRAM L2                       : N/A
            SRAM SM                       : N/A
            SRAM Microcontroller          : N/A
            SRAM PCIE                     : N/A
            SRAM Other                    : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : N/A
        GPU T.Limit Temp                  : N/A
        GPU Shutdown Temp                 : N/A
        GPU Slowdown Temp                 : N/A
        GPU Max Operating Temp            : N/A
        GPU Target Temperature            : N/A
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    GPU Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    GPU Memory Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
    Module Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Power Smoothing                       : N/A
    Workload Power Profiles
        Requested Profiles                : N/A
        Enforced Profiles                 : N/A
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : N/A
        SM                                : N/A
        Memory                            : N/A
        Video                             : N/A
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Fabric
        State                             : N/A
        Status                            : N/A
        CliqueId                          : N/A
        ClusterUUID                       : N/A
        Health
            Bandwidth                     : N/A
            Route Recovery in progress    : N/A
            Route Unhealthy               : N/A
            Access Timeout Recovery       : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 1633
            Type                          : G
            Name                          : /usr/lib/xorg/Xorg
            Used GPU Memory               : 23 MiB
    Capabilities
        EGM                               : disabled

Thank you so much for the assistance.

I think it is probably caused by running in a VM. vGPU setups have various limitations compared to bare metal. Since you have a vGPU setup, you may also have a support entitlement, so you could check with NVIDIA Enterprise Support.
