cuMemCreate only works with vector objects

woosungkang · August 22, 2023, 6:13am

I’m working with the cuMemCreate functions and ran into a weird situation that is probably a bug.

When I try to allocate device memory with cuMemCreate, it only works when a vector object is defined in the program.

I will attach the full code.

– main –

#include <iostream>
#include <iomanip>
#include <chrono>
#include <thread>
#include <assert.h>
#include <vector>

#include <cuda.h>
#include <cuda_runtime.h>
static inline void
checkDrvError(CUresult res, const char *tok, const char *file, unsigned line)
{
    if (res != CUDA_SUCCESS) {
        const char *errStr = NULL;
        (void)cuGetErrorString(res, &errStr);
        std::cerr << file << ':' << line << ' ' << tok
                  << "failed (" << (unsigned)res << "): " << errStr << std::endl;
    }
}

#define CHECK_DRV(x) checkDrvError(x, #x, __FILE__, __LINE__);

int main()
{
    std::vector<int> test;

    size_t free;
    typedef unsigned char ElemType;
    CUcontext ctx;
    CUdevice dev;
    int supportsVMM = 0;
    CHECK_DRV(cuInit(0));
    CHECK_DRV(cuDevicePrimaryCtxRetain(&ctx, 0));
    CHECK_DRV(cuCtxSetCurrent(ctx));
    CHECK_DRV(cuCtxGetDevice(&dev));

    CHECK_DRV(cuDeviceGetAttribute(&supportsVMM, CU_DEVICE_ATTRIBUTE_VIRTUAL_ADDRESS_MANAGEMENT_SUPPORTED, dev));
    fprintf(stderr, "SupportsVMM: %d\n", supportsVMM);

    CUresult status = CUDA_SUCCESS;
    cudaError_t error = cudaSuccess;
    CUmemAllocationProp prop;
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = (int)dev;
    prop.win32HandleMetaData = NULL;
    
    CUmemAccessDesc accessDesc;
    accessDesc.location = prop.location;
    accessDesc.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;

    size_t chunk_sz;
    cuMemGetAllocationGranularity(&chunk_sz, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    assert(status == CUDA_SUCCESS);
    size_t size = 4*1024;
    const size_t aligned_size = ((size + chunk_sz  -1)/chunk_sz) * chunk_sz;
    
    CUmemGenericAllocationHandle handle;

    CUdeviceptr new_ptr = 0ULL;
    status = cuMemAddressReserve(&new_ptr, (aligned_size), 0ULL, 0ULL, 0ULL);
    assert(status == CUDA_SUCCESS);

    status = cuMemCreate(&handle, aligned_size, &prop, 0);
    assert(status == CUDA_SUCCESS);

    status = cuMemMap(new_ptr, aligned_size, 0, handle, 0);
    assert(status == CUDA_SUCCESS);

    status = cuMemSetAccess(new_ptr, aligned_size, &accessDesc, 1ULL);
    assert(status == CUDA_SUCCESS);

    float * dev_ptr = (float *)new_ptr;
    
    error = cudaMemset(dev_ptr, 1, aligned_size);
    assert(error == cudaSuccess);

}

– makefile –

NVCC ?= nvcc

all: vmm_main 

vmm_main: vmm_main.cpp 
	$(NVCC) $^ -o $@ -lcuda -std=c++11

clean:
	$(RM) vmm_main

My system is CUDA 12.1 with an RTX 3090 on Ubuntu 18.04.

If I just comment out the first line of main, "std::vector<int> test;",
the program ends right after cuMemCreate with a return value of 1.

Did I do something wrong, or is it a bug?

Thanks

Robert_Crovella · August 22, 2023, 12:28pm

I don’t have any trouble running your code on an L4 GPU with CUDA 12.2.1. It makes it all the way to the end, whether I have the std::vector line commented out or not.

Cory.Perry · August 22, 2023, 4:50pm

Hi @woosungkang,

The return value of “1” is CUDA_ERROR_INVALID_VALUE, which means one of the arguments to cuMemCreate is incorrect. You can find all the error values and their meaning here:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html#group__CUDA__TYPES_1gc6c391505e117393cc2558fff6bfc2e9

As to why you’re getting CUDA_ERROR_INVALID_VALUE: based on the fact that the issue goes away if you add a stack variable “std::vector<int> test”, my guess is that CUmemAllocationProp is not being zero-initialized. Try adding a “memset(&prop, 0, sizeof(prop))” to your code just before setting the property attributes and see if that helps. Likely Robert’s compiler aligns the structure in such a way that this isn’t a problem for him. Always make sure to zero-initialize any structures passed to a CUDA API whenever possible.
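A minimal sketch of the zero-then-fill pattern, for illustration only. So that it builds without the CUDA headers, it uses a hypothetical stand-in struct (MockAllocationProp) whose field names mirror CUmemAllocationProp but whose types are simplified; makeDeviceProp is likewise an illustrative helper, not part of the CUDA API. The point it shows: the posted code sets type, location and win32HandleMetaData but never touches requestedHandleTypes or allocFlags, so without the memset those fields hold whatever was on the stack.

```cpp
#include <cstring>

// Hypothetical stand-ins for the CUDA driver types (assumption: simplified
// field types; only the field names follow CUmemAllocationProp).
struct MockAllocFlags {
    unsigned char compressionType;
    unsigned char gpuDirectRDMACapable;
    unsigned short usage;
    unsigned char reserved[4];
};

struct MockLocation {
    int type;
    int id;
};

struct MockAllocationProp {
    int type;
    int requestedHandleTypes;   // never assigned in the posted code
    MockLocation location;
    void *win32HandleMetaData;
    MockAllocFlags allocFlags;  // never assigned in the posted code
};

// Illustrative helper: zero the whole struct first, then set only the
// fields the caller cares about. Every other field is guaranteed to be 0.
inline MockAllocationProp makeDeviceProp(int deviceId)
{
    MockAllocationProp prop;
    std::memset(&prop, 0, sizeof(prop));  // the fix suggested in this reply
    prop.type = 1;            // stands in for CU_MEM_ALLOCATION_TYPE_PINNED
    prop.location.type = 1;   // stands in for CU_MEM_LOCATION_TYPE_DEVICE
    prop.location.id = deviceId;
    return prop;
}
```

In the posted program the equivalent change would be a single “memset(&prop, 0, sizeof(prop));” right after “CUmemAllocationProp prop;” (with <cstring> included), or declaring it as “CUmemAllocationProp prop = {};”. The same applies to CUmemAccessDesc.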

Hope this helps!

woosungkang · August 23, 2023, 6:40am

Ohhh thanks!!!
The memset advice worked!!!
