cuMemCreate only works with vector objects

Hello, @Cory.Perry

I’m working with the cuMemCreate functions and ran into a weird situation that is probably a bug.

When I try to allocate device memory with cuMemCreate, it only works when a vector object is defined in the program.

The full code is attached below.

– main –

#include <iostream>
#include <iomanip>
#include <chrono>
#include <thread>
#include <assert.h>
#include <cstdio>
#include <vector>

#include <cuda.h>
#include <cuda_runtime.h>

// Print a diagnostic when a driver API call does not return CUDA_SUCCESS.
static inline void
checkDrvError(CUresult res, const char *tok, const char *file, unsigned line)
{
    if (res != CUDA_SUCCESS) {
        const char *errStr = NULL;
        (void)cuGetErrorString(res, &errStr);
        std::cerr << file << ':' << line << ' ' << tok
                  << " failed (" << (unsigned)res << "): "
                  << (errStr ? errStr : "unknown error") << std::endl;
    }
}

#define CHECK_DRV(x) checkDrvError(x, #x, __FILE__, __LINE__)

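int main()
{
    std::vector<int> test;

    size_t free;
    typedef unsigned char ElemType;
    CUcontext ctx;
    CUdevice dev;
    int supportsVMM = 0;

    // Several setup lines are missing from the post as archived.
    // The calls marked "assumed" are a guess at what they were.
    CHECK_DRV(cuInit(0));                 // assumed
    CHECK_DRV(cuDevicePrimaryCtxRetain(&ctx, 0));
    CHECK_DRV(cuCtxSetCurrent(ctx));      // assumed
    CHECK_DRV(cuCtxGetDevice(&dev));      // assumed
    // assumed: query that fills supportsVMM
    CHECK_DRV(cuDeviceGetAttribute(&supportsVMM,
        CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED, dev));

    fprintf(stderr, "SupportsVMM: %d\n", supportsVMM);

    CUresult status = CUDA_SUCCESS;
    cudaError_t error = cudaSuccess;
    CUmemAllocationProp prop;
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;              // assumed
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = (int)dev;
    prop.win32HandleMetaData = NULL;
    CUmemAccessDesc accessDesc;
    accessDesc.location = prop.location;
    accessDesc.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;  // assumed

    // Round the requested size up to the allocation granularity.
    size_t chunk_sz;
    status = cuMemGetAllocationGranularity(&chunk_sz, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    assert(status == CUDA_SUCCESS);
    size_t size = 4*1024;
    const size_t aligned_size = ((size + chunk_sz - 1)/chunk_sz) * chunk_sz;
    CUmemGenericAllocationHandle handle;

    // Reserve a virtual address range.
    CUdeviceptr new_ptr = 0ULL;
    status = cuMemAddressReserve(&new_ptr, (aligned_size), 0ULL, 0ULL, 0ULL);
    assert(status == CUDA_SUCCESS);

    // Create the physical allocation.
    status = cuMemCreate(&handle, aligned_size, &prop, 0);
    assert(status == CUDA_SUCCESS);

    // Map the allocation into the reserved range and enable access.
    status = cuMemMap(new_ptr, aligned_size, 0, handle, 0);
    assert(status == CUDA_SUCCESS);

    status = cuMemSetAccess(new_ptr, aligned_size, &accessDesc, 1ULL);
    assert(status == CUDA_SUCCESS);

    float * dev_ptr = (float *)new_ptr;
    error = cudaMemset(dev_ptr, 1, aligned_size);
    assert(error == cudaSuccess);

    return 0;
}
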

– makefile –

NVCC ?= nvcc

all: vmm_main 

vmm_main: vmm_main.cpp 
	$(NVCC) $^ -o $@ -lcuda -std=c++11

clean:
	$(RM) vmm_main

My system is CUDA 12.1 with an RTX 3090 on Ubuntu 18.04.

If I just comment out the first line of main, “std::vector<int> test;”,
the program ends right after cuMemCreate, which returns 1.

Did I do something wrong, or is it a bug?


I don’t have any trouble running your code on an L4 GPU with CUDA 12.2.1. It makes it all the way to the end, whether I have the std::vector line commented out or not.

Hi @woosungkang,

The return value of “1” is CUDA_ERROR_INVALID_VALUE, which means one of the arguments to cuMemCreate is incorrect. You can find all the error values and their meanings under the CUresult enum in the CUDA Driver API documentation.

As to why you’re getting CUDA_ERROR_INVALID_VALUE: given that the issue goes away when you add the stack variable “std::vector<int> test”, my guess is that CUmemAllocationProp is not being zero-initialized. Try adding “memset(&prop, 0, sizeof(prop))” to your code just before setting the property attributes and see if that helps. Robert’s compiler most likely lays out the stack so that the structure happens to sit on zeroed memory, which is why he doesn’t see the problem. Always zero-initialize any structure you pass to a CUDA API whenever possible.
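
To illustrate the zero-initialization advice, here is a minimal sketch that builds without the CUDA toolkit. `AllocPropLike` is a hypothetical stand-in that only roughly mirrors the fields of CUmemAllocationProp, and `poison`, `init_prop` and `untouched_fields_are_zero` are made-up helper names; none of them are part of the CUDA API. The sketch shows that fields the caller never assigns hold whatever was left in memory unless the whole object is zeroed first.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in with roughly the same fields as CUmemAllocationProp.
// It is NOT the real type; it only lets the sketch build without cuda.h.
struct AllocPropLike {
    int type;
    int requestedHandleTypes;      // easy to forget
    struct {
        int type;
        int id;
    } location;
    void *win32HandleMetaData;
    unsigned char allocFlags[8];   // easy to forget
};

// Overwrite the object with a non-zero pattern to imitate leftover stack data.
inline void poison(AllocPropLike &prop) {
    std::memset(&prop, 0xAB, sizeof(prop));
}

// Zero the whole object first, then set only the fields of interest.
inline void init_prop(AllocPropLike &prop, int device_id) {
    std::memset(&prop, 0, sizeof(prop));
    prop.type = 1;           // stands in for CU_MEM_ALLOCATION_TYPE_PINNED
    prop.location.type = 1;  // stands in for CU_MEM_LOCATION_TYPE_DEVICE
    prop.location.id = device_id;
}

// Report whether the fields init_prop() does not assign are all zero.
inline bool untouched_fields_are_zero(const AllocPropLike &prop) {
    if (prop.requestedHandleTypes != 0) return false;
    if (prop.win32HandleMetaData != nullptr) return false;
    for (std::size_t i = 0; i < sizeof(prop.allocFlags); ++i) {
        if (prop.allocFlags[i] != 0) return false;
    }
    return true;
}
```

With the real type, writing “CUmemAllocationProp prop = {};” or calling memset as suggested above zeroes the structure in the same way before the individual fields are assigned.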

Hope this helps!

Ohhh thanks!!!
The memset advice worked!!!
