VPI array wrapper invalidates managed memory on the device side

Hello,

I’m having some issues using managed memory with VPI: after wrapping the allocated managed memory, I think that CUDA sees the memory as host-only. Device-side calls on it return an “invalid argument” error.

I would like to understand what is going on, and I have prepared a small sample to reproduce the issue:
sample.zip (2.5 KB)

Some version info: JetPack v4.6, CUDA v10.2, VPI v1.1

Hi,

Thanks for reporting this.
We are able to reproduce this error in our environment and are now checking the details.

Please note that we also have some newer VPI releases:

  • VPI v1.2 in JetPack 4.6
  • VPI v2.0 in JetPack 5.0

It’s recommended to upgrade to the latest version for a better experience.
Thanks.

Hi @AastaLLL

I tried with VPI v1.2.3 and the error remains.

Best.

Hi,

Thanks for your testing.

FYI, the same issue also occurs with our latest VPI 2.0.
We are checking this with our internal team now and will share more information once we get feedback.

Thanks.

Hi @AastaLLL,

I was able to investigate further: the memory needs to be reattached, either to the global context or to a stream. I tested this with both synchronous and asynchronous copies, and it seems to fix the issue. I can’t yet tell whether other VPI operations fail afterwards or whether VPI performs unwanted copies; I may update this thread later.

There is a different issue that I forgot to mention. As you can see from the sample, I avoid setting the sizePointer field of VPIArrayData, because the wrapping operation fails when it is set. Is there a particular reason why that happens?
It is not a big issue, but it would be nice to check the size directly, without calling the getter or locking and unlocking the array.

    void* data_ptr;
    VPIArray array;
    
    // Allocate managed memory for up to `capacity` keypoints
    const auto capacity = 1000;
    const auto strideBytes  = sizeof(VPIKeypoint);
    const auto total_size = capacity*strideBytes;
    int32_t size = 0;
    cudaMallocManaged(&data_ptr, total_size, cudaMemAttachGlobal);

    {
        VPIArrayData arr_data = {};
        arr_data.capacity = capacity;
        arr_data.data = data_ptr;
        arr_data.sizePointer = &size; // setting this makes the wrapper creation fail
        arr_data.type = VPI_ARRAY_TYPE_KEYPOINT;
        arr_data.strideBytes = strideBytes;
        auto err = vpiArrayCreateCUDAMemWrapper(&arr_data, 0, &array);
        assert(err == VPI_SUCCESS);
    }

Thanks.

Hi,

We also observe something similar.

After wrapping from the VPI, the buffer becomes CPU-accessible only.
A GPU access usually leads to an error or exception.

On Jetson, we don’t support concurrent access to unified memory.
This means that only one processor (either the CPU or the GPU) can access the buffer at a time.

We are discussing this issue with our internal team.
I will update you here with more information later.

Thanks.

Hi,

We have confirmed with our internal team.
Reattaching the buffer is the correct solution in your use case.

Thanks.
