PVA/VIC with Unified Memory in VPI

wsmlby · October 27, 2020, 5:37am

How does PVA work with Unified Memory in VPI? Can unified memory be wrapped with vpiImageCreateCudaMemWrapper or vpiImageCreateHostMemWrapper on AGX Xavier? What is the expected behavior of each? Does the CPU access restriction (no CPU access while a GPU kernel is running) apply to them?

There seems to be no documentation on how PVA works with Unified Memory.

One more question: when using the CUDA/CPU backends, does VPI work with Unified Memory the same way Unified Memory behaves when used directly from CUDA/CPU? Does the same restriction apply?

Thanks.

AastaLLL · October 27, 2020, 7:45am

Hi,

Yes, it should work.
It’s recommended to wrap the buffer with vpiImageCreateCudaMemWrapper so that you don’t need to do the host->device mapping again.

Please note that VPI doesn’t support concurrent access.
This is a hardware limitation of the Jetson device.

The detailed procedure is similar to the CUDA buffer handling in this sample:

github.com

AastaNV/00-video_stabilization/blob/master/main.cpp#L107

// prepare VPI Array
CHECK_STATUS(vpiArrayCreate(8192, VPI_ARRAY_TYPE_U32, 0, &scores));
{
    VPIArrayData kpData;
    kpData.capacity = 8192;
    kpData.size     = 0;
    kpData.stride   = sizeof(VPIKeypoint);
    kpData.type     = VPI_ARRAY_TYPE_KEYPOINT;

    cudaMalloc((void**)&kpts_buf, kpData.stride*kpData.capacity);
    kpData.data = kpts_buf;
    CHECK_STATUS(vpiArrayWrapCudaDeviceMem(&kpData, 0, &keypoints));

    kpData.capacity = 128;
    kpData.size     = 0;
    kpData.stride   = sizeof(VPIKLTTrackedBoundingBox);
    kpData.type     = VPI_ARRAY_TYPE_KLT_TRACKED_BOUNDING_BOX;

    cudaMalloc((void**)&input_box_buf, kpData.stride*kpData.capacity);
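
For the unified-memory case asked about above, the following is a minimal sketch of the access pattern that the "no concurrent access" restriction implies, written with the plain CUDA runtime API only. It deliberately makes no VPI calls: the kernel `scale`, the buffer size, and the `CHECK_CUDA` macro are hypothetical names chosen for illustration, and the assumption is that a managed pointer handed to a VPI wrapper should be treated under the same rule, with the VPI stream synchronized (for example with vpiStreamSync) before the CPU touches the buffer again.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch: abort on any CUDA runtime error.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: multiplies every element by a constant factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int device = 0;
    CHECK_CUDA(cudaSetDevice(device));

    // 0 means the CPU must not access managed memory while the GPU is busy.
    int concurrent = 0;
    CHECK_CUDA(cudaDeviceGetAttribute(&concurrent,
                                      cudaDevAttrConcurrentManagedAccess,
                                      device));
    printf("concurrentManagedAccess = %d\n", concurrent);

    const int n = 1 << 20;
    float *buf = nullptr;
    CHECK_CUDA(cudaMallocManaged(&buf, n * sizeof(float)));

    // CPU access is fine here: no GPU work has been queued yet.
    for (int i = 0; i < n; ++i) buf[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(buf, n, 2.0f);
    CHECK_CUDA(cudaGetLastError());

    // When concurrent == 0, do NOT read or write buf from the CPU here.
    CHECK_CUDA(cudaDeviceSynchronize());

    // The GPU is idle again, so the CPU may access the buffer.
    printf("buf[0] = %f\n", buf[0]);

    CHECK_CUDA(cudaFree(buf));
    return 0;
}
```

If the attribute query reports 0 on the target board, every CPU access to the managed buffer has to sit either before the work is submitted or after the synchronization call, never in between.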

Thanks.