CUDA memory access with cudaMallocManaged

Hi!

I am fairly new to programming with CUDA and have a question. We are currently developing an API that is used in an application where we stream images from multiple devices.

The setup is that we create a cudaMemoryRessource (similar to std::pmr) which handles all memory allocation for all devices using cudaMallocManaged calls. My understanding was that memory allocated with cudaMallocManaged can be accessed from both host and CUDA code, and if we run the application in a single thread, this is the case. However, when we grab images simultaneously using two separate threads, we run into problems accessing the memory locations.

An example: we have a CUDA class which does image manipulation, and for that we allocate a memory block for a calibration struct using the cudaMemoryRessource:

  m_pCalibration =
    std::make_unique<memory::TypedMemoryBlock<utils::Calibration>>(pCudaMr, calibration);

This CUDA class has a getter and a setter method because the calibration can change. The getter and setter are mostly called from non-CUDA code. In a single-threaded application that uses the devices sequentially, this is not a problem. However, when we run the devices in separate threads, we get segmentation faults, for example when accessing the calibration or even when trying to copy the generated images.

Is there anything we should watch out for when using cudaMallocManaged from different threads?

Are you running this on Windows?
Are you running this on a Maxwell or Kepler device?

Oh, I'm sorry, I should have added way more information.

I run this on NVIDIA Jetson Orin Nano/NX devices. I already found out that concurrentManagedAccess=0 on these devices, which is probably not a good sign. I also found this thread, which you already answered, and I think it might also be part of our problem: multithreading - Can CUDA unified memory be written to by another CPU thread? - Stack Overflow
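For anyone who wants to check this value on their own board, here is a minimal sketch using the CUDA runtime API. The helper name concurrentManagedAccess and its -1 error convention are made up for illustration; cudaDeviceGetAttribute and cudaDevAttrConcurrentManagedAccess are the actual runtime names.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: returns 1 if host and device may access managed
// memory concurrently, 0 if not, and -1 if the query itself failed.
int concurrentManagedAccess(int device)
{
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &value, cudaDevAttrConcurrentManagedAccess, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }
    return value;
}
```

Calling concurrentManagedAccess(0) at startup and logging the result makes it obvious whether the host is allowed to touch managed memory while the GPU is busy.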

Yes, on Jetson, managed memory sometimes/often behaves in a “pre-Pascal” way (concurrentManagedAccess is false).
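To illustrate what that restriction means in practice, here is a small sketch, assuming a default (globally attached) managed allocation and a non-negative input. The kernel doubleValue and the function doubleOnDevice are hypothetical names, not anything from the poster's code.

```cuda
#include <cuda_runtime.h>

// Trivial kernel that modifies one managed value in place.
__global__ void doubleValue(float* v) { *v *= 2.0f; }

// Hypothetical example: returns the doubled value, or a negative number
// on any CUDA error (the input is assumed to be non-negative).
float doubleOnDevice(float input)
{
    float* v = nullptr;
    if (cudaMallocManaged(&v, sizeof(float)) != cudaSuccess) return -1.0f;

    *v = input;                 // host write: no kernel is running yet
    doubleValue<<<1, 1>>>(v);

    // With concurrentManagedAccess == 0, reading *v at this point, before
    // the device is idle again, is the kind of host access that can fault.
    float out = -1.0f;
    if (cudaDeviceSynchronize() == cudaSuccess) out = *v;  // safe after sync

    cudaFree(v);
    return out;
}
```

In a single thread the synchronize call is enough. With two host threads, one thread's cudaDeviceSynchronize says nothing about a kernel the other thread launches right afterwards, so a host access in one thread can still overlap GPU work started by the other.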

This may be of interest. It may be applicable if your memory usage in each thread is “independent” from other threads.
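One CUDA mechanism for that independent-per-thread case is attaching each managed allocation to a stream with cudaStreamAttachMemAsync. Whether it fits your setup is an assumption on my part. The sketch below assumes one stream and one private allocation per worker thread; the names addOne and runWorker and the return codes are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Adds one to every element; stands in for the real image processing.
__global__ void addOne(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical per-thread worker: call it from one host thread per device.
// Returns 0 on success, 1 on a wrong result, -1 on any CUDA error.
int runWorker(int n)
{
    int* data = nullptr;
    cudaStream_t stream = nullptr;
    int result = -1;

    // cudaMemAttachHost: the allocation starts out visible to the host only.
    if (cudaMallocManaged(&data, n * sizeof(int), cudaMemAttachHost) != cudaSuccess)
        return -1;
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(data); return -1; }

    // Associate the allocation with this thread's stream only.
    if (cudaStreamAttachMemAsync(stream, data, 0, cudaMemAttachSingle) == cudaSuccess &&
        cudaStreamSynchronize(stream) == cudaSuccess) {
        for (int i = 0; i < n; ++i) data[i] = i;      // host write, stream idle
        addOne<<<(n + 255) / 256, 256, 0, stream>>>(data, n);
        if (cudaGetLastError() == cudaSuccess &&
            cudaStreamSynchronize(stream) == cudaSuccess) {
            result = 0;                               // host read, stream idle
            for (int i = 0; i < n; ++i)
                if (data[i] != i + 1) result = 1;
        }
    }

    cudaStreamDestroy(stream);
    cudaFree(data);
    return result;
}
```

Each std::thread would call runWorker with its own size. The host in that thread may touch data whenever its own stream is idle, regardless of kernels running in other streams. A block that several threads share, such as a common calibration struct, does not fit this pattern and would still need other synchronization.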

Also, you may get better help with Jetson issues by posting on a relevant Jetson forum.