Accessing cudaMallocManaged memory after a kernel launch causes a crash

#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <iostream>

#define CUDA_RUNTIME_CHECK(call)                                                           \
  {                                                                                        \
    cudaError_t res = (call);                                                              \
    if (res != cudaSuccess) {                                                              \
      std::cout << "CUDA Runtime API " << cudaGetErrorName(res) << ": "                    \
                << cudaGetErrorString(res) << " [" << __FILE__ << ":" << __LINE__ << ']'   \
                << std::endl;                                                              \
      exit(1);                                                                             \
    }                                                                                      \
  }
__global__ void kernel_empty_test() {}

int main() {
  void* brr = nullptr;
  CUDA_RUNTIME_CHECK(cudaMallocManaged(&brr, 88));

  std::cout << brr << std::endl;

  memset(brr, 0, 88);

  // Remove this line, then the program won't crash.
  kernel_empty_test<<<1, 1, 0>>>();

  const int32_t* data = static_cast<const int32_t*>(brr);
  std::cout << data[0] << std::endl;
  return 0;
}

nvcc -o rua && ./rua
[1]    28250 segmentation fault (core dumped)  ./rua

Linux 5.10.104-tegra
Inside docker image: l4t-ml:r35.2.1-py3
nvcc 11.4


Could you try adding a synchronization call after launching the kernel?
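Concretely, the suggestion looks like this (a sketch based on the repro above, with the error checking simplified to early returns; not tested on your exact setup):

```cuda
#include <cstdint>
#include <cstdio>
#include <cstring>

__global__ void kernel_empty_test() {}

int main() {
  void* brr = nullptr;
  if (cudaMallocManaged(&brr, 88) != cudaSuccess) return 1;
  memset(brr, 0, 88);

  kernel_empty_test<<<1, 1, 0>>>();
  // The added line: block until all outstanding GPU work is done
  // before touching managed memory from the CPU again.
  if (cudaDeviceSynchronize() != cudaSuccess) return 1;

  const int32_t* data = static_cast<const int32_t*>(brr);
  printf("%d\n", data[0]);  // should no longer fault on Jetson
  cudaFree(brr);
  return 0;
}
```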



Adding a synchronization before accessing the data works, but the program still crashes if I only add a synchronization after the memory access. Do you think that's expected?

I wanted to launch a kernel and then do some CPU computation on unified memory, so the CPU and GPU computations could run in parallel. I think the objects accessed by the CPU won't be used by the GPU, so I shouldn't need to sync and the work could proceed in parallel. Did I misunderstand something?

Thanks for your help!


Jetson doesn't support concurrent access to managed memory, so you will need to make sure the GPU tasks are done before accessing it from the CPU.

But in your use case the kernel is actually doing nothing, so this does look strange.
We need to discuss this with our internal team and will let you know what we find.
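In the meantime, one pattern that can give you the overlap you describe on Jetson (a sketch, not a definitive recipe: the stream and the plain host scratch buffer are illustrative assumptions) is to keep the CPU-side work on ordinary host memory while the kernel is in flight, and only touch the managed buffer after synchronizing:

```cuda
#include <cstdint>
#include <cstdio>

__global__ void kernel_empty_test() {}

int main() {
  void* brr = nullptr;
  if (cudaMallocManaged(&brr, 88) != cudaSuccess) return 1;

  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess) return 1;

  kernel_empty_test<<<1, 1, 0, stream>>>();  // GPU work on its own stream

  // CPU work on plain (non-managed) host memory can run concurrently...
  int32_t host_scratch[22] = {0};
  for (int i = 0; i < 22; ++i) host_scratch[i] = i;

  // ...but the managed buffer must wait for the GPU to finish.
  if (cudaStreamSynchronize(stream) != cudaSuccess) return 1;
  static_cast<int32_t*>(brr)[0] = host_scratch[0];

  printf("%d\n", static_cast<int32_t*>(brr)[0]);
  cudaStreamDestroy(stream);
  cudaFree(brr);
  return 0;
}
```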



We got some feedback from our internal team.
This issue can be fixed by attaching the buffer to the host with the cudaMemAttachHost flag.

@@ -15,6 +15,7 @@ __global__ void kernel_empty_test() {}
 int main() {
   void* brr = nullptr;
   CUDA_RUNTIME_CHECK(cudaMallocManaged(&brr, 88));
+  CUDA_RUNTIME_CHECK(cudaStreamAttachMemAsync(NULL, brr, 0, cudaMemAttachHost));
   std::cout << brr << std::endl;

The CUDA driver does not know whether a GPU kernel will access a given piece of managed memory, so it has to assume all managed memory might be used and applies the necessary protection, which causes the segmentation fault here.
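Putting the whole fix together (a sketch of the patched repro with simplified error handling; the explicit synchronize after the attach is an extra precaution, since cudaStreamAttachMemAsync is asynchronous):

```cuda
#include <cstdint>
#include <cstdio>
#include <cstring>

__global__ void kernel_empty_test() {}

int main() {
  void* brr = nullptr;
  if (cudaMallocManaged(&brr, 88) != cudaSuccess) return 1;

  // Attach the buffer to the host: the driver now knows kernels will not
  // access it, so CPU access stays valid even while GPU work is outstanding.
  if (cudaStreamAttachMemAsync(NULL, brr, 0, cudaMemAttachHost) != cudaSuccess) return 1;
  if (cudaStreamSynchronize(NULL) != cudaSuccess) return 1;  // let the attach take effect

  memset(brr, 0, 88);
  kernel_empty_test<<<1, 1, 0>>>();

  // No longer faults: the buffer is host-attached.
  const int32_t* data = static_cast<const int32_t*>(brr);
  printf("%d\n", data[0]);
  cudaFree(brr);
  return 0;
}
```

Note that while the buffer is host-attached, kernels must not dereference it; if you later need GPU access, re-attach it (e.g. with cudaMemAttachGlobal) and synchronize first.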


