Why can't I get the expected result from __nv_aligned_device_malloc?

josephblue · February 16, 2023, 9:49am

Hi folks,

I’m working through the samples in the programming guide and wrote a simple kernel to verify that the function in the title returns aligned addresses, but I can’t get the result I expected.
__global__ void kernel_aligned_malloc(uint8_t *ptr, size_t size, size_t align, int loop) {
  uint8_t *dptr;
  auto is_aligned = [&](uint8_t *p) {return ((uint64_t)(p) & (align - 1)) == 0;};
  for (int l = 1; l <= loop; l ++) {
    dptr = (uint8_t*)__nv_aligned_device_malloc(size, align);
    memset(dptr, (l & 0xff), size);
    if (!is_aligned(dptr)) {
      ptr[0] = uint8_t((uint64_t)(dptr) & 0xff);
      free(dptr);
      return;
    }
    memcpy(ptr, dptr, size);
    free(dptr);
  }
}
Then I ran it under cuda-memcheck, which reported the errors below. Did I do something wrong?
========= Malloc/Free error encountered : Double free
========= at 0x00000d40 in __cuda_syscall_mc_dyn_globallock_free
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x7ff8b2dff920

========= Program hit cudaErrorLaunchFailure (error 719) due to “unspecified launch failure” on CUDA API call to cudaMemcpy.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x34fb13]
========= Host Frame:./build/test_heap_memory_aligned_malloc [0x9a845]
========= Host Frame:./build/test_heap_memory_aligned_malloc [0xeb45]
========= Host Frame:./build/test_heap_memory_aligned_malloc [0xdf07]
========= Host Frame:./build/test_heap_memory_aligned_malloc [0xe0a0]
========= Host Frame:./build/test_heap_memory_aligned_malloc [0x3506c]
========= Host Frame:./build/test_heap_memory_aligned_malloc [0x2fe21]
========= Host Frame:./build/test_heap_memory_aligned_malloc [0x14fd6]

striker159 · February 16, 2023, 12:56pm

Please show the full code.

josephblue · February 17, 2023, 6:22am

Thanks, man. Btw, I ran it again today and it passed :)

#include <iostream>
#include <stdint.h>
using namespace std;
__global__ void kernel_aligned_malloc(uint8_t *ptr, size_t size, size_t align, int loop) {
  uint8_t *dptr;
  auto is_aligned = [&](uint8_t *p) {return ((uint64_t)(p) & (align - 1)) == 0;};
  for (int l = 1; l <= loop; l ++) {
    dptr = (uint8_t*)__nv_aligned_device_malloc(size, align);
    memset(dptr, (l & 0xff), size);
    if (!is_aligned(dptr)) {
      ptr[0] = uint8_t((uint64_t)(dptr) & 0xff);
      free(dptr);
      return;
    }
    memcpy(ptr, dptr, size);
    free(dptr);
  }
}

int main() {
  uint8_t *dp;
  cudaMalloc((void**)&dp, sizeof(uint8_t));
  kernel_aligned_malloc<<<1, 1>>>(dp, 16, 16, 2);
  uint8_t h;
  cudaMemcpy(&h, dp, sizeof(uint8_t), cudaMemcpyDeviceToHost);
  cout << "Result = " << (uint32_t)h << endl;
  return 0;
}

Robert_Crovella · February 17, 2023, 3:49pm

You have illegal behavior here:

memcpy(ptr, dptr, size);

You are copying 16 bytes to ptr but you have only allocated 1 byte:

cudaMalloc((void**)&dp, sizeof(uint8_t));

When I fix that issue, your code runs with no runtime errors for me on a cc 7.5 device with CUDA 12.0.

If you’re still having trouble after fixing that issue, my first suggestion is to update your CUDA install to 12.0. If you still observe problems after updating to CUDA 12.0, please identify the actual GPU you are running this on, and the compile command line you are using.
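
For reference, here is a minimal sketch of what the fix could look like. It assumes the only change needed is sizing the cudaMalloc allocation to match the number of bytes the kernel copies. The CHECK_CUDA macro, the Status enum, and the extra status output parameter are hypothetical additions for illustration and are not part of the original code; the null check on the device allocation is also an addition.

```cuda
#include <cstdio>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not from the thread): abort on any CUDA runtime error.
#define CHECK_CUDA(call)                                             \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                   cudaGetErrorString(err_));                        \
      std::exit(EXIT_FAILURE);                                       \
    }                                                                \
  } while (0)

// Hypothetical status codes written by the kernel.
enum Status { kOk = 0, kAllocFailed = 1, kMisaligned = 2 };

__global__ void kernel_aligned_malloc(uint8_t *out, int *status,
                                      size_t size, size_t align, int loop) {
  for (int l = 1; l <= loop; l++) {
    uint8_t *dptr = (uint8_t *)__nv_aligned_device_malloc(size, align);
    if (dptr == nullptr) {  // device heap could not satisfy the request
      *status = kAllocFailed;
      return;
    }
    // align must be a power of two for this mask test to be valid.
    if (((uintptr_t)dptr & (align - 1)) != 0) {
      *status = kMisaligned;
      free(dptr);
      return;
    }
    memset(dptr, l & 0xff, size);
    memcpy(out, dptr, size);  // out must hold at least `size` bytes
    free(dptr);
  }
  *status = kOk;
}

int main() {
  constexpr size_t kSize = 16;
  constexpr size_t kAlign = 16;

  uint8_t *dp = nullptr;
  int *dstatus = nullptr;
  // Allocate kSize bytes, not sizeof(uint8_t), so the in-kernel memcpy fits.
  CHECK_CUDA(cudaMalloc((void **)&dp, kSize));
  CHECK_CUDA(cudaMalloc((void **)&dstatus, sizeof(int)));
  CHECK_CUDA(cudaMemset(dstatus, 0, sizeof(int)));

  kernel_aligned_malloc<<<1, 1>>>(dp, dstatus, kSize, kAlign, 2);
  CHECK_CUDA(cudaGetLastError());       // launch configuration errors
  CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

  uint8_t h[kSize];
  int status = -1;
  CHECK_CUDA(cudaMemcpy(h, dp, kSize, cudaMemcpyDeviceToHost));
  CHECK_CUDA(cudaMemcpy(&status, dstatus, sizeof(int), cudaMemcpyDeviceToHost));
  std::printf("status = %d, first byte = %u\n", status, (unsigned)h[0]);

  CHECK_CUDA(cudaFree(dp));
  CHECK_CUDA(cudaFree(dstatus));
  return status;
}
```

Checking cudaGetLastError and cudaDeviceSynchronize right after the launch should surface a kernel fault at the launch site rather than at a later cudaMemcpy, which is where the error appeared in the log above.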

josephblue · February 18, 2023, 12:52am

Great! Sorry I missed that! Thanks so much!
