cudaHostAlloc is returning bad memory
After calling cudaHostAlloc, my host pointer is out-of-bounds

Summary: My program uses cudaHostAlloc() to allocate mapped memory. It succeeds with no errors, but the memory location it returns is actually out of bounds.


I am calling cudaHostAlloc from the following wrapper function:

void hostAlloc(void **dPointer, void **hPointer,
               size_t size, unsigned int cuda_flags,
               double *remaining_mem)
{
  if (*remaining_mem >= size) {
    /* Allocate page-locked host memory (mapped when cuda_flags includes cudaHostAllocMapped). */
    cudaHostAlloc(hPointer, size, cuda_flags);
    check_error(cudaGetLastError(), "when calling cudaHostAlloc");
    /* Look up the device-side pointer that aliases the host buffer. */
    cudaHostGetDevicePointer(dPointer, *hPointer, 0);
    check_error(cudaGetLastError(), "when calling cudaHostGetDevicePointer");
    *remaining_mem -= size;
  } else {
    fprintf(stderr, "Not enough memory left on device.\n");
  }
}

(the check_error() function just prints an error message and quits if cudaGetLastError() returns anything but success)
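The helper itself was not posted, so the following is only a minimal sketch of what a function matching that description could look like. The exact message format and the exit code are assumptions; `cudaGetErrorString()` is the standard runtime call for turning an error code into text.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical reconstruction of check_error(): the original was not shown.
// Prints the CUDA error string with some context and quits on any failure.
void check_error(cudaError_t err, const char *context)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error %s: %s\n", context, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}
```

Passing the return value of the CUDA call directly, as in `check_error(cudaHostAlloc(...), "...")`, would work equally well with this signature and avoids relying on `cudaGetLastError()`.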

Here is a contrived example, to show how it is called:


cudaSetDeviceFlags(cudaDeviceMapHost);
cudaDeviceProp devProp;
cudaGetDeviceProperties(&devProp, 0);
double remaining_mem = devProp.totalGlobalMem;

int buffer_size = 10;
int *buffer = NULL, *d_buffer = NULL;
size_t mem_size = sizeof(*buffer) * buffer_size;
hostAlloc((void**) &d_buffer, (void**) &buffer, mem_size, cudaHostAllocMapped, &remaining_mem);

This works with no errors, but examining ‘buffer’ in cuda-gdb reveals that it is out-of-bounds. Running the code produces unpredictable results, as you would expect when using an invalid pointer: sometimes it segfaults, and it always computes the wrong result.

What could be causing this problem, and why is it not caught by cudaGetLastError()?

I think there’s a known issue with gdb being unable to handle pinned memory; I can check in the morning.

I think it might be more than that, because the segfaults happen when accessing the memory allocated by cudaHostAlloc.

However, it is possible that the error is caused by some other mistake in my program. In that case, how can I debug it, since I can’t use CUDA-GDB?

If your problem is in the host code you can try regular gdb…

I get the same problem in gdb.

Interestingly, the segfaults only seem to happen when I compile with “-O2”. The version compiled with “-g -G” seems to work, although examining the memory with gdb and cuda-gdb still gives “out-of-bounds” errors.

Regardless of whether it segfaults or not, however, the results are still wrong.

OK, I got this resolved.

Since gdb reported the memory out of bounds, I assumed that cudaHostAlloc was the problem. However, this was a red herring. As suggested by tmurray, it seems like both gdb and CUDA-gdb just have trouble dealing with mapped memory.
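If the debugger cannot be trusted to inspect mapped memory, one way to rule out the allocation itself is to test it from the program. The sketch below is an illustration, not part of my original code: `mapped_alloc_is_usable()` is a hypothetical helper that allocates a mapped buffer, asks the runtime whether it recognizes the pointer via `cudaHostGetFlags()`, and then writes and reads the buffer from the host.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if a mapped allocation of `size` bytes
// is recognized by the runtime and can be written and read back on the host.
bool mapped_alloc_is_usable(size_t size)
{
    unsigned char *h = NULL;
    void *d = NULL;
    unsigned int flags = 0;

    // Request host-memory mapping. The return value is deliberately ignored:
    // some CUDA versions report an error if a context already exists.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    if (cudaHostAlloc((void **)&h, size, cudaHostAllocMapped) != cudaSuccess)
        return false;

    // cudaHostGetFlags() fails for pointers that did not come from cudaHostAlloc().
    bool ok = cudaHostGetFlags(&flags, h) == cudaSuccess
           && cudaHostGetDevicePointer(&d, h, 0) == cudaSuccess;

    if (ok) {
        // Touch every byte from the host; a bad pointer would fault here.
        memset(h, 0xAB, size);
        for (size_t i = 0; i < size; ++i) {
            if (h[i] != 0xAB) { ok = false; break; }
        }
    }

    cudaFreeHost(h);
    return ok;
}
```

If a check like this passes while the debugger still reports the address as out of bounds, the pointer is most likely fine and the bug is elsewhere.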

Once I realized that cudaHostAlloc works correctly, I was able to find the source of my problems: I was accessing the memory after a kernel launch, without first calling cudaThreadSynchronize().
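For anyone hitting the same thing, here is a minimal sketch of the corrected pattern. The kernel `fill_doubled` and the wrapper `sum_after_sync` are made-up names for illustration, not my real code. It uses `cudaDeviceSynchronize()`, which replaced the now-deprecated `cudaThreadSynchronize()` in later CUDA releases; on an old toolkit the older call would go in the same place.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: writes 2*i into element i of the mapped buffer.
__global__ void fill_doubled(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2 * i;
}

// Hypothetical wrapper (assumes n > 0): fills a mapped buffer on the device,
// synchronizes, then sums it on the host. Returns -1 on any CUDA error.
int sum_after_sync(int n)
{
    int *buffer = NULL, *d_buffer = NULL;

    cudaSetDeviceFlags(cudaDeviceMapHost);
    if (cudaHostAlloc((void **)&buffer, n * sizeof(int), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    if (cudaHostGetDevicePointer((void **)&d_buffer, buffer, 0) != cudaSuccess) {
        cudaFreeHost(buffer);
        return -1;
    }

    int threads = 64;
    int blocks = (n + threads - 1) / threads;
    fill_doubled<<<blocks, threads>>>(d_buffer, n);
    cudaError_t err = cudaGetLastError();      // catches launch-configuration errors

    // Kernel launches are asynchronous: wait for the kernel to finish
    // before the host reads the mapped buffer.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    int sum = -1;
    if (err == cudaSuccess) {
        sum = 0;
        for (int i = 0; i < n; ++i) sum += buffer[i];
    } else {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFreeHost(buffer);
    return sum;
}
```

Without the synchronize call, the host loop can run while the kernel is still writing, which matches the symptoms above: wrong results every time, and behavior that changes with optimization flags.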