Summary: My program uses cudaHostAlloc() to allocate mapped memory. The call succeeds with no errors, but the pointer it returns turns out to be out of bounds.
Details:
I am calling cudaHostAlloc from the following wrapper function:
    void hostAlloc(void **dPointer, void **hPointer,
                   size_t size, unsigned int cuda_flags,
                   double* remaining_mem)
    {
        if (*remaining_mem >= size) {
            cudaHostAlloc(hPointer, size, cuda_flags);
            check_error(cudaGetLastError(), "when calling cudaHostAlloc");
            cudaHostGetDevicePointer( dPointer, *hPointer, 0 );
            check_error(cudaGetLastError(), "when calling cudaHostGetDevicePointer");
            *remaining_mem -= size;
        } else {
            fprintf(stderr, "Not enough memory left on device.\n");
            exit(EXIT_FAILURE);
        }
    }
(The check_error() function just prints an error message and quits if the status it is given, here the result of cudaGetLastError(), is anything but cudaSuccess.)
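For reference, a minimal sketch of what check_error() amounts to. The signature is inferred from the calls above, and the message format is only illustrative, not the exact code:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical sketch of check_error(): print a message and quit if the
// status is anything but cudaSuccess. The parameter names are assumptions.
void check_error(cudaError_t status, const char *context)
{
    if (status != cudaSuccess) {
        // cudaGetErrorString() turns the status code into readable text.
        fprintf(stderr, "CUDA error %s: %s\n",
                context, cudaGetErrorString(status));
        exit(EXIT_FAILURE);
    }
}
```

With this shape, a call such as check_error(cudaGetLastError(), "when calling cudaHostAlloc") returns silently on success and terminates the program otherwise.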
Here is a contrived example, to show how it is called:
    cudaSetDevice(0);
    cudaSetDeviceFlags( cudaDeviceMapHost );
    cudaDeviceProp devProp;
    cudaGetDeviceProperties(&devProp, 0);
    double remaining_mem = devProp.totalGlobalMem;
    int buffer_size = 10;
    int *buffer = NULL, *d_buffer = NULL;
    size_t mem_size = sizeof(*buffer) * buffer_size;
    hostAlloc((void**) &d_buffer, (void**) &buffer, mem_size, cudaHostAllocMapped, &remaining_mem);
This runs with no errors reported, but examining ‘buffer’ in cuda-gdb reveals that it is out of bounds. Running the code produces unpredictable results, as you would expect when using an invalid pointer: it sometimes segfaults, and it always computes the wrong result.
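One way to cross-check what cuda-gdb reports is to exercise both pointers directly. The sketch below is an illustration rather than part of the program: increment and mapped_roundtrip_ok are hypothetical names, and it assumes both pointers come from a cudaHostAllocMapped allocation such as the hostAlloc() call above. It writes through the host pointer, modifies the data through the device pointer, and checks that the host sees the change:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Add 1 to each element through the device-side alias of the mapped buffer.
__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Returns true if writes made through d_buffer are visible through buffer.
// Assumption: buffer and d_buffer refer to the same mapped allocation of
// at least n ints.
bool mapped_roundtrip_ok(int *buffer, int *d_buffer, int n)
{
    for (int i = 0; i < n; ++i) buffer[i] = i;   // write from the host side

    increment<<<(n + 255) / 256, 256>>>(d_buffer, n);
    if (cudaGetLastError() != cudaSuccess) return false;      // launch failed
    if (cudaDeviceSynchronize() != cudaSuccess) return false; // kernel faulted

    for (int i = 0; i < n; ++i)                  // read back from the host side
        if (buffer[i] != i + 1) return false;
    return true;
}
```

In the contrived example it would be called right after hostAlloc(), as mapped_roundtrip_ok(buffer, d_buffer, buffer_size). A true result would suggest the pointers themselves are usable and that the problem lies elsewhere.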
What could be causing this problem, and why is it not caught by cudaGetLastError()?