hello,
the following line happily returns me a gift-wrapped segmentation fault:
cudaMemcpyAsync(nlp_pnt_jac->h_temp_out_mul_sum,
                nlp_pnt_jac->d_out_mul_sum, sizeof(double) * lint[0],
                cudaMemcpyDeviceToHost, s[0]);
simply allocating nlp_pnt_jac->h_temp_out_mul_sum as ordinary pageable memory, instead of pinned memory, of course removes the problem:
cudaMallocHost(&nlp_pnt_jac->h_temp_out_mul_sum, sizeof(double) * 3);
vs
nlp_pnt_jac->h_temp_out_mul_sum = new double[3];
a cudaGetLastError() after the memory allocation, and again just prior to the memory copy, did not return an error
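for reference, checking the return values directly would look something like the sketch below, reusing the names from the snippets above; the printouts are only illustrative, and it assumes lint[0] never exceeds the 3 elements allocated

cudaError_t err;

// allocation, checking the return value directly
err = cudaMallocHost(&nlp_pnt_jac->h_temp_out_mul_sum, sizeof(double) * 3);
if (err != cudaSuccess)
    printf("cudaMallocHost: %s\n", cudaGetErrorString(err));

// async copy back into the pinned buffer (assumes lint[0] <= 3)
err = cudaMemcpyAsync(nlp_pnt_jac->h_temp_out_mul_sum,
                      nlp_pnt_jac->d_out_mul_sum, sizeof(double) * lint[0],
                      cudaMemcpyDeviceToHost, s[0]);
if (err != cudaSuccess)
    printf("cudaMemcpyAsync: %s\n", cudaGetErrorString(err));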
MALLOC_CHECK_ is set to 2, for what it is worth
this particular pinned memory allocation is one of several; the allocations must be pinned, as they generally receive key results returned asynchronously from the device:
cudaMallocHost(&nlp_pnt_jac->h_temp_out_mul_sum, sizeof(double) * 3);
cudaMallocHost(&nlp_pnt_jac->h_mul_sum_store, sizeof(double));
cudaMallocHost(&nlp_pnt_jac->h_le_out_status, sizeof(double));
cudaMallocHost(&nlp_pnt_jac->h_sug_coeff_delta, sizeof(double) * nlp_pnt_jac->coeff_cnt);
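the intended usage pattern is roughly the sketch below; the kernel name, launch configuration and the d_mul_sum_store device buffer are placeholders for whatever actually produces the results on the device

// hypothetical kernel and device buffer standing in for the real ones
some_kernel<<<grid, block, 0, s[0]>>>(nlp_pnt_jac->d_mul_sum_store);
cudaMemcpyAsync(nlp_pnt_jac->h_mul_sum_store, nlp_pnt_jac->d_mul_sum_store,
                sizeof(double), cudaMemcpyDeviceToHost, s[0]);

// the host must not read the pinned buffer before the stream has drained
cudaStreamSynchronize(s[0]);
double mul_sum = nlp_pnt_jac->h_mul_sum_store[0];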
what is interesting is that the debugger shows the address as unique, but the starting value of the array, as the debugger generally reports it, is the same as that of one of the other pinned memory arrays
at this point, i am hypothesizing that the driver is somehow (incorrectly) grouping these small pinned memory allocations to reduce waste; it is the only way i can make sense of this
the next test, then, would be to allocate one pinned memory region equal to the total size needed, and to rather use offset pointers into this region; a sketch of that follows below
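something along the lines of the sketch below, with the sizes taken from the allocations above; a single cudaFreeHost(h_pool) would then release everything

// total size of all the small pinned buffers above, in one block
size_t total = sizeof(double) * (3 + 1 + 1 + nlp_pnt_jac->coeff_cnt);

double *h_pool;
cudaMallocHost(&h_pool, total);

// carve the individual host pointers out of the single pinned region
nlp_pnt_jac->h_temp_out_mul_sum = h_pool;          // 3 doubles
nlp_pnt_jac->h_mul_sum_store    = h_pool + 3;      // 1 double
nlp_pnt_jac->h_le_out_status    = h_pool + 4;      // 1 double
nlp_pnt_jac->h_sug_coeff_delta  = h_pool + 5;      // coeff_cnt doubles

// the async copies then stay exactly as they are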
your views?