cudaHostAlloc overwriting system memory while allocating a single bool

I’m writing a CUDA program for Windows. The kernel can take a long time to complete (>30 seconds). To avoid tripping the Windows GPU watchdog timer, I repeatedly exit and re-launch the kernel from a do-while loop. To decide whether the loop should launch the kernel again or exit, I memcpy a single bool from the device to system RAM on every loop iteration.

I pinned the host bool using cudaHostAlloc as follows:

bool h_CallContinue[1];


cudaHostAlloc((void**)&h_CallContinue, sizeof(bool), 0);

I noticed that this was also setting another host variable (bool) that isn’t pinned. Fortunately for me, this particular instance was easy to notice. I changed the declaration of the h_CallContinue variable to be an array of 4 bools instead. I only ever transfer the first bool of the array. That seems to have fixed the overwrite of that other particular host bool.

My code now looks like this:

bool h_CallContinue[4];


cudaHostAlloc((void**)&h_CallContinue, sizeof(bool), 0);

I looked through the documentation and the web, and I couldn’t find a single reference to this effect. Also, I’m not certain that using a size of 4 bools (32 bits on Windows) is sufficient.

Is there a minimum size of the host memory that must be allocated for a proper pin?

Also, I know that using pinned host memory will allow faster transfer speeds, but this variable is just 8 bits. Will it also reduce latency?

Thank You In Advance

A pointer and an array are not the same thing.

This is an array:

bool h_CallContinue[1];

This is a pointer:

bool *h_CallContinue;

Switch to using a pointer. cudaHostAlloc stores a pointer value at the address you pass in, so passing the address of an array makes it write that pointer value over the array’s storage, which is undefined behavior in C and C++. An array’s name usually decays to a constant pointer to the first element of the array, and you are not allowed to modify it. Overwriting a stack variable that is a pointer, however, is entirely legal.

Do that and things should be fine, even if you are only using 1 bool. If that doesn’t work, I suggest providing a short, complete code that demonstrates the trouble. It should be perfectly legal in CUDA to allocate only 1 pinned buffer byte, and not have any side-effects.

Thank you. That fixed it!

I had no idea that modifying the address of an array is undefined. I will remember that.


I’m not 100% sure, but for an array, &h_CallContinue is probably the same address as h_CallContinue. Check it yourself by printing both. So your code overwrote the bools with the address of the allocated area. And since a bool is stored as a single byte and an address is at least 4 bytes, with h_CallContinue[1] it overwrote adjacent boolean values too.


In general, C/C++ books should have a chapter about pointers and arrays. I suggest finding [a book with] exercises on this topic and doing them all. After all, it’s the core hard topic of C, so learning it as deeply as possible will make your future life with C simpler. Sorry for the unsolicited advice.

Yes, that’s right, both &h_CallContinue and h_CallContinue print the same address to the console.

I will be studying pointers further after this.

Upon more reflection, it makes sense that modifying the underlying (“intrinsic”) pointer to the array would be undefined. The location of the array is, I would imagine, meant to be abstracted from the programmer, whereas the actual data in the array is what’s meant to be manipulated (via the returned pointer). A pointer should be explicitly declared before being manipulated.

Thank you both for the clarification.