Problem with cudaMallocHost and structs

Hi, I am having problems allocating host memory for a struct. My program has a line like this:

cudaMallocHost((void**)&structPtr, sizeof(struct updateStruct));

The structure contains several reasonably large arrays of floats.

The program is really unstable. Sometimes it runs fine, producing the same results as my reference C solution, while at other times it just crashes.

If I change the above allocation to a normal malloc, everything works as it should every time.

Any ideas what this could be?

An out-of-bounds memory write trashing the heap, maybe? CUDA's allocator may be less forgiving of that than the C runtime's own heap implementation.

If you are running on Linux, try valgrind: it can help identify any out-of-bounds memory writes.
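If valgrind is not an option, a cheap way to confirm an overrun is to sandwich the struct between guard bytes and check them after each suspect step. The following is only a rough sketch in plain C: the layout of struct updateStruct, the sizes, and the helper names (guarded, guards_init, guards_intact, check_after_write) are all made up for illustration, since the real struct was not posted. It uses malloc so it runs anywhere; the same struct guarded could presumably be allocated with cudaMallocHost instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define N_FLOATS    1024  /* hypothetical array length */
#define GUARD_BYTES 64    /* guard size on each side; a multiple of 4 keeps the floats aligned */
#define GUARD_FILL  0xA5  /* byte pattern written into the guards */

/* Hypothetical stand-in for the poster's struct (real layout unknown). */
struct updateStruct {
    float a[N_FLOATS];
    float b[N_FLOATS];
};

/* The struct sandwiched between two guard regions in one allocation. */
struct guarded {
    unsigned char front[GUARD_BYTES];
    struct updateStruct data;
    unsigned char back[GUARD_BYTES];
};

/* Fill both guard regions with the known pattern. */
void guards_init(struct guarded *g) {
    memset(g->front, GUARD_FILL, GUARD_BYTES);
    memset(g->back, GUARD_FILL, GUARD_BYTES);
}

/* Returns 1 if both guard regions still hold the pattern, 0 otherwise. */
int guards_intact(const struct guarded *g) {
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (g->front[i] != GUARD_FILL || g->back[i] != GUARD_FILL)
            return 0;
    }
    return 1;
}

/* Writes one float either to the last valid slot of data.b (overrun == 0)
 * or one slot past its end (overrun != 0), then reports whether the guards
 * survived. Returns 1 = intact, 0 = corrupted, -1 = allocation failed. */
int check_after_write(int overrun) {
    struct guarded *g = malloc(sizeof *g);
    if (g == NULL)
        return -1;
    guards_init(g);

    /* The back guard must start right where the struct ends. */
    assert(offsetof(struct guarded, back) ==
           offsetof(struct guarded, data) + sizeof(struct updateStruct));

    /* Simulate the write by byte offset, staying inside the allocation. */
    size_t index = overrun ? N_FLOATS : N_FLOATS - 1;
    size_t offset = offsetof(struct guarded, data)
                  + offsetof(struct updateStruct, b)
                  + index * sizeof(float);
    float value = 1.0f;
    memcpy((unsigned char *)g + offset, &value, sizeof value);

    int ok = guards_intact(g);
    free(g);
    return ok;
}
```

In a real program you would call guards_init once after allocating, then call guards_intact after each function that touches the struct; the first call that returns 0 narrows down where the stray write happens. This only catches writes that land within GUARD_BYTES of the struct, so a wildly wrong index can still slip past it.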

I tracked down where it is going wrong (it does seem to be accessing out-of-bounds memory), but I’m still not sure why this is happening. Is there some basic difference between how memory is allocated by cudaMallocHost and by an ordinary malloc?

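Yes. cudaMallocHost memory is page-locked when it is allocated, meaning that the OS cannot page it to disk. It also has to be released with cudaFreeHost rather than free. Because it does not come from the regular C heap, whatever sits next to your struct in memory is likely different, so an out-of-bounds write that happens to land somewhere harmless with malloc may hit something that matters here.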