Related to my previous post on bugs in initializing globals, I note a difference which seems like a regression to me, but maybe it is more a matter of being extremely lucky on some platforms and not on others.
On CentOS 7 (bare metal) and in the nvhpc 21.9 Ubuntu 20.04 container, I can build a separate library using g++ and normal linking. This library allocates memory from the system heap. As far as I can tell, it never asks the main calling process to free memory it allocated, nor does it try to free memory allocated by the main binary. Anyway, linking to it and using it “just works”. The memory it touches never goes into the GPU hot path.
On CentOS 8, on the other hand, linking to this library and calling some trivial functions within it causes heap corruption. The behavior looks as if the library tries to free, through glibc, memory that was originally allocated from the managed heap. Again, I don’t see where that allocation would take place, but it seems like it does.
I’ve verified with AddressSanitizer, using a pure g++ build (of main code and library), that there are no obvious heap corruption issues that had simply remained hidden.
One observation is that symbols like “free” are provided by glibc on CentOS 8 when I run the binary, but by ld-linux-x86-64.so.2 when I run in the Ubuntu container.
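For anyone who wants to check the same thing from inside the process, here is a minimal sketch of how the providing object of a symbol can be queried. It assumes glibc on Linux with a dynamically linked binary. `provider_of` is a hypothetical helper name I made up for illustration and is not part of the NVHPC SDK. Whether this reports the same object as other tools (e.g. loader debug output) is something I have not verified on every platform.

```cpp
// Sketch only: ask the dynamic loader which loaded object defines a symbol.
// Assumes glibc; dladdr and RTLD_DEFAULT are GNU extensions (g++ defines
// _GNU_SOURCE by default). On glibc older than 2.34, link with -ldl.
#include <dlfcn.h>
#include <string>

// Hypothetical helper: returns the path of the object that provides `symbol`
// in the global lookup scope, or an empty string if it cannot be determined.
std::string provider_of(const char* symbol) {
    // Look the symbol up the same way default symbol resolution would.
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == nullptr) {
        return {};
    }
    // Map the resolved address back to the object that contains it.
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return {};
    }
    return info.dli_fname;
}
```

Printing `provider_of("malloc")` and `provider_of("free")` from `main` in both environments would show whether the two entry points resolve to the same object, which is the property that matters for the mismatch described above.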
The library is statically linked, so I could imagine that the Nvidia linker, when managed memory is enabled, would redirect some calls, or even manage to do it for shared libraries, similar to how you can swap in tcmalloc and other drop-in malloc replacements with LD_PRELOAD. If this is what’s going on, it seems like some difference in CentOS 8, e.g. a symbol naming change, makes it break, and break badly.
I should note that I never even rebuilt the static library when trying the Ubuntu container. That is, a static lib built on CentOS 8 links fine and gives no heap corruption crashes when I build and run the main binary linking to it within an Ubuntu 20.04 NVHPC SDK container.
So, two questions:
- Am I just lucky that this works on CentOS 7 and Ubuntu, or is it expected that all heap allocations will be redirected?
- Are you aware of some regression on CentOS/RHEL 8?