I’m running a minimal CUDA program that only calls cudaMemGetInfo, on an IBM server with 4 GPUs.
When I link the application with -ltcmalloc, it segfaults. If I link it WITHOUT -ltcmalloc, everything works.
If I link with -ltcmalloc but set the CUDA_VISIBLE_DEVICES environment variable (to any value, for example 100), it also runs without crashing.
I really have no idea what is happening. Could setting CUDA_VISIBLE_DEVICES cause CUDA to load its libraries before the tcmalloc library, and so avoid the crash?
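
One way I could probe the load-order guess is to take the CUDA runtime off the link line and load it explicitly at run time, so it is certain to be loaded after tcmalloc. Below is a rough sketch, not something I have verified on the server. It assumes Linux, a libcudart.so that the dynamic loader can find, and that cudaError_t can be treated as an int at the ABI level. The helper name query_mem_info is made up for illustration and is not part of CUDA.

```cpp
// Sketch only: query_mem_info is a hypothetical helper, not a CUDA API.
#include <dlfcn.h>   // dlopen, dlsym, dlclose, dlerror (POSIX)
#include <cstddef>
#include <cstdio>

// Mirrors: cudaError_t cudaMemGetInfo(size_t* free, size_t* total);
// Assumption: cudaError_t (an enum) is passed back as a plain int.
using MemGetInfoFn = int (*)(std::size_t*, std::size_t*);

// Loads the CUDA runtime from lib_path and calls cudaMemGetInfo through it.
// Returns -1 if the library or the symbol cannot be loaded; otherwise returns
// the CUDA status code (0 corresponds to cudaSuccess).
int query_mem_info(const char* lib_path,
                   std::size_t* free_bytes,
                   std::size_t* total_bytes) {
    void* handle = dlopen(lib_path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    auto fn = reinterpret_cast<MemGetInfoFn>(dlsym(handle, "cudaMemGetInfo"));
    if (!fn) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    // The handle is deliberately left open: this is a one-shot diagnostic and
    // the runtime stays loaded until the process exits.
    return fn(free_bytes, total_bytes);
}
```

The idea would be to call query_mem_info("libcudart.so", &free_bytes, &total_bytes) from main in a binary that is still linked with -ltcmalloc (plus -ldl where the platform needs it) but no longer with the CUDA runtime. If the segfault still happens with the runtime loaded this late, that would at least show that the crash does not depend on the runtime being pulled in at link time.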