CUDA compute-sanitizer internal error: CUDA initialized before the Sanitizer

I’m trying to run compute-sanitizer on two systems: a Docker container with CUDA 11.2 and a cluster with CUDA 11.7. On both systems, whenever I run compute-sanitizer (with any arguments, although I’m mostly trying to use --leak-check full, --save, and --log-file), I get this error:

========= Internal Sanitizer Error: CUDA initialized before the Sanitizer. The Sanitizer will be disabled.

Does anyone know where this might come from? I’m using Clang v14.0.0 and v14.0.6 to compile a C++ OpenMP-offload program (the GPU kernels are written using OpenMP rather than CUDA) that also calls cuFFT and cuBLAS. I cannot seem to find any information about this error online.

Can you share the full command and output that is causing this? Is there any sort of job launcher/scheduler involved?

You’re right that there’s a scheduler (Slurm) involved on the cluster, although I get the same results inside the Docker container, where no scheduler is involved at all (both use the same toolchains, with the minor version differences mentioned in my initial post).

The run command is simply srun compute-sanitizer ./exe (I get the same errors with srun compute-sanitizer --leak-check full --save savefile --log-file logfile, and either with or without srun). The output:

========= Internal Sanitizer Error: CUDA initialized before the Sanitizer. The Sanitizer will be disabled.
/* non-relevant application output */
========= ERROR SUMMARY: 1 error

Maybe it’s worth mentioning that it’s an MPI application as well (I’m using a CUDA-aware OpenMPI v4.1.x with UCX).

Thanks for the details. I have filed a bug with our engineering team. They may reply directly; otherwise I’ll let you know when I have more information.

It would also be helpful if there was a simple reproducible test case you could share.

The engineering team is requesting a reproducer to try and debug this internally. Do you have something you can provide?

This piece of code reproduces the mentioned bug (with CUDA-aware, UCX-based OpenMPI).

#include "mpi.h"

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}

Could you please provide more information regarding:

  • Which version of OpenMPI / UCX you use, and where we can download them / how we can install them
  • Which command-line do you use to compile / run this sample

This error usually happens when CUDA is initialized from static library initialization code (e.g. the initialization of a global variable in one of your libraries), which is not a use case supported by the tool. You can check by running gdb on the program, breaking on the symbol cuInit, and looking at the backtrace to see which library calls it.
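
A minimal gdb session for this check might look like the following sketch (./exe stands in for your application binary; the breakpoint may be reported as pending until the CUDA driver library is loaded):

$ gdb ./exe
(gdb) break cuInit
(gdb) run
(gdb) backtrace

In the backtrace, the first frame below the CUDA driver frames belongs to the library that is triggering the early initialization, before the Sanitizer has a chance to attach.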