CUDA compute-sanitizer internal error: CUDA initialized before the Sanitizer

I’m trying to run compute-sanitizer on two systems: a Docker container with CUDA 11.2 and a cluster with CUDA 11.7. On both systems, whenever I run compute-sanitizer (with any arguments, although I’m mostly using --leak-check full, --save and --log-file), I get this error:

========= COMPUTE-SANITIZER
========= Internal Sanitizer Error: CUDA initialized before the Sanitizer. The Sanitizer will be disabled.
=========

Does anyone know where this might come from? I’m using Clang 14.0.0 and 14.0.6 to compile a C++ OpenMP-offload program (the GPU kernels are written with OpenMP rather than CUDA) that also calls cuFFT and cuBLAS. I can’t find any information about this error online.

Can you share the full command and output that is causing this? Is there any sort of job launcher/scheduler involved?

You’re right that a scheduler (Slurm) is involved on the cluster, although I get the same results inside the Docker container, where no scheduler is involved at all. Both systems use the same toolchain, apart from the minor version differences mentioned in my initial post.

The run command is simply srun compute-sanitizer ./exe, and I get the same error with srun compute-sanitizer --leak-check full --save savefile --log-file logfile. The result is the same with or without srun. The output:

========= COMPUTE-SANITIZER
========= Internal Sanitizer Error: CUDA initialized before the Sanitizer. The Sanitizer will be disabled.
========= 
/* non-relevant application output */
========= ERROR SUMMARY: 1 error

It may be worth mentioning that this is also an MPI application (I’m using a CUDA-aware OpenMPI v4.1.x with UCX).

Thanks for the details. I have filed a bug with our engineering team. They may reply directly; otherwise I’ll let you know when I have more information.

It would also be helpful if there was a simple reproducible test case you could share.

The engineering team is requesting a reproducer to try and debug this internally. Do you have something you can provide?

The following code reproduces the bug (with a CUDA-aware, UCX-based OpenMPI):

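#include "mpi.h"

// Minimal reproducer: with a CUDA-aware, UCX-based OpenMPI, just initializing
// and finalizing MPI is enough to trigger the sanitizer error.
int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}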

Could you please provide more information about:

  • Which versions of OpenMPI and UCX you use, and where we can download them or how we can install them
  • Which command lines you use to compile and run this sample

This error usually happens when CUDA is initialized from a library’s static initialization code (e.g. the initializer of a global variable in one of your libraries), which is not a use case the tool supports. You can check this by running the program under gdb with a breakpoint on the cuInit symbol, then looking at the backtrace to see which library calls it.

I’m getting a similar error with an OpenMP offloading code built with LLVM. Below is a minimal OpenMP reproducer and what running it looks like. Notably, the --require-cuda-init=no argument appears to have no effect.

user@gpu07:~/omp_target_issues/simple$ cat main.cpp
#include <stdio.h>
#include <omp.h>
int main( int argc, char** argv ) {
  int is_initial = omp_is_initial_device();
  #pragma omp target map(from:is_initial)
  {
    is_initial = omp_is_initial_device();
  }
  if( !is_initial )
    printf( "Hello world from accelerator.\n" );
  else
    printf( "Hello world from host.\n" );
  return 0;
}
user@gpu07:~/omp_target_issues/simple$ clang++ --version
clang version 16.0.0 (git@github.com:llvm/llvm-project.git 124f90bd89b97066e01274a9bba1068f3a175d66)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /gpfs/jlse-fs0/projects/intel_anl_shared/openmc_data/compilers/llvm-16-rc1/bin
user@gpu07:~/omp_target_issues/simple$ clang++ -Wall -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target -march=sm_80 main.cpp -o test
clang-16: warning: CUDA version 11.6 is only partially supported [-Wunknown-cuda-version]
clang-16: warning: CUDA version 11.6 is only partially supported [-Wunknown-cuda-version]
user@gpu07:~/omp_target_issues/simple$ 
user@gpu07:~/omp_target_issues/simple$ ./test
Hello world from accelerator.
user@gpu07:~/omp_target_issues/simple$ compute-sanitizer --tool=initcheck ./test
========= COMPUTE-SANITIZER
========= Error: CUDA initialized before the Sanitizer. The Sanitizer will be disabled
========= 
Hello world from accelerator.
========= ERROR SUMMARY: 1 error
user@gpu07:~/omp_target_issues/simple$ compute-sanitizer --tool=initcheck --require-cuda-init=no ./test
========= COMPUTE-SANITIZER
========= Error: CUDA initialized before the Sanitizer. The Sanitizer will be disabled
========= 
Hello world from accelerator.
========= ERROR SUMMARY: 1 error
user@gpu07:~/omp_target_issues/simple$ nvidia-smi
Fri Feb  3 17:59:26 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:43:00.0 Off |                    0 |
| N/A   24C    P0    33W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The --require-cuda-init no option is designed to let non-CUDA programs run under the Sanitizer without the tool returning an error. This error is different: it signals that the CUDA driver was initialized before the tool was injected. I will compile Clang locally and try to reproduce this.


I have identified the issue and I am currently evaluating whether or not a fix on our side is possible without making changes to the LLVM codebase.


Thanks @aladram! I’ll also note that the cuda-memcheck utility does work fine, so it seems like a fix should be possible on the NVIDIA side.

I’m curious whether there has been any progress on this issue. Will it ever be possible to use compute-sanitizer with LLVM-compiled OpenMP code?

For others reading this: OpenMP initializes its target plugins while shared libraries are being loaded (when libomptarget is loaded), and that initializes CUDA before compute-sanitizer gains control of the application.

Thanks, Mark, for your message. Support for LLVM-compiled OpenMP offload code will be considered for a future release, but it is not planned at this time.