Hello,
I have been working on a project that uses Intel MPI to distribute work across multi-GPU Azure nodes. Usually, the executable is launched from a PBS script like this:
mpiexec -s all -np 8 -machinefile $PBS_NODEFILE <linked executable>
To use cuda-memcheck, we have simply been able to run this:
mpiexec -s all -np 8 -machinefile $PBS_NODEFILE cuda-memcheck <linked executable>
This worked just fine and reported errors for each MPI rank. However, we have recently run into some issues that aren’t being reported, and since cuda-memcheck is deprecated, I wanted to upgrade to compute-sanitizer. But the following doesn’t work:
mpiexec -s all -np 8 -machinefile $PBS_NODEFILE compute-sanitizer <linked executable>
This returns “Error: No attachable process found. compute-sanitizer timed-out.” However, since I cannot run the executable directly on the nodes it runs on, I cannot manually attach to the process (as far as I know). Following the unanswered post here, I also tried the following, to no avail:
mpiexec -s all -np 8 -machinefile $PBS_NODEFILE compute-sanitizer --require-cuda-init=no --max-connections=1000 <linked executable>
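One variation that might help narrow this down is a small wrapper script, so that each rank starts its own compute-sanitizer and writes to its own log file. The following is only a sketch, not something confirmed to fix the error. It assumes that the Intel MPI launcher exports PMI_RANK to each process, and the script name sanitize_rank.sh is hypothetical; --tool and --log-file are existing compute-sanitizer options.

```shell
#!/bin/bash
# sanitize_rank.sh: hypothetical per-rank wrapper (a sketch, not a confirmed fix).

# Build a log file name from the MPI rank. PMI_RANK is assumed to be exported
# by the Intel MPI launcher; fall back to the shell PID if it is not set.
sanitizer_log_name() {
    echo "sanitizer.rank${PMI_RANK:-pid$$}.log"
}

# Only launch when a command was passed in, so running the script with no
# arguments does nothing.
if [ "$#" -gt 0 ]; then
    exec compute-sanitizer --tool memcheck --log-file "$(sanitizer_log_name)" "$@"
fi
```

It would be launched in place of the plain compute-sanitizer call:
mpiexec -s all -np 8 -machinefile $PBS_NODEFILE ./sanitize_rank.sh <linked executable>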
Has anyone been able to use compute-sanitizer as a direct drop-in replacement for cuda-memcheck with MPI processes? Is there something I am missing or need to fix in order for this to work? Any help is appreciated!
PS: I am using v11.7 of the CUDA toolkit.