Hi Peter85,
Does the code also work when you run it with a single rank?
I’m wondering if the GPUs are set to exclusive mode. In exclusive-process mode only one process at a time can use a given GPU, so I’d expect 1 rank to work but the code to fail with multiple ranks if more than one rank lands on the same device. Granted, I would expect a different failure, “all CUDA-capable devices are busy or unavailable”, instead of a segv, so this might not be the issue.
To check whether you’re running in exclusive mode, run “nvidia-smi” from the same script that runs the saxp binary and look at the “Compute M.” field. In the example below, “E. Process” means the GPUs are in exclusive-process mode:
% nvidia-smi
Mon Aug 6 13:45:55 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 0000:02:00.0     Off |                    0 |
| N/A   36C    P0    25W / 250W |      0MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 0000:82:00.0     Off |                    0 |
| N/A   44C    P0    26W / 250W |      0MiB / 16276MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
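If you'd rather check this from a script than read the table by eye, here is a rough sketch in Python. It assumes nvidia-smi's query interface (--query-gpu=index,name,compute_mode with --format=csv,noheader) is available in your driver version and that the mode is reported with a string such as "Exclusive_Process" or "Default"; the helper names parse_compute_modes and exclusive_gpus are just made up for this example.

```python
import csv
import io
import subprocess

# Assumed nvidia-smi query; adjust the field list if your driver differs.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,compute_mode",
    "--format=csv,noheader",
]


def parse_compute_modes(csv_text):
    """Parse 'index, name, compute_mode' CSV lines into (int, str, str) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        index, name, mode = (f.strip() for f in fields)
        rows.append((int(index), name, mode))
    return rows


def exclusive_gpus(rows):
    """Return the rows whose compute mode starts with 'Exclusive'."""
    return [row for row in rows if row[2].lower().startswith("exclusive")]


if __name__ == "__main__":
    # Run this from the same job script that launches the MPI binary,
    # so it sees the same node and the same GPUs.
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    gpus = parse_compute_modes(output)
    for index, name, mode in gpus:
        print(f"GPU {index} ({name}): compute mode = {mode}")
    if exclusive_gpus(gpus):
        print("At least one GPU is in an exclusive mode; "
              "multiple ranks may not be able to share it.")
```

If it does report an exclusive mode, an administrator can usually switch a GPU back with "nvidia-smi -c DEFAULT" (this typically needs root), or you can map one rank per GPU instead.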