During the startup of my multi-GPU simulation on a single host (e.g. two GPU cards in one server), I call acc_get_device_type(). There is no apparent issue unless I run it under the debugger (I am using TotalView), which runs all the ranks in lockstep. My impression is that there is a race condition: both ranks call acc_get_device_type() at exactly the same time and the simulation stalls. The only workaround I have found is to set a couple of breakpoints and step the ranks through this part of the code individually, one after the other.
This is not very convenient, and I am wondering whether my understanding is correct and this is simply a limitation of the driver/runtime.
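
For reference, here is a minimal sketch of what the manual stepping amounts to if it were done in code: the ranks take turns calling acc_get_device_type(), separated by barriers. The helper take_turns and the USE_MPI_OPENACC macro are hypothetical names made up for this illustration, not part of MPI or OpenACC, and I am only assuming that serializing the call this way would avoid the stall under the debugger; I have not confirmed that.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MPI or OpenACC): every rank calls it.
 * On each turn exactly one rank runs work(), then all ranks call barrier(). */
typedef void (*step_fn)(void *ctx);

static void take_turns(int rank, int nranks,
                       step_fn work, step_fn barrier, void *ctx)
{
    for (int turn = 0; turn < nranks; ++turn) {
        if (turn == rank)
            work(ctx);   /* only the rank whose turn it is does the call */
        barrier(ctx);    /* everyone waits before the next turn starts */
    }
}

#ifdef USE_MPI_OPENACC   /* assumed build: MPI compiler wrapper + OpenACC */
#include <mpi.h>
#include <openacc.h>
#include <stdio.h>

/* The call that appears to stall when both ranks reach it together. */
static void query_device(void *ctx)
{
    acc_device_t *type = ctx;
    *type = acc_get_device_type();
}

static void world_barrier(void *ctx)
{
    (void)ctx;
    MPI_Barrier(MPI_COMM_WORLD);
}

int main(int argc, char **argv)
{
    int rank, nranks;
    acc_device_t type = acc_device_none;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Rank 0 queries first, then rank 1, and so on. */
    take_turns(rank, nranks, query_device, world_barrier, &type);

    printf("rank %d: device type %d\n", rank, (int)type);
    MPI_Finalize();
    return 0;
}
#endif
```

With two ranks this costs two barriers at startup, which is negligible, but it would only be a workaround and does not answer whether the runtime itself is supposed to tolerate simultaneous calls.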