I could not find any instruction like “call cudaSetDevice first, then call nvshmem_init” in the docs. Did I miss something?
Also, if cudaSetDevice is necessary, maybe it should reside in the nvshmem_init implementation so that users don't have to add it manually? Or at least a WARNING message would help a lot. It took me a long time to find the problem.