I am writing an NVSHMEM program and launch it with the following command, using two GPUs on the same host:
nvshmem_src_2.0.3-0/scripts/bin/nvshmrun -n 2 -ppn 2 ./test_nvshmem
I found that the execution flow of NVSHMEM looks like this:
---->host-thread-0--->GPU_device_kernel-0 on GPU-0---->
---->host-thread-1--->GPU_device_kernel-1 on GPU-1---->
I am wondering whether there is any way to use NVSHMEM with the following execution pattern on the host and devices instead:
---->host-thread--->GPU_device_kernel-0 on GPU-0---->host-thread
               \--->GPU_device_kernel-1 on GPU-1---->/
The reason is that I want a single host thread to do some common tasks that would otherwise be identical for both host threads and GPU kernels. Both host threads should also use the same host memory, rather than each keeping a separate copy on the host.