Between two GPUs, when connecting PEs using the IBRC transport, the default implementation uses a single QP. Can I create multiple PE teams so that each team uses a different QP for data transfer? If so, how do I perform parallel, per-team data transfers?
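To make the intent concrete, here is a minimal sketch of the layout I have in mind: split `NVSHMEM_TEAM_WORLD` into two teams and issue each team's transfer on its own CUDA stream. This uses the standard `nvshmem_team_split_strided` and `nvshmemx_putmem_on_stream` APIs; whether each team would actually be backed by its own IBRC QP (and thus transfer in parallel) is exactly what I am asking.

```cuda
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    size_t nbytes = 1 << 20;
    // Symmetric source/destination buffers, one pair per "channel".
    char *src_a = (char *)nvshmem_malloc(nbytes);
    char *dst_a = (char *)nvshmem_malloc(nbytes);
    char *src_b = (char *)nvshmem_malloc(nbytes);
    char *dst_b = (char *)nvshmem_malloc(nbytes);

    // Two teams spanning the same PEs (start 0, stride 1, full size).
    // The hope is that each team could be bound to a separate QP --
    // this mapping is the open question, not documented behavior.
    nvshmem_team_t team_a, team_b;
    nvshmem_team_split_strided(NVSHMEM_TEAM_WORLD, 0, 1, npes,
                               NULL, 0, &team_a);
    nvshmem_team_split_strided(NVSHMEM_TEAM_WORLD, 0, 1, npes,
                               NULL, 0, &team_b);

    cudaStream_t s_a, s_b;
    cudaStreamCreate(&s_a);
    cudaStreamCreate(&s_b);

    int peer = (mype + 1) % npes;  // neighbor PE in world numbering

    // Issue the two puts on different streams, intending them to
    // proceed in parallel over separate QPs.
    nvshmemx_putmem_on_stream(dst_a, src_a, nbytes, peer, s_a);
    nvshmemx_putmem_on_stream(dst_b, src_b, nbytes, peer, s_b);

    // Ensure remote completion of each stream's transfers.
    nvshmemx_quiet_on_stream(s_a);
    nvshmemx_quiet_on_stream(s_b);
    cudaStreamSynchronize(s_a);
    cudaStreamSynchronize(s_b);

    nvshmem_free(src_a); nvshmem_free(dst_a);
    nvshmem_free(src_b); nvshmem_free(dst_b);
    nvshmem_finalize();
    return 0;
}
```

If teams do not control QP assignment, is there another supported mechanism (e.g. an environment variable or transport option) to get multiple QPs between a PE pair?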