Keep vector variables in GPU memory across multiple GPU functions

  1. I know NVC++ can compile C++17 standard-parallelism source code and offload it to the GPU.
  2. My question concerns two consecutive functions: the first function processes vector A and writes the results to vector B, and the next function processes vector B and writes the results back to vector A.
    How can I ensure that vector A and vector B always stay in GPU memory and are never copied out to host memory?

Our support for C++17 standard language parallelism uses CUDA Unified Memory, which copies the vectors to the device upon first use and won't copy them back to the host until they are accessed on the host. Hence, so long as you don't touch the vectors on the host between the two functions, the data won't be copied back.
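
As a minimal sketch of this pattern (assuming you compile with `nvc++ -stdpar=gpu`; the two `std::transform` lambdas are placeholders for your actual processing functions):

```cpp
#include <algorithm>
#include <execution>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;
    std::vector<float> A(n, 1.0f), B(n, 0.0f);

    // First function: read A, write B. Under nvc++ -stdpar=gpu the vectors
    // are allocated in CUDA Unified Memory and migrate to the device on
    // first use inside the parallel algorithm.
    std::transform(std::execution::par_unseq, A.begin(), A.end(), B.begin(),
                   [](float a) { return a * 2.0f; });

    // Second function: read B, write A. Nothing touched the vectors on the
    // host in between, so the data stays resident in GPU memory.
    std::transform(std::execution::par_unseq, B.begin(), B.end(), A.begin(),
                   [](float b) { return b + 1.0f; });

    // Only this host-side access triggers migration back to the host.
    return A[0] == 3.0f ? 0 : 1;
}
```

Any host access between the two `std::transform` calls (for example, printing `B[0]`) would pull the pages back to the host, so keeping all intermediate reads and writes inside the parallel algorithms is what keeps the data on the device.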