Using a vector of thrust::device_ptr

saulocpp · March 23, 2022, 9:23pm

I need to wrap some device pointers in thrust::device_ptr, one for each GPU in the system, so I can use Thrust's algorithms, but I am unsure whether they can be initialized in a loop.
For a single device_ptr we use:

thrust::device_ptr <float> dev_ptr(dev_input);

Where dev_input is our float * already allocated by cudaMalloc.
I then declared a vector of this type, which I intend to initialize in a loop over all the devices in the system:

std::vector <thrust::device_ptr <float>> dev_ptr(NUM_DEVICES);   // compiles without issues
for(int dev_id = 0; dev_id < NUM_DEVICES; dev_id++)
    // how do I initialize dev_ptr[dev_id] so it points to its respective dev_input?

Where NUM_DEVICES holds the number of devices obtained from cudaGetDeviceCount(). Any idea is welcome (if it is at all possible).

Robert_Crovella · March 23, 2022, 10:23pm

This seems to work:

#include <thrust/device_vector.h>
#include <vector>

int main(){
  int NUM_DEVICES;
  cudaGetDeviceCount(&NUM_DEVICES);
  // one raw device pointer per GPU
  float **dev_input = new float*[NUM_DEVICES];
  // loop to cudaSetDevice/cudaMalloc on dev_input here, or similar
  std::vector <thrust::device_ptr <float>> dev_ptr(NUM_DEVICES);
  // wrap each raw pointer and assign it to the matching vector element
  for(int dev_id = 0; dev_id < NUM_DEVICES; dev_id++)
    dev_ptr[dev_id] = thrust::device_ptr<float>(dev_input[dev_id]);

}
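
For completeness, here is a minimal sketch of how the allocation loop left as a comment above might be filled in, followed by a Thrust call on each wrapped pointer. The function name sum_on_each_device, the element count n, and the fill/reduce workload are assumptions chosen for illustration; they are not part of the original question. Error checking is mostly omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: allocates n floats on every visible GPU, fills each
// buffer with 1.0f through its thrust::device_ptr, and returns one sum per GPU.
std::vector<float> sum_on_each_device(std::size_t n) {
  int num_devices = 0;
  if (cudaGetDeviceCount(&num_devices) != cudaSuccess) num_devices = 0;

  std::vector<float*> dev_input(num_devices, nullptr);
  std::vector<thrust::device_ptr<float>> dev_ptr(num_devices);
  std::vector<float> sums(num_devices, 0.0f);

  // Allocate on each device, then wrap the raw pointer as in the reply above.
  for (int dev_id = 0; dev_id < num_devices; dev_id++) {
    cudaSetDevice(dev_id);
    cudaMalloc(&dev_input[dev_id], n * sizeof(float));
    dev_ptr[dev_id] = thrust::device_ptr<float>(dev_input[dev_id]);
  }

  // Select the owning device before running algorithms on its buffer.
  for (int dev_id = 0; dev_id < num_devices; dev_id++) {
    cudaSetDevice(dev_id);
    thrust::fill(dev_ptr[dev_id], dev_ptr[dev_id] + n, 1.0f);
    sums[dev_id] = thrust::reduce(dev_ptr[dev_id], dev_ptr[dev_id] + n, 0.0f);
    cudaFree(dev_input[dev_id]);
  }
  return sums;
}
```

Calling cudaSetDevice(dev_id) before each Thrust call matters here, since each buffer is being processed on the GPU that owns it.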

saulocpp · March 23, 2022, 10:36pm

Thanks @txbob, that last line was the missing link.

striker159 · March 24, 2022, 9:23am

In most cases, you can also pass the raw device pointers directly and tell Thrust to run the algorithm on the device by using thrust::device as the execution policy.

thrust::algorithm(thrust::device, ....)
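
As a concrete illustration of that pattern, here is a small sketch using thrust::fill and thrust::reduce on a raw cudaMalloc'ed pointer. The function name sum_raw_device_buffer and the fill/reduce workload are made up for the example. Without an execution policy, Thrust would treat raw pointers as host memory, which is why thrust::device is passed explicitly.

```cuda
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <cstddef>

// Hypothetical helper: fills a raw device buffer with 2.0f and sums it.
// Returns -1.0f if the allocation fails (for example, when no GPU is present).
float sum_raw_device_buffer(std::size_t n) {
  float* dev_input = nullptr;
  if (cudaMalloc(&dev_input, n * sizeof(float)) != cudaSuccess) return -1.0f;

  // thrust::device tells Thrust that the raw pointers refer to device memory.
  thrust::fill(thrust::device, dev_input, dev_input + n, 2.0f);
  float sum = thrust::reduce(thrust::device, dev_input, dev_input + n, 0.0f);

  cudaFree(dev_input);
  return sum;
}
```

With this approach no thrust::device_ptr wrappers are needed, so a plain std::vector of float* per GPU would be enough.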
