I need to wrap some device pointers in thrust::device_ptr, one for each GPU in the system, so that I can use Thrust's algorithms, but I am unsure whether it is possible to initialize them in a loop.
For a single device_ptr we use:
thrust::device_ptr <float> dev_ptr(dev_input);
where dev_input is our float * already allocated with cudaMalloc.
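For reference, a minimal single-GPU sketch of that pattern might look like the following. The function name fill_and_sum and its parameters are hypothetical, error checking is omitted, and it assumes compilation with nvcc on a machine with a CUDA-capable GPU. Because the pointer is wrapped in a device_ptr, Thrust dispatches fill and reduce to the device without an explicit execution policy.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/memory.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>

// Hypothetical helper: allocates n floats on the current GPU, wraps the raw
// pointer in a thrust::device_ptr, fills it with value and returns the sum.
// Error checking is omitted for brevity.
float fill_and_sum(size_t n, float value)
{
    float* dev_input = nullptr;
    cudaMalloc(&dev_input, n * sizeof(float));

    thrust::device_ptr<float> dev_ptr(dev_input);  // same form as in the question
    thrust::fill(dev_ptr, dev_ptr + n, value);     // runs on the device
    float total = thrust::reduce(dev_ptr, dev_ptr + n);

    // raw_pointer_cast recovers the original float* (e.g. for cudaFree
    // or for passing to a custom kernel).
    cudaFree(thrust::raw_pointer_cast(dev_ptr));
    return total;
}
```

For example, fill_and_sum(4, 2.5f) fills four elements with 2.5 and returns their sum.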
I then declared a vector of this type and intend to initialize it in a loop that runs over every device in the system:
std::vector <thrust::device_ptr <float>> dev_ptr(NUM_DEVICES); // compiles without issues
for (int dev_id = 0; dev_id < NUM_DEVICES; dev_id++) {
    // how do I initialize dev_ptr[dev_id] so it points to its respective dev_input?
}
Here NUM_DEVICES holds the number of devices obtained from cudaGetDeviceCount(). Any ideas are welcome (if this is possible at all).
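Since thrust::device_ptr is an ordinary copyable value type, plain assignment inside the loop should work. The sketch below assumes the per-device raw pointers are kept in a std::vector<float*> (here called dev_input) and that each device gets n floats; the names wrap_all_devices and fill_and_sum_on are hypothetical, and error checking is omitted. One detail worth hedging on: Thrust launches its work on the current device, so the sketch calls cudaSetDevice(dev_id) both before allocating and before running an algorithm on that device's pointer.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <vector>

// Hypothetical helper: allocates n floats on every visible GPU and wraps each
// raw pointer in a thrust::device_ptr. dev_input receives the raw pointers
// (one per device) so they can be freed later. Error checking is omitted.
std::vector<thrust::device_ptr<float>> wrap_all_devices(std::vector<float*>& dev_input, size_t n)
{
    int num_devices = 0;
    cudaGetDeviceCount(&num_devices);

    dev_input.assign(num_devices, nullptr);
    std::vector<thrust::device_ptr<float>> dev_ptr(num_devices);

    for (int dev_id = 0; dev_id < num_devices; dev_id++) {
        cudaSetDevice(dev_id);  // the cudaMalloc below allocates on this GPU
        cudaMalloc(&dev_input[dev_id], n * sizeof(float));
        // Plain assignment of a device_ptr; writing
        // thrust::device_ptr<float>(dev_input[dev_id]) would be equivalent.
        dev_ptr[dev_id] = thrust::device_pointer_cast(dev_input[dev_id]);
    }
    return dev_ptr;
}

// Hypothetical helper: makes dev_id current before running Thrust
// algorithms on memory that lives on that GPU.
float fill_and_sum_on(int dev_id, thrust::device_ptr<float> p, size_t n, float value)
{
    cudaSetDevice(dev_id);
    thrust::fill(p, p + n, value);
    return thrust::reduce(p, p + n);
}
```

Each raw pointer in dev_input should eventually be released with cudaFree, ideally after selecting its device again.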
In most cases you can also skip device_ptr altogether: pass the raw device pointers directly to a Thrust algorithm and use thrust::device as the execution policy, which explicitly tells Thrust to execute the algorithm on the device.
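A minimal sketch of that alternative, assuming nvcc and a CUDA-capable GPU (the function name sum_sequence_raw is hypothetical and error checking is omitted). Without the thrust::device argument, Thrust would treat a plain float* as a host iterator, so the policy is what routes the work to the GPU here.

```cuda
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/sequence.h>
#include <thrust/reduce.h>

// Hypothetical helper: works directly on a raw cudaMalloc'd pointer.
// thrust::device selects the device backend even though the iterators
// are plain float*, so no device_ptr wrapper is needed.
float sum_sequence_raw(int n)
{
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    thrust::sequence(thrust::device, d, d + n);           // d[i] = i
    float total = thrust::reduce(thrust::device, d, d + n);

    cudaFree(d);
    return total;
}
```

For example, sum_sequence_raw(10) computes 0 + 1 + ... + 9 on the current GPU. In a multi-GPU setup the same rule applies as with device_ptr: select the device that owns the pointer with cudaSetDevice before the call.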