Yes! I want to allocate/free memory separately, because cudaMalloc and cudaFree are synchronous (just to make sure that every kernel runs freely on its respective GPU without any blocking).
How can a single pointer stored in DevMem simultaneously point to GPUNumber different memory allocations (assuming GPUNumber > 1)? Hint: It can’t. You want one pointer per allocation, e.g. an array of GPUNumber pointers.
As indicated already, you would want to use 4 separate pointers for the above example.
Feel free to convert any of the above to loops, according to your knowledge of C++ programming.
But don’t change the order of operations.
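A loop version of that four-GPU example might look like the following sketch (the array name DevMem and a per-GPU element count N are assumptions for illustration, not names fixed by the thread):

```cuda
const int GPUNumber = 4;
float *DevMem[GPUNumber];          // one device pointer per GPU
for (int g = 0; g < GPUNumber; g++) {
    cudaSetDevice(g);              // subsequent calls target GPU g
    cudaMalloc(&DevMem[g], N * sizeof(float));   // N is assumed
}
// ... launch a kernel on each GPU here ...
for (int g = 0; g < GPUNumber; g++) {
    cudaSetDevice(g);
    cudaFree(DevMem[g]);
}
```

The key point is one pointer per device: each cudaMalloc call allocates on whichever GPU was most recently selected with cudaSetDevice.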
Thanks Robert. But the thing is, the number of pointers is only known at runtime (it depends on user input), and each allocation will have a different size. Can you suggest the best solution for that?
All pointers have the same size (64 bits or 8 bytes typically). The number of pointers required will vary with the number of GPUs. Allocate an array of pointers dynamically, based on GPUNumber.
I’ve given you all the CUDA specific knowledge needed, already. It’s just C++ programming now.
int num_pointers;
// your code sets the above variable to something at runtime
int *sizes = new int[num_pointers];
// your code fills in the size of each allocation in bytes
float **d = new float*[num_pointers];
for (int i = 0; i<num_pointers; i++){
cudaSetDevice(i);
cudaMalloc(&d[i], sizes[i]);}
for (int i = 0; i<num_pointers; i++){
cudaSetDevice(i);
kernel<<<...>>>(d[i],....);}
for (int i = 0; i<num_pointers; i++){
cudaSetDevice(i);
cudaFree(d[i]);}
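The code above never frees the host-side arrays (sizes and d) and ignores CUDA error codes. A variant using std::vector and basic error checking might look like this sketch (CUDA_CHECK and the run wrapper are my own names, not part of the thread; the kernel launches are still elided):

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print the error string and abort on any CUDA failure.
#define CUDA_CHECK(call) do {                                   \
    cudaError_t err_ = (call);                                  \
    if (err_ != cudaSuccess) {                                  \
        fprintf(stderr, "CUDA error: %s\n",                     \
                cudaGetErrorString(err_));                      \
        exit(1);                                                \
    }                                                           \
} while (0)

// sizes[i] = bytes to allocate on GPU i; one allocation per device.
void run(const std::vector<int> &sizes) {
    std::vector<float*> d(sizes.size());
    for (size_t i = 0; i < d.size(); i++) {
        CUDA_CHECK(cudaSetDevice((int)i));
        CUDA_CHECK(cudaMalloc(&d[i], sizes[i]));
    }
    // ... launch one kernel per device here ...
    for (size_t i = 0; i < d.size(); i++) {
        CUDA_CHECK(cudaSetDevice((int)i));
        CUDA_CHECK(cudaFree(d[i]));
    }
}   // the vectors clean themselves up; no delete[] needed
```

Using std::vector keeps the same order of operations while removing the manual new/delete bookkeeping.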