I’d like to know if it is possible to mix OpenMP and CUDA so that each OpenMP thread calls the same CUDA kernel but processes a different region of memory.
Suppose a matrix M(n,th): I want each OpenMP thread tid to process M(n,tid), i.e., each thread accesses only the memory positions associated with its own second index.
I wrote code that creates a parallel region with multiple OpenMP threads, and inside the region each thread calls a CUDA kernel. I added a print inside the kernel to see blockIdx, and it seems all threads access the same blocks. For example, using 2 OpenMP threads and 3 CUDA blocks:
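A minimal sketch of what my code does (the names `kernel`, `d_M`, etc. are placeholders, not my actual identifiers; `d_M` is the matrix stored column-contiguously on the device, so column tid starts at `d_M + tid * n`):

```cuda
#include <cstdio>
#include <omp.h>

__global__ void kernel(float *col, int n, int tid)
{
    // Report which block this launch is using, tagged with the OpenMP thread id.
    if (threadIdx.x == 0)
        printf("tid = %d, block = %d\n", tid, blockIdx.x);

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        col[i] *= 2.0f;  // placeholder work on this thread's column of M
}

int main()
{
    const int n = 3 * 256;   // rows per column, sized for 3 blocks of 256 threads
    const int nth = 2;       // number of OpenMP threads / matrix columns
    float *d_M;
    cudaMalloc(&d_M, (size_t)n * nth * sizeof(float));

    #pragma omp parallel num_threads(nth)
    {
        int tid = omp_get_thread_num();
        // Every OpenMP thread launches the same kernel, each on its own column M(:,tid).
        kernel<<<3, 256>>>(d_M + (size_t)tid * n, n, tid);
    }
    cudaDeviceSynchronize();
    cudaFree(d_M);
    return 0;
}
```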
tid = 1, block = 1
tid = 1, block = 2
tid = 1, block = 3
tid = 2, block = 1
tid = 2, block = 2
tid = 2, block = 3
I expected the block allocation to assign blocks 1, 2, 3 to tid = 1 and blocks 4, 5, 6 to tid = 2, but it seems that different OpenMP threads request the same GPU blocks.
Is it possible to run the same CUDA kernel from different OpenMP threads, with each launch allocated independent GPU resources?
I’m using Pascal and Volta GPUs.