OpenMP and CUDA

Hello,

I’d like to know if it is possible to mix OpenMP and CUDA so that each OpenMP thread calls the same CUDA kernel, but each processes a different region of memory.

Suppose a matrix M(n,th): I want each thread tid to process M(:,tid), i.e., each thread accesses only the memory locations whose second index matches its thread id.

I wrote a code that creates a parallel region with multiple OpenMP threads, and inside the region each thread calls a CUDA kernel. I added a print inside the kernel to show the block ID, and it seems all OpenMP threads use the same blocks. For example, using 2 OpenMP threads and 3 CUDA blocks:

tid = 1, block = 1
tid = 1, block = 2
tid = 1, block = 3
tid = 2, block = 1
tid = 2, block = 2
tid = 2, block = 3

I thought that the allocation of blocks would result in blocks 1, 2, 3 for tid = 1, and 4, 5, 6 for tid = 2, but it seems that different OpenMP threads request the same GPU blocks.
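For reference, the structure of my code is roughly the following (written here in CUDA C++; all names and sizes are illustrative, not my actual code):

```cuda
#include <cstdio>
#include <omp.h>
#include <cuda_runtime.h>

// Illustrative kernel: scales one column of M and prints which block
// is working, tagged with the launching OpenMP thread id.
__global__ void scale_col(double *col, int n, int tid)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        col[i] *= 2.0;
    if (threadIdx.x == 0)
        printf("tid = %d, block = %d\n", tid, blockIdx.x);
}

int main(void)
{
    const int n   = 3072;   // rows per column (illustrative size)
    const int nth = 2;      // number of OpenMP threads / columns
    double *M;
    cudaMallocManaged(&M, (size_t)n * nth * sizeof(double));
    for (int i = 0; i < n * nth; i++) M[i] = 1.0;

    #pragma omp parallel num_threads(nth)
    {
        int tid = omp_get_thread_num();
        // Each OpenMP thread launches the same kernel, but on its
        // own column M(:,tid) via a pointer offset.
        scale_col<<<3, 1024>>>(M + (size_t)tid * n, n, tid);
    }
    cudaDeviceSynchronize();
    cudaFree(M);
    return 0;
}
```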

Is it possible to run the same CUDA kernel with different OpenMP threads, allocating independent GPU resources?

I’m using Pascal and Volta GPUs.

Thanks

If the OpenMP threads are using the same device, they may share some data, but in terms of kernel launches, they would be separate.

I added a print inside the kernel to see the blockId, and it seems all threads access the same blocks.

Providing a reproducing example would be very helpful here, since I’m just guessing. But the block enumeration is the same for every kernel launch, so block 1 for tid 1 is a different block than block 1 for tid 2; it just happens to have the same id.
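In other words, the launches are already independent; the block ids simply restart from zero each time. One common pattern is to give each OpenMP thread its own CUDA stream so the launches can also overlap on the device. A minimal sketch in CUDA C++ (the kernel, names, and sizes are assumptions based on the column layout you described):

```cuda
#include <omp.h>
#include <cuda_runtime.h>

// Illustrative kernel: scales one column of M.
__global__ void scale_col(double *col, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        col[i] *= 2.0;
}

int main(void)
{
    const int n = 3072, nth = 2;   // illustrative sizes
    double *M;
    cudaMallocManaged(&M, (size_t)n * nth * sizeof(double));
    for (int i = 0; i < n * nth; i++) M[i] = 1.0;

    #pragma omp parallel num_threads(nth)
    {
        int tid = omp_get_thread_num();
        cudaStream_t s;
        cudaStreamCreate(&s);      // one stream per OpenMP thread
        // blockIdx.x runs 0..2 in *every* launch; the pointer
        // offset, not the block id, keeps the threads' data apart.
        scale_col<<<3, 1024, 0, s>>>(M + (size_t)tid * n, n);
        cudaStreamSynchronize(s);
        cudaStreamDestroy(s);
    }
    cudaFree(M);
    return 0;
}
```

Whether the kernels actually run concurrently depends on device resources, but the correctness of the decomposition comes from the per-thread pointer offsets, not from the block numbering.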

-Mat