Using constant memory in CUDA Fortran with multiple GPUs

Hello,

I’m developing a program in CUDA Fortran and trying to use/access multiple GPUs from a single host thread. The original code (single GPU only) used global and constant memory, and when adding support for multiple GPUs I could not find a way to specify on which device to place Fortran variables with the “constant” attribute.

I have tried this:

integer, constant :: iconst
integer :: dev, ignore

DO dev = 0, maxdev
   ignore = cudaSetDevice(dev)
   iconst = 1          ! host assignment to the constant variable
END DO

This compiles and runs, but trying to access iconst from a kernel launched on the higher-numbered devices results in an “unspecified launch failure”.

Is there a way to specify the placement of variables in constant memory on a specific device? I looked through the user manual and “CUDA Fortran for Scientists and Engineers”, but there is little information on supporting multiple GPUs in general.

Thanks,

Maciej

Hi Maciej,

Did you set up Peer-to-Peer communication first? It’s required in order to use GPUDirect.

My article on multi-GPU programming with CUDA Fortran has a section on GPUDirect (part 4), including the set-up code: http://www.pgroup.com/lit/articles/insider/v3n3a2.htm. While I don’t use constant memory in that example, I just went back and tried adding some variables, and it worked as expected. If you continue to encounter issues, let me know and we can work through them.
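For reference, the basic set-up looks something like this (a minimal sketch for devices 0 and 1; it assumes both devices are P2P-capable and omits error checking):

use cudafor
integer :: istat, canAccess01, canAccess10

! check whether each device can reach the other's memory
istat = cudaDeviceCanAccessPeer(canAccess01, 0, 1)
istat = cudaDeviceCanAccessPeer(canAccess10, 1, 0)

if (canAccess01 == 1 .and. canAccess10 == 1) then
   ! peer access is one-directional, so enable it from each device
   istat = cudaSetDevice(0)
   istat = cudaDeviceEnablePeerAccess(1, 0)   ! second argument (flags) must be 0
   istat = cudaSetDevice(1)
   istat = cudaDeviceEnablePeerAccess(0, 0)
endif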

  • Mat

Hi Mat,

Thanks for your reply.

I’m not sure GPUDirect is actually relevant to what I am trying to achieve. I understand GPUDirect is required if a kernel running on device 1 is trying to access constant memory on device 0 - is that correct?

What I am trying to do is have kernels running on device 0 access constant memory on device 0, and kernels on device 1 access constant memory on device 1. But it is not clear to me how to specify (we’re talking Fortran here) that a variable declared with the constant attribute is allocated in the constant memory of device 1 or 2 instead of device 0.

Is there a way of achieving this in CUDA Fortran (PGI 13.2 and CUDA 5.0), or is it something that is not currently supported?

Cheers,

Maciej

I believe there are actually multiple contexts created, hence you need to establish Peer-to-Peer access so you can manage them. Granted, I’ve only done a little work with using multiple GPUs from a single host thread, so there may be a better way, but using Peer-to-Peer seems to work.

Personally, I much prefer using MPI and establishing a single GPU context for each MPI process. I find it logically easier to manage, cleaner in implementation, and it scales better. Of course, do what’s best for your program.
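A rough sketch of that pattern (assuming one MPI rank per GPU on each node):

use cudafor
use mpi
integer :: ierr, rank, ndev, istat

call MPI_Init(ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

! bind each rank to its own device; module constant data then
! naturally lives on that rank's GPU, with no placement logic needed
istat = cudaGetDeviceCount(ndev)
istat = cudaSetDevice(mod(rank, ndev))

! ... kernel launches and constant-memory assignments proceed
!     exactly as in the single-GPU code ...

call MPI_Finalize(ierr)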

  • Mat

I could do that, but that would just let me access constant memory on device A from a kernel running on device B, thereby negating the performance benefits of using constant memory. ;)

Moving to MPI is something we’ve been considering for later. I hoped that a pipelined copy between two GPUs accessed from the same host thread would be a bit faster than MPI, so that we could reap the benefits of multiple GPUs even for moderately sized problems.

Thanks for your help anyway. I’ve decided to refactor the code so that scalar constants become kernel arguments passed by value, while array constants move to global memory.
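Roughly along these lines (a minimal sketch; the kernel and variable names are hypothetical):

module mykernels
   use cudafor
   implicit none
contains
   ! the scalar "constant" now arrives as a kernel argument passed by value;
   ! the former constant array c is an ordinary global-memory device array
   attributes(global) subroutine scale(a, c, n, factor)
      real, device :: a(*), c(*)
      integer, value :: n
      real, value :: factor
      integer :: i
      i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (i <= n) a(i) = a(i) * factor * c(i)
   end subroutine scale
end module mykernels

On the host side, a and c are allocated per device inside the cudaSetDevice loop, so each GPU works on its own copy.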