ICE when set maxregcount low in CUDA

Tuan · September 2, 2010, 5:58pm

Would setting a low number of registers per thread cause a crash during compilation? I’m using PGI Fortran 10.5 with CUDA 2.3 and CUDA 3.0.

ptxas /tmp/pgcudaforNYj1s0R2fTW.ptx, line 0; fatal : (C9999) max reg limit too low
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (gpu_utility.f95: 353)
PGF90/x86-64 Linux 10.5-0: compilation aborted

Thanks,
Tuan

mwolfe · September 2, 2010, 6:44pm

The PGI Fortran compiler sets the max reg limit to use for the NVIDIA ptx assembler based on the thread block size and the device type. For instance, for compute capability 1.0-1.2, there are 8K registers available; for Tesla (1.3) and Fermi (2.0), there are 16K available. If the compiler uses a thread block of, say, 16x16, it will divide the register count (8K or 16K) by the number of threads in a block (256), to make sure there are enough registers for at least one thread block. In this case, 8K/256 is 32.
I’m really surprised by this message from the ptx assembler; I hadn’t seen this one before.
There are two possible ways to affect this. If you are using the Accelerator model, use loop directives to explicitly set the thread block size for each loop, making the thread block smaller. This will allow more registers per thread. Alternatively, use the -ta=nvidia,maxregcount:n option, or -Mcuda=maxregcount:n for CUDA Fortran, to set the max reg count explicitly. You can run the compiler with ‘-v’ to see the invocation of the GPU compiler; this will be an invocation of ‘pgnvd’, and its ‘-regs’ argument is what sets the max reg limit passed to the PTX assembler. The -Minfo=accel messages will also tell you the thread block size being used for each loop.
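The division described above can be worked through in a few lines. This is only a sketch of the arithmetic as stated in the post, not the compiler's actual logic: the function name `max_regs_per_thread` is hypothetical, the 8K and 16K register counts are the ones quoted above, and the real compiler may apply further limits that are not modeled here.

```python
def max_regs_per_thread(registers_available, block_dims):
    """Per-thread register budget so that one full thread block fits.

    registers_available: register count for the device (8K or 16K in the post).
    block_dims: thread block extents, e.g. (16, 16).
    """
    threads_per_block = 1
    for extent in block_dims:
        threads_per_block *= extent
    # Integer division: every thread in the block gets the same budget.
    return registers_available // threads_per_block


REGS_8K = 8 * 1024    # figure quoted for compute capability 1.0-1.2
REGS_16K = 16 * 1024  # figure quoted for Tesla (1.3)

print(max_regs_per_thread(REGS_8K, (16, 16)))   # 8K / 256 threads
print(max_regs_per_thread(REGS_16K, (16, 16)))  # 16K / 256 threads
print(max_regs_per_thread(REGS_8K, (16, 8)))    # smaller block, larger budget
```

The last call illustrates the first suggestion in the post: halving the thread block size doubles the per-thread budget under this simple model.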

Tuan · September 3, 2010, 6:23pm

Hi Michael,
I’m using CUDA Fortran. This error occurs when I try to set the maxregcount explicitly. There is no problem if I leave it to the compiler.

My launch configuration is <<<104, 192>>>. The compiler sets the max reg count to 24; when I try to set it to 16, I get the ICE above.

Personally, I think there should be no problem setting maxregcount to a low value, since the data can be spilled to global memory.

I’m using Tesla 1.3, with 16K registers.

Thanks,
Tuan

mwolfe · September 3, 2010, 7:53pm

I have to agree with you; there should be no problem setting the value low, if spilling works.
But this is not a problem that PGI can solve. The message comes from the NVIDIA PTX assembler, ptxas, which we redistribute but which is provided by NVIDIA. I’m sorry to say that I don’t think we can help much here.

Tuan · September 7, 2010, 7:16pm

Thanks, Michael.

Best,
Tuan.
