Compiler error with 21.3 and OpenACC

orestis.kor · April 21, 2021, 12:06pm

Hello,

after upgrading from 21.2, I get an obscure compiler error, with code that has been working for quite a long while (PGI 19.10 to 21.2, with no issues AFAIK):

NVC++-F-0000-Internal compiler error. mr_precedes: too many st/br 31

The compiler then reports a line that is actually the closing } of a particular function that has some OpenACC regions. I am not the developer of the application, so I don’t know the exact rationale behind the function’s design, but I will provide the code here. If I comment out the acc pragmas in this particular function (the application has multiple other computational OpenACC regions, async etc., that don’t seem to get in the way), compilation continues until it breaks again in another function very similar to this one. As the exact same code compiles and works with versions up to 21.2, I suspect some new behavior in 21.3 introduces this bug. The whole program uses the Unified Memory feature, which is why you’ll notice quite a lot of pointer chasing in the device code (it’s adapted from a non-heterogeneous version).

Function code

Compiler reports:
NVC++-F-0000-Internal compiler error. mr_precedes: too many st/br 31 (kernel_particles.c: 330)

So here is the corresponding part of kernel_particles.c

// Organize the particles in tiles (Bucket Sort)
void spec_organize_in_tiles(t_species *spec, const int limits_y[2], const int device)
{
	int iy, ix;

	const int size = spec->main_vector.size;
	const int n_tiles_x = spec->n_tiles_x;
	const int n_tiles_y = spec->n_tiles_y;

	int *restrict tile_offset = spec->tile_offset;
	int *restrict pos = alloc_align_buffer(DEFAULT_ALIGNMENT, size * sizeof(int));

#ifdef ENABLE_PREFETCH
	spec_prefetch_openacc(&spec->main_vector, device, NULL);
	cudaMemPrefetchAsync(spec->tile_offset, (n_tiles_x * n_tiles_y + 1) * sizeof(int), device, NULL);
	cudaMemPrefetchAsync(pos, size * sizeof(int), device, NULL);
#endif

	// Calculate the histogram (number of particles per tile)
	#pragma acc parallel loop private(ix, iy)
	for (int i = 0; i < size; i++)
	{
		ix = spec->main_vector.ix[i] / TILE_SIZE;
		iy = (spec->main_vector.iy[i] - limits_y[0]) / TILE_SIZE;

		#pragma acc atomic capture
		pos[i] = tile_offset[ix + iy * n_tiles_x]++;
	}

	// Prefix sum to find the initial idx of each tile in the particle vector
	prefix_sum_openacc(tile_offset, n_tiles_x * n_tiles_y + 1);

	// Calculate the target position of each particle
	#pragma acc parallel loop private(ix, iy)
	for (int i = 0; i < size; i++)
	{
		ix = spec->main_vector.ix[i] / TILE_SIZE;
		iy = (spec->main_vector.iy[i] - limits_y[0]) / TILE_SIZE;

		pos[i] += tile_offset[ix + iy * n_tiles_x];
	}

	const int final_size = tile_offset[n_tiles_x * n_tiles_y];
	spec->main_vector.size = final_size;

	// Move the particles to the correct position
	spec_move_vector_int_full(spec->main_vector.ix, pos, size);
	spec_move_vector_int_full(spec->main_vector.iy, pos, size);
	spec_move_vector_float_full(spec->main_vector.x, pos, size);
	spec_move_vector_float_full(spec->main_vector.y, pos, size);
	spec_move_vector_float_full(spec->main_vector.ux, pos, size);
	spec_move_vector_float_full(spec->main_vector.uy, pos, size);
	spec_move_vector_float_full(spec->main_vector.uz, pos, size);

	// Validate all the particles
	#pragma acc parallel loop
	for (int k = 0; k < final_size; k++)
		spec->main_vector.invalid[k] = false;

	free_align_buffer(pos);  // Clean position vector
} // This is line 330 from the compiler error

As I can’t decipher the compiler error, I thought it’d be interesting for someone to take a look. There’s obviously some context (definitions etc.) missing, but as this was working till now maybe it’s not even relevant. I’ll fill in those informational gaps as needed.
Thanks!

MatColgrove · April 21, 2021, 4:41pm

It’s an internal compiler error (ICE) so a compiler issue, not an issue with the code.

Would you be able to provide a small but complete reproducing example that recreates the issue? Unfortunately, the code snippet you provided is missing definitions, so I can’t compile it. Once I have a reproducer, I’ll send a report to our compiler engineers so we can get the issue resolved.

Thanks,
Mat

orestis.kor · April 21, 2021, 5:00pm

Hi Mat, thanks for answering.

I tried writing a dummy (with pointer chasing and assignments in a loop with private locals) before posting, but I don’t know what triggers the bug (and it certainly doesn’t look obvious), so I was not able to reproduce it in a PoC. Any ideas?

Maybe ‘too many st/br’ tells you (hopefully you can find out what it means) what we need in order to reproduce it?

In the meantime, the full code is actually open and accessible:

In the master branch, go to parallel/openacc_multigpu/ and simply make.

Cheers
Orestis

MatColgrove · April 21, 2021, 5:50pm

Thanks Orestis. I was able to reproduce the issue here and have filed a problem report (TPR #29945).

Looks to be an issue with our new atomic support in 21.3 (I’ve seen a few other codes with similar issues). You can work around it by adding the internal compiler flag “-Mx,231,0x01” to revert to using the older atomics. Though be sure to remove this flag later once we fix this problem.
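
For anyone trying to cut this down to a standalone test case, the atomic capture pattern from the first loop of spec_organize_in_tiles can be isolated roughly as below. This is only a sketch: the function name count_per_tile, its flat-array parameters, and the TILE_SIZE value are assumptions made for illustration (the real definitions are not shown in this thread), and it has not been confirmed to trigger the ICE in 21.3.

```c
#include <assert.h>

#define TILE_SIZE 16 /* assumed value; the real definition is not shown in the thread */

/* Hypothetical reduction of the histogram loop in spec_organize_in_tiles.
 * Each particle atomically takes the next free slot in its tile's counter:
 * pos[i] receives the old counter value, and the counter is incremented.
 * Assumes the buffers are reachable from the device, e.g. built with
 * -acc -gpu=managed as in the original application. */
void count_per_tile(const int *ix, const int *iy, int size,
                    int n_tiles_x, int *tile_count, int *pos)
{
    assert(n_tiles_x > 0);

    #pragma acc parallel loop
    for (int i = 0; i < size; i++)
    {
        const int tx = ix[i] / TILE_SIZE;
        const int ty = iy[i] / TILE_SIZE;

        /* Same "v = x++" capture form as in the original function */
        #pragma acc atomic capture
        pos[i] = tile_count[tx + ty * n_tiles_x]++;
    }
}
```

Calling this from a small main and compiling once with and once without “-Mx,231,0x01” would show whether the reduced form is enough to hit the problem. A compiler without OpenACC support ignores the pragmas and runs the loop serially, which is handy for checking the expected counts.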

-Mat

MatColgrove · June 8, 2021, 6:55pm

Hi Orestis,

Engineering let me know that this issue, TPR #29945, has been fixed in the 21.5 release.

Example:

Fails with 21.3:

% pgcc -c kernel_particles.c -o kernel_particles.o -O3 -fast -cudalibs -DTEST -Iinclude/ -lm -acc -gpu=managed -V21.3 -w
NVC++-F-0000-Internal compiler error. mr_precedes: too many st/br      31  (kernel_particles.c: 330)
NVC++/x86-64 Linux 21.3-0: compilation aborted

Compiles correctly with 21.5:

% pgcc -c kernel_particles.c -o kernel_particles.o -O3 -fast -cudalibs -DTEST -Iinclude/ -lm -acc -gpu=managed -V21.5 -w
%

orestis.kor · June 15, 2021, 11:57am

Thanks for the update, Mat!

I can confirm that after upgrading my installations to 21.5, it appears to be working again.
