Assuming that this is an extended version of the program you posted at: OpenACC: Best way to parallelize nested DO loops with data dependency between loops?
It probably has some impact, but not much. With CUDA Unified Memory (-gpu=managed), data is only copied when it changes on either the host or the device, so in this case it only gets copied the first time through the timestep loop. You could hoist the data movement above the timestep loop with explicit data regions so it isn't included in your timer, but the overall time would be about the same.
More likely the problem is the same as in your other program: "nblocks" is only one, so there isn't enough work to keep the GPU busy. I used an nblocks size of 128 and see a significant speed-up, though I don't know if that's a reasonable size for your case.
-Mat