Can I specify vector length in a kernels region?

I do have automatic arrays.

To get it to run, I already had to set:

setenv PGI_ACC_CUDA_HEAPSIZE 67000000

Is that the same thing?

I tried raising NV_ACC_CUDA_HEAPSIZE from 67000000 to 500000000, but that did not fix it. I may try removing the automatic arrays.

Thanks,

Jacques

Yes, although the older “PGI” prefix is deprecated. “NVCOMPILER” is the official prefix for environment variables, but I prefer the abbreviated “NV”, which is also accepted.
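
For readers on a Bourne-style shell (sh/bash) rather than csh, here is a minimal sketch of the same setting using the abbreviated “NV” prefix. The value is the larger heap size mentioned in this thread, not a recommendation; unsetting the old “PGI” spelling is just an assumption about keeping the environment tidy.

```shell
# Sketch: set the OpenACC device heap size with the newer "NV" prefix.
# This is the sh/bash counterpart of the csh form `setenv NAME VALUE`.
unset PGI_ACC_CUDA_HEAPSIZE            # drop the deprecated spelling if it was set
export NV_ACC_CUDA_HEAPSIZE=500000000  # value taken from the thread
echo "NV_ACC_CUDA_HEAPSIZE=${NV_ACC_CUDA_HEAPSIZE}"
```

The variable has to be exported in the shell that launches the executable, since it is read at run time rather than at compile time.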

Hi Mat,

In subroutine mynn_tendencies I changed all 19 automatic arrays to arrays passed in through the calling sequence and, in the calling routine, put them in a private clause. That sped up the entire main loop from 1.33 seconds to 0.90 seconds, which is about 68% of the original time. Now I’m going to look for other subroutines with automatic arrays.

Thanks for the great tip!

Jacques


Hi Mat,

I removed all the automatic arrays and it sped up by 4X. I don’t know what’s taking the remaining time, but I wonder if it is the private arrays. I time the main loop. On the main loop I have a kernels directive which specifies 180 private arrays, most dimensioned (128) and some dimensioned (128,10). Does that take a lot of start-up time?

Thanks,

Jacques

Well, the private arrays do need to be allocated. Normally that overhead is not significant, but 180 arrays could take a while; I personally haven’t used this many. That said, the device memory should get reused, so the allocation time should only affect the first time the kernel is called.

Have you profiled the code? If not, I suggest profiling with Nsight Systems with OpenACC tracing enabled (i.e. “nsys profile -o <output> -t cuda,openacc <executable>”, optionally adding “--stats=true” to see the text output). This will show the device memory allocation time.

-Mat
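
The profiling command suggested above could be scripted roughly as follows. This is only a sketch: the report name `mynn_profile` and the executable path `./my_openacc_app` are hypothetical placeholders, not names from the thread, and the command is run only if both `nsys` and the executable are actually present.

```shell
# Sketch of the Nsight Systems invocation; REPORT and APP are hypothetical.
REPORT="mynn_profile"      # assumed report name (nsys appends its own extension)
APP="./my_openacc_app"     # assumed path to the OpenACC executable

# Trace CUDA and OpenACC activity and print the summary statistics as text.
CMD="nsys profile -o ${REPORT} -t cuda,openacc --stats=true ${APP}"
echo "${CMD}"

# Run the profiler only when nsys and the executable are both available.
if command -v nsys >/dev/null 2>&1 && [ -x "${APP}" ]; then
    ${CMD}
fi
```

In the text output, the device memory allocation entries are the ones to compare against the total time of the main loop.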