I’m running into the following error when compiling CUDA Fortran code, and I’m looking for some advice.
[leiderml@ebwilson-mpi ~]$ pgfortran -Mcuda ibe-25Cuda.f
ptxas error : Entry function 'case8' uses too much local data (0xbdec bytes, 0x4000 max)
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (ibe-25Cuda.f: 611)
PGF90/x86-64 Linux 11.5-0: compilation aborted
So it looks like I’m way over the limit for local data: 0xbdec bytes (48,620) against a maximum of 0x4000 (16,384). The module I’m sending to the GPU has around 500 lines of code across all its functions, so I suspect the arrays are causing the problem, although I’d expect this GPU to have enough memory to handle them.
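To put the two hex figures from the ptxas message side by side, here is a quick conversion in Python. It only restates the numbers in the error; the variable names and the helper are mine, for illustration.

```python
# Figures copied from the ptxas message above.
local_used = 0xbdec   # bytes of local data requested by case8
local_max = 0x4000    # maximum reported by ptxas


def to_kib(nbytes):
    """Convert a byte count to KiB."""
    return nbytes / 1024


# How far over the reported maximum the kernel is.
ratio = local_used / local_max

print(f"used: {local_used} bytes ({to_kib(local_used):.1f} KiB)")
print(f"max:  {local_max} bytes ({to_kib(local_max):.1f} KiB)")
print(f"over the limit by a factor of {ratio:.2f}")
```

So the kernel asks for roughly three times the 16 KiB that ptxas allows for local data.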
My GPU is:
Device Name: Tesla C2050
Device Revision Number: 2.0
Global Memory Size: 2817982464
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
The declaration for case8 is below (I explain how I’m using it afterwards):
      attributes(global) subroutine case8(m1n1SumAry,nsize,iStart,
     -     factrf,fact,m,n,p,q,s,a,b,c,d,ip2,jp2,kp2,lp2)
      double precision, dimension(:) :: m1n1SumAry
      double precision, dimension(0:500) :: factrf
      double precision, dimension(0:170) :: fact
      integer, value :: nsize,m,n,p,q,s,ip2,
     -     jp2,kp2,lp2,iStart
      integer m1,n1,p1,m1mn1,p1min
      double precision p1term,p1sum,n1term,n1sum,threej,a,b,c,d
factrf and fact are arrays of constant doubles that I send to the GPU. I could compute them on the GPU instead, but they would still take the same space, unless I recomputed each factorial every time it is needed, which would slow the code down tremendously.
m1n1SumAry is a variable-size array, sized to the number of threads I launch, and is used to return the values. Perhaps hard-coding its size would help with the memory constraints?
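For scale, here is a rough tally in Python of how many bytes the two fixed-size arrays declared above occupy. It assumes double precision is 8 bytes; the helper name is hypothetical. m1n1SumAry is left out because its size depends on the launch.

```python
BYTES_PER_DOUBLE = 8  # assumption: double precision is 8 bytes


def array_bytes(lower, upper, elem_size=BYTES_PER_DOUBLE):
    """Bytes for a 1-D array declared as dimension(lower:upper)."""
    return (upper - lower + 1) * elem_size


factrf_bytes = array_bytes(0, 500)   # factrf has 501 elements
fact_bytes = array_bytes(0, 170)     # fact has 171 elements
total = factrf_bytes + fact_bytes

print(f"factrf: {factrf_bytes} bytes")
print(f"fact:   {fact_bytes} bytes")
print(f"total:  {total} bytes")
print(f"fits under the 0x4000 limit: {total < 0x4000}")
```

Together the two arrays come to 5376 bytes, which is below both the 0x4000 maximum and the 0xbdec total in the error, so on these numbers they may not account for the reported local data by themselves.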
So my questions come down to:
- Is 500 lines of code across all the functions in my module too much? How much can a module actually hold between code and variables?
- Is the number of variables the problem, or is it the combined size of the arrays and the code? Once the compiler strips the comments and reduces everything to machine code, I’d expect this to fit on the GPU without trouble.