I'm a newbie with CUDA, so assistance is greatly appreciated - especially when directed at my level of understanding!
I have an old serial code that I am trying to port to a GTX275. It runs fine on the CPU, but when I port just one (very) compute-intensive loop I get:
“call to cuMemAlloc returned error 2: Out of Memory
CUDA driver version: 4000”
The array used in the loop has ~4,000,000 elements (4 bytes each).
To make life simpler, I wrote a few-line matrix multiplication program (separate from the code above) and increased the array size until I reproduced the same error. array(3000,3000,3) worked, but array(4000,4000,3) returned the same error as quoted above. The screen also temporarily blacked out in the latter case…
According to GPU-Z, I should easily have sufficient memory. In the small problems (the 3000 one), GPU load goes up to 100% for a second or two.
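To put numbers on the array sizes mentioned above, here is a rough back-of-envelope sketch in Python (not part of my Fortran code). It assumes the test arrays also use 4-byte elements, and the helper name `array_bytes` is just something made up for this illustration. It only counts a single array, not any extra copies or temporaries the compiler may allocate on the device.

```python
def array_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of one dense array with the given shape.

    bytes_per_element=4 is an assumption (single precision / 4-byte integer).
    """
    count = 1
    for extent in shape:
        count *= extent
    return count * bytes_per_element


MIB = 1024 * 1024

# The two test shapes from the matrix multiplication program, plus the
# ~4,000,000-element array from the original loop.
for shape in [(3000, 3000, 3), (4000, 4000, 3), (4_000_000,)]:
    size = array_bytes(shape)
    print(shape, size, "bytes,", round(size / MIB, 1), "MiB")
```

So a single array comes to 108,000,000 bytes for the 3000 case and 192,000,000 bytes for the 4000 case, while the array in the original loop is only 16,000,000 bytes.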
Does anyone know what could be wrong or what I can do to fix it?
For info, I am working in Fortran with the Portland Group Visual Fortran compiler, using accelerator directives (not CUDA in C). I don't believe this is where the problem lies, but I am open to suggestions.
If anyone here uses or is thinking of using the PG compiler, I would recommend it. It has made life much easier for me while learning.