Hi,
I am using PGI workstation on a Fermi (C2050) box. In Fermi, the default memory configuration is 16 KB L1/48 KB shared memory, but it is also possible to change the configuration to 48 KB L1/16KB shared mem. How can it be done using PGI Fortran?
Hi,
I am using PGI workstation on a Fermi (C2050) box. In Fermi, the default memory configuration is 16 KB L1/48 KB shared memory, but it is also possible to change the configuration to 48 KB L1/16KB shared mem. How can it be done using PGI Fortran?
Hi,
Is there a support of Fermi architecture for PGI Fortran 10.9 ?
Is there a support of Fermi architecture for PGI Fortran 10.9 ?
Yes. Fermi support was first added to 10.6.
There is a function cudaFuncSetCacheConfig()
for change of L1 cache size for Fermi architecture.
Will this function support for Cuda Fortran ?
Hi Faustus,
You should be able to access any CUDA C function from CUDA Fortran. For CUDA C functions without a built-in CUDA Fortran interface, such as cudaFuncSetCacheConfig, you simply need to write an explicit interface to the function before calling it.
Dredging up an old thread…
Mat, et al,
I’m trying to try and use cudaFuncSetCacheConfig to see if it affects my code at all or not (on my way to maybe trying cudaThreadSetCacheConfig). So, I first tried whipping up an interface:
module extracuda
interface
integer function cudafuncsetcacheconfig(func, cacheconfig) bind(c,name='cudaFuncSetCacheConfig')
use iso_c_binding
character(len=*) :: func
integer :: cacheconfig
end function cudafuncsetcacheconfig
end interface
end module extracuda
And then in my code I did:
use extracuda
...
write (*,*) "Got here!"
istat = cudaFuncSetCacheConfig('soradcuf',2)
call soradcuf<<<dimGrid, dimBlock>>>(...args...)
istat = cudaGetLastError()
if (istat /= 0) then
write (*,*) "Kernel Call failed: ", cudaGetErrorString(istat)
stop
end if
where I’m pretty sure cudaFuncCachePreferL1 = 2 in CUDA enum speak.
I linked everything up with the usual -lcudart and -lcuda and I get:
> ./runsorad-cudafor-flxy-lessconstants-funccache-DPvDPorig.exe
blocksize: 256
Current Device: 0
Device:Tesla M2070, 1147.0 MHz clock, 4096.0 MB memory.
Iteration: 1
eps: 1.0000000E-06
Current Device: 0
Got here!
Kernel Call failed:
invalid device function
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
FORTRAN STOP
Now, if I comment out my cudaFuncSetCacheConfig call, the code works just fine, so I broke something with the interface (as per usual when I try to do C interfacing).
Any help from the gurus out there of what I might have screwed up?
Hi Matt,
In 11.4 we added an interface to cudaFuncSetCacheConfig in the cudafor module. You can now call the routine directly.
Though, the error suggests that cudaFuncSetCacheConfig is getting called but is using an unknown kernel. While I don’t know details, in looking at our wrapper function it seems we manipulate the Fortran string into a integer array which is then passed to the CUDA C cudaFuncSetCacheConfig function.
Can you try using the CUDA Fortran module’s interface and see if that works around the issue?
Thanks,
Mat
I’ll let you know soon (the Fermi system is getting 11.5 installed on it as soon as the sysadmins have the time)!
Thanks,
Matt
Okay, the compiler recognizes the call, which is good, but it crashes in varying ways trying to use it. First off, the only difference between the working code and the non-working code with the cudaFuncSetCacheConfig call is:
write (*,*) "Got here!"
istat = cudaFuncSetCacheConfig(soradcuf,cudaFuncCachePreferL1)
Following the cudaFuncSetCacheConfig call is my kernel call to soradcuf:
call soradcuf<<<dimGrid, dimBlock>>> (...args...)
istat = cudaGetLastError()
if (istat /= 0) then
write (*,*) "Kernel Call failed: ", cudaGetErrorString(istat)
stop
end if
When I compile that (with no errors) and run it I get:
> ./runsorad-cudafor-flxy-lessconstants-funccache-DPvDPorig.exe
blocksize: 256
Current Device: 0
Device:Tesla M2070, 1147.0 MHz clock, 4096.0 MB memory.
Iteration: 1
eps: 1.0000000E-06
Current Device: 0
Got here!
Segmentation fault (core dumped)
Hmm. This cored out before even getting to the error write under the kernel call. Maybe it needs strings and integers? Let’s try:
istat = cudaFuncSetCacheConfig('soradcuf',2)
which leads to:
> ./runsorad-cudafor-flxy-lessconstants-funccache-DPvDPorig.exe
blocksize: 256
Current Device: 0
Device:Tesla M2070, 1147.0 MHz clock, 4096.0 MB memory.
Iteration: 1
eps: 1.0000000E-06
Current Device: 0
Got here!
Kernel Call failed:
invalid device function
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
FORTRAN STOP
Well, that died out trying to call the function and got to the error write and stop.
Any hints?
Matt
Hi Matt,
I’m not sure what’s wrong. I haven’t used this function much but my little test cases seem fine. Instead of me trying to figure out how to recreate the error, can you send me your tests?
Thanks,
Mat
A tarball is whizzing its way to you!
Thanks for the help,
Matt
Hi Matt,
For the seg fault, you’re suppose to be able to pass in a function pointer and the CUDA runtime makes the association, but I don’t think we’re do this correctly. I add TPR#17878 and sent it to our engineers.
For the ‘invalid device function’ error, this is because the function name is actually ‘soradmod_soradcuf’, not ‘soradcuf’ since it’s in a module. It seems to be happy if I use:
istat = cudaFuncSetCacheConfig(‘soradmod_soradcuf’,cudaFuncCachePreferL1)
Can you give this a try and see if it works for you?