I am using PGI Workstation on a Fermi (C2050) box. On Fermi, the default memory configuration is 16 KB L1 / 48 KB shared memory, but it is also possible to change the configuration to 48 KB L1 / 16 KB shared memory. How can this be done using PGI Fortran?
You should be able to access any CUDA C function from CUDA Fortran. For CUDA C functions without a built-in CUDA Fortran interface, such as cudaFuncSetCacheConfig, you simply need to write an explicit interface to the function before calling it.
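As a rough illustration of what such an explicit interface could look like, here is a minimal sketch. The module name `cacheconfig_sketch` and the Fortran-side name `c_cudaFuncSetCacheConfig` are hypothetical, and it assumes a CUDA runtime that accepts the kernel name as a null-terminated C string in the first argument (the `const char *func` form of the older runtime API). Treat it as a starting point rather than a verified interface.

```fortran
! Sketch only: module and function names are hypothetical.
module cacheconfig_sketch
  use iso_c_binding, only: c_int, c_char
  implicit none

  interface
    ! Assumed C prototype (older CUDA runtime):
    !   cudaError_t cudaFuncSetCacheConfig(const char *func,
    !                                      enum cudaFuncCache cacheConfig);
    integer(c_int) function c_cudaFuncSetCacheConfig(func, cacheconfig) &
        bind(c, name='cudaFuncSetCacheConfig')
      import :: c_int, c_char
      ! Kernel name, passed as a null-terminated C character array.
      character(kind=c_char), intent(in) :: func(*)
      ! C enums are passed by value, so the VALUE attribute is needed.
      integer(c_int), value :: cacheconfig
    end function c_cudaFuncSetCacheConfig
  end interface
end module cacheconfig_sketch
```

A call would then look something like `istat = c_cudaFuncSetCacheConfig('kernelname'//c_null_char, 2_c_int)`, with `c_null_char` taken from `iso_c_binding` and `'kernelname'` replaced by the name the runtime actually knows the kernel by. Two points matter here: a Fortran string is not null-terminated unless you append the terminator yourself, and without `value` the integer would be passed by reference rather than as the enum value C expects.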
I’m trying to use cudaFuncSetCacheConfig to see whether it affects my code at all (on my way to maybe trying cudaThreadSetCacheConfig). So, I first tried whipping up an interface:
module extracuda
  interface
    integer function cudafuncsetcacheconfig(func, cacheconfig) bind(c,name='cudaFuncSetCacheConfig')
      character(len=*) :: func
      integer :: cacheconfig
    end function cudafuncsetcacheconfig
  end interface
end module extracuda
And then in my code I did:
write (*,*) "Got here!"
istat = cudaFuncSetCacheConfig('soradcuf',2)
call soradcuf<<<dimGrid, dimBlock>>>(...args...)
istat = cudaGetLastError()
if (istat /= 0) then
   write (*,*) "Kernel Call failed: ", cudaGetErrorString(istat)
end if
where I’m pretty sure cudaFuncCachePreferL1 = 2 in CUDA enum speak.
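For reference, the literal 2 can be replaced with named constants. The sketch below mirrors the values of the CUDA C `cudaFuncCache` enum; the module name and the `PREFER_*` parameter names are hypothetical, chosen so they would not clash with any constants a vendor module might already provide.

```fortran
! Sketch only: module and parameter names are hypothetical.
module cache_pref_sketch
  use iso_c_binding, only: c_int
  implicit none

  ! Values of the CUDA C enum cudaFuncCache:
  integer(c_int), parameter :: PREFER_NONE   = 0  ! cudaFuncCachePreferNone: no preference
  integer(c_int), parameter :: PREFER_SHARED = 1  ! cudaFuncCachePreferShared: 48 KB shared / 16 KB L1 on Fermi
  integer(c_int), parameter :: PREFER_L1     = 2  ! cudaFuncCachePreferL1: 48 KB L1 / 16 KB shared on Fermi
end module cache_pref_sketch
```

With something like this in scope, the call can pass `PREFER_L1` instead of a bare 2, which makes the intent clearer when reading the code later.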
I linked everything up with the usual -lcudart and -lcuda and I get:
Current Device: 0
Device:Tesla M2070, 1147.0 MHz clock, 4096.0 MB memory.
Current Device: 0
Kernel Call failed:
invalid device function
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
Now, if I comment out my cudaFuncSetCacheConfig call, the code works just fine, so I broke something with the interface (as per usual when I try to do C interfacing).
Any help from the gurus out there of what I might have screwed up?
In 11.4 we added an interface to cudaFuncSetCacheConfig in the cudafor module. You can now call the routine directly.
That said, the error suggests that cudaFuncSetCacheConfig is getting called but with an unknown kernel. While I don’t know the details, looking at our wrapper function, it seems we convert the Fortran string into an integer array, which is then passed to the CUDA C cudaFuncSetCacheConfig function.
Can you try using the CUDA Fortran module’s interface and see if that works around the issue?
Okay, the compiler recognizes the call, which is good, but it crashes in varying ways trying to use it. First off, the only difference between the working code and the non-working code with the cudaFuncSetCacheConfig call is:
For the seg fault, you’re supposed to be able to pass in a function pointer and have the CUDA runtime make the association, but I don’t think we’re doing this correctly. I added TPR#17878 and sent it to our engineers.
For the ‘invalid device function’ error, this is because the function name is actually ‘soradmod_soradcuf’, not ‘soradcuf’ since it’s in a module. It seems to be happy if I use: