Change L1 cache size in Fermi

pmitra1 · November 16, 2010, 7:45pm

Hi,

I am using PGI workstation on a Fermi (C2050) box. In Fermi, the default memory configuration is 16 KB L1/48 KB shared memory, but it is also possible to change the configuration to 48 KB L1/16KB shared mem. How can it be done using PGI Fortran?

Faustus · December 10, 2010, 1:30pm

Hi,

Is there a support of Fermi architecture for PGI Fortran 10.9 ?

MatColgrove · December 10, 2010, 10:07pm

Is there a support of Fermi architecture for PGI Fortran 10.9 ?

Yes. Fermi support was first added to 10.6.

Mat

Faustus · December 13, 2010, 10:37am

There is a function cudaFuncSetCacheConfig()
for change of L1 cache size for Fermi architecture.

Will this function support for Cuda Fortran ?

MatColgrove · December 17, 2010, 10:38pm

Hi Faustus,

You should be able to access any CUDA C function from CUDA Fortran. For CUDA C functions without a built-in CUDA Fortran interface, such as cudaFuncSetCacheConfig, you simply need to write an explicit interface to the function before calling it.

Mat

TheMatt · May 19, 2011, 4:45pm

Dredging up an old thread…

Mat, et al,

I’m trying to try and use cudaFuncSetCacheConfig to see if it affects my code at all or not (on my way to maybe trying cudaThreadSetCacheConfig). So, I first tried whipping up an interface:

module extracuda

interface
   integer function cudafuncsetcacheconfig(func, cacheconfig) bind(c,name='cudaFuncSetCacheConfig')
      use iso_c_binding

      character(len=*) :: func
      integer :: cacheconfig
   end function cudafuncsetcacheconfig
end interface

end module extracuda

And then in my code I did:

      use extracuda
...
      write (*,*) "Got here!"

      istat = cudaFuncSetCacheConfig('soradcuf',2)

      call soradcuf<<<dimGrid, dimBlock>>>(...args...)
      istat = cudaGetLastError()
      if (istat /= 0) then
         write (*,*) "Kernel Call failed: ", cudaGetErrorString(istat)
         stop
      end if

where I’m pretty sure cudaFuncCachePreferL1 = 2 in CUDA enum speak.

I linked everything up with the usual -lcudart and -lcuda and I get:

> ./runsorad-cudafor-flxy-lessconstants-funccache-DPvDPorig.exe 
 blocksize:           256
 Current Device:             0

Device:Tesla M2070, 1147.0 MHz clock, 4096.0 MB memory.

 Iteration:             1
 eps:    1.0000000E-06
 Current Device:             0
 Got here!
 Kernel Call failed: 
 invalid device function                                                                                                         
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
FORTRAN STOP

Now, if I comment out my cudaFuncSetCacheConfig call, the code works just fine, so I broke something with the interface (as per usual when I try to do C interfacing).

Any help from the gurus out there of what I might have screwed up?

MatColgrove · May 20, 2011, 10:51pm

Hi Matt,

In 11.4 we added an interface to cudaFuncSetCacheConfig in the cudafor module. You can now call the routine directly.

Though, the error suggests that cudaFuncSetCacheConfig is getting called but is using an unknown kernel. While I don’t know details, in looking at our wrapper function it seems we manipulate the Fortran string into a integer array which is then passed to the CUDA C cudaFuncSetCacheConfig function.

Can you try using the CUDA Fortran module’s interface and see if that works around the issue?

Thanks,
Mat

TheMatt · May 23, 2011, 7:21pm

I’ll let you know soon (the Fermi system is getting 11.5 installed on it as soon as the sysadmins have the time)!

Thanks,
Matt

TheMatt · May 24, 2011, 11:36am

Okay, the compiler recognizes the call, which is good, but it crashes in varying ways trying to use it. First off, the only difference between the working code and the non-working code with the cudaFuncSetCacheConfig call is:

write (*,*) "Got here!"

istat = cudaFuncSetCacheConfig(soradcuf,cudaFuncCachePreferL1)

Following the cudaFuncSetCacheConfig call is my kernel call to soradcuf:

call soradcuf<<<dimGrid, dimBlock>>> (...args...)
istat = cudaGetLastError()
if (istat /= 0) then
   write (*,*) "Kernel Call failed: ", cudaGetErrorString(istat)
   stop
end if

When I compile that (with no errors) and run it I get:

> ./runsorad-cudafor-flxy-lessconstants-funccache-DPvDPorig.exe 
 blocksize:           256
 Current Device:             0

Device:Tesla M2070, 1147.0 MHz clock, 4096.0 MB memory.

 Iteration:             1
 eps:    1.0000000E-06
 Current Device:             0
 Got here!
Segmentation fault (core dumped)

Hmm. This cored out before even getting to the error write under the kernel call. Maybe it needs strings and integers? Let’s try:

istat = cudaFuncSetCacheConfig('soradcuf',2)

which leads to:

> ./runsorad-cudafor-flxy-lessconstants-funccache-DPvDPorig.exe
 blocksize:           256
 Current Device:             0

Device:Tesla M2070, 1147.0 MHz clock, 4096.0 MB memory.

 Iteration:             1
 eps:    1.0000000E-06
 Current Device:             0
 Got here!
 Kernel Call failed: 
 invalid device function
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
FORTRAN STOP

Well, that died out trying to call the function and got to the error write and stop.

Any hints?

Matt

MatColgrove · May 24, 2011, 5:48pm

Hi Matt,

I’m not sure what’s wrong. I haven’t used this function much but my little test cases seem fine. Instead of me trying to figure out how to recreate the error, can you send me your tests?

Thanks,
Mat

TheMatt · May 24, 2011, 6:06pm

A tarball is whizzing its way to you!

Thanks for the help,
Matt

MatColgrove · May 25, 2011, 11:11pm

Hi Matt,

For the seg fault, you’re suppose to be able to pass in a function pointer and the CUDA runtime makes the association, but I don’t think we’re do this correctly. I add TPR#17878 and sent it to our engineers.

For the ‘invalid device function’ error, this is because the function name is actually ‘soradmod_soradcuf’, not ‘soradcuf’ since it’s in a module. It seems to be happy if I use:

istat = cudaFuncSetCacheConfig(‘soradmod_soradcuf’,cudaFuncCachePreferL1)

Can you give this a try and see if it works for you?

Mat

Topic		Replies	Views
changing L1 cache configuration using â€œcudaFuncSetCacheConfig" not working CUDA Programming and Performance	6	4579	February 3, 2012
Reconfiguring the cache / shared memory on a Fermi understanding the cudaFuncSetCacheConfig command CUDA Programming and Performance	19	34928	June 7, 2010
cudaFuncSetCacheConfig and CUDA 5.0 Legacy PGI Compilers	5	10243	April 24, 2013
How to use cudaFuncSetCacheConfig() correctly ? One of the most anticipating features does not seem CUDA Programming and Performance	8	5702	June 23, 2010
choose 16/48KB vs. 48/16KB Legacy PGI Compilers	1	2991	December 4, 2010
Set cache config from OpenCL CUDA Programming and Performance	0	1430	September 10, 2011
cudaFuncSetCacheConfig( Kernel1, cudaFuncCachePreferL1) No effect on shared memory CUDA Programming and Performance	1	1936	January 30, 2012
Cuda Portability and SharedMem vs Cache CUDA Programming and Performance	9	11761	October 18, 2010
Fermi-style L1 cache in K40 and upwards Legacy PGI Compilers	1	4684	March 20, 2015
Declaring local arrays in device code Legacy PGI Compilers	16	9225	June 8, 2012

Change L1 cache size in Fermi

Related topics