Hello,
NVIDIA’s CUDA C has built-in vector data types such as float3 and float4 (which, as far as I know, promise good memory access patterns and alignment).
Does CUDA Fortran have analogous derived types?
If I do it manually (see below), then I don’t know how to ensure correct alignment…
type :: float4
  sequence
  real*4 :: x, y, z, w
end type float4
No, CUDA Fortran does not support these vector types. However, since Fortran lets you perform operations on whole arrays, I wonder whether they are necessary. Wouldn’t declaring a 3- or 4-element array work?
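As a minimal sketch of the array suggestion above, the following host-only program treats a 4-element array as a stand-in for float4 and updates all four components with one whole-array expression. The program name and the variables a, b and c are made up for illustration. No CUDA Fortran attributes are used, so it says nothing about device-side alignment.

```fortran
program vec4_as_array
  use iso_fortran_env, only: real32
  implicit none

  ! Hypothetical stand-ins for float4 values: plain 4-element arrays.
  real(real32) :: a(4), b(4), c(4)

  a = [1.0_real32, 2.0_real32, 3.0_real32, 4.0_real32]
  b = 0.5_real32            ! scalar broadcast to all four components

  ! Whole-array expression: one statement updates every component.
  c = a + 2.0_real32 * b

  print *, c
end program vec4_as_array
```

The same expression would work unchanged for a 3-element array, which is why a dedicated float3 type adds little on the arithmetic side.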
The vector data types (float3, float4) are important when programming with OpenCL for ATI GPUs, but aren’t needed for good performance on NVIDIA GPUs. In CUDA they are used mostly for texture and surface references.
We don’t have an analog to these vector data types in CUDA Fortran.
Hi,
Thanks for the information. Two more comments from my side. We used the float4 type with CUDA C and got better performance on several NVIDIA GPUs (although not on Fermi, I believe). I think float4 types are aligned, which could give better performance in some cases. If I use Fortran’s array operations instead, it should be almost the same, but what about alignment in Fortran?
Thus, I think there could be differences.
Bye, Sandra
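One way to look at the alignment question, at least on the host, is to print the size of a float4-like derived type and the addresses of its array elements. This is only a sketch under assumptions: it uses bind(c) rather than sequence so the type is interoperable with a C struct of four floats, the program and variable names are hypothetical, and what it reports for host memory does not guarantee anything about how device arrays are allocated.

```fortran
program check_float4_layout
  use iso_c_binding, only: c_float, c_intptr_t, c_loc, c_ptr
  implicit none

  ! Assumption: bind(c) replaces "sequence" here so the type matches a
  ! C struct of four floats; this choice is not from the thread.
  type, bind(c) :: float4
    real(c_float) :: x, y, z, w
  end type float4

  type(float4), target :: pts(2)
  integer(c_intptr_t) :: addr1, addr2

  ! Reinterpret the C addresses of two neighbouring elements as integers.
  addr1 = transfer(c_loc(pts(1)), addr1)
  addr2 = transfer(c_loc(pts(2)), addr2)

  print *, 'bytes per element       :', storage_size(pts) / 8
  print *, 'stride between elements :', addr2 - addr1
  print *, 'first address mod 16    :', mod(addr1, 16_c_intptr_t)
end program check_float4_layout
```

If the last value is 0, that particular array happens to start on a 16-byte boundary. The language itself does not promise this for a type made of 4-byte reals.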
I would guess that nvcc could place all four elements contiguously in one memory segment if four threads access the four elements. If one thread accesses all four elements, nvcc could put the four elements at the same location in four memory segments. In Fortran, the user has to do the memory management.
Say there are 10 points: we could use a my_pnt(1:10)%xyzw(1:4) style
or an x(1:10), y(1:10), z(1:10), w(1:10) style, depending on how you plan to access them.
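The two layouts mentioned above can be sketched in host-only Fortran as follows. The type name point4 and the sums are illustrative assumptions. The first layout keeps the four components of one point adjacent in memory, which suits code where one thread needs all four. The second keeps one component of all points adjacent, which suits code where consecutive threads each read the same component.

```fortran
program aos_vs_soa
  use iso_fortran_env, only: real32
  implicit none
  integer, parameter :: n = 10

  ! Array-of-structures: xyzw(1:4) of one point are adjacent in memory.
  type :: point4
    sequence
    real(real32) :: xyzw(4)
  end type point4
  type(point4) :: my_pnt(n)

  ! Structure-of-arrays: x(1:n) of all points are adjacent in memory.
  real(real32) :: x(n), y(n), z(n), w(n)

  real(real32) :: sum_aos, sum_soa
  integer :: i

  ! Fill both layouts with the same data.
  do i = 1, n
    my_pnt(i)%xyzw = [real(i, real32), 0.0_real32, 0.0_real32, 1.0_real32]
    x(i) = real(i, real32)
    y(i) = 0.0_real32
    z(i) = 0.0_real32
    w(i) = 1.0_real32
  end do

  ! Reading only the first component: every 4th real in the AoS layout,
  ! consecutive reals in the SoA layout.
  sum_aos = sum(my_pnt(:)%xyzw(1))
  sum_soa = sum(x)

  print *, sum_aos, sum_soa
end program aos_vs_soa
```

Both sums are the same; only the memory stride of the access differs, and that stride is what matters for coalescing on the GPU.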