Texture memory + Thrust Library.

Hi. I have an algorithm, coded in PGI CUDA Fortran, that is delivering a nice speedup of order 50. However, I can clearly identify the rate-determining step as the non-contiguous reading of a field map by my millions of particles. Essentially, the particles rapidly get scrambled in space due to motion in a magnetic field, scattering, etc.

I am faced with the following question: do I attempt to periodically re-order my particles using a GPU sort kernel in order to maintain coalesced access to my field map (which is static, i.e. does not evolve with time)? I have read various papers where developers have done this and achieved a worthwhile speedup.
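
To make the re-ordering idea concrete, here is a minimal host-side C++ sketch, not a GPU implementation. It assumes a uniform 2D field-map grid and a structure-of-arrays particle layout; the names `Particles`, `Grid`, `cell_index`, `gather` and `reorder_by_cell` are all hypothetical and chosen only for illustration. Each particle gets a key equal to the index of the field-map cell it sits in, and all particle arrays are then permuted into key order, so that particles adjacent in memory read the same or nearby field-map entries.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical particle storage (structure of arrays); not from the original code.
struct Particles {
    std::vector<float> x, y;    // positions
    std::vector<float> vx, vy;  // velocities
};

// Hypothetical uniform field-map grid: nx by ny square cells of size dx, origin at (0, 0).
struct Grid {
    int nx, ny;
    float dx;
};

// Row-major index of the field-map cell containing (x, y), clamped to the grid.
int cell_index(const Grid& g, float x, float y) {
    int ix = std::max(0, std::min(static_cast<int>(x / g.dx), g.nx - 1));
    int iy = std::max(0, std::min(static_cast<int>(y / g.dx), g.ny - 1));
    return iy * g.nx + ix;
}

// Return a copy of one attribute array rearranged according to perm.
std::vector<float> gather(const std::vector<float>& src,
                          const std::vector<std::size_t>& perm) {
    std::vector<float> out(src.size());
    for (std::size_t i = 0; i < perm.size(); ++i) out[i] = src[perm[i]];
    return out;
}

// Reorder all particle arrays by field-map cell index and return the sorted keys.
std::vector<int> reorder_by_cell(Particles& p, const Grid& g) {
    const std::size_t n = p.x.size();

    // One key per particle: the cell of the field map it will read.
    std::vector<int> key(n);
    for (std::size_t i = 0; i < n; ++i) key[i] = cell_index(g, p.x[i], p.y[i]);

    // Sort a permutation by key rather than moving every attribute during the sort.
    std::vector<std::size_t> perm(n);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&key](std::size_t a, std::size_t b) { return key[a] < key[b]; });

    // Apply the same permutation to every attribute array.
    p.x = gather(p.x, perm);
    p.y = gather(p.y, perm);
    p.vx = gather(p.vx, perm);
    p.vy = gather(p.vy, perm);

    std::vector<int> sorted_keys(n);
    for (std::size_t i = 0; i < n; ++i) sorted_keys[i] = key[perm[i]];
    return sorted_keys;
}
```

On the GPU, the sort-a-permutation step is the part a library routine such as Thrust's `thrust::sort_by_key` would take over, with the gather done in a separate kernel. How often to re-sort (every step or every N steps) is a tuning question that depends on how quickly the particles scramble.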

More promising, however, would be to place my field map in texture memory (read-only access is fine). Many people with similar problems have seen a big speedup by doing this.

My question, then, is this: when can we expect texture memory to be made available in PGI CUDA Fortran? An earlier post suggests this year. Are your engineers on track with this, or have other developments bumped this task down their list? Would I be better off recoding my kernel in C?

Any info greatly appreciated, as always. Also, any advice on GPU sorting algorithms for doing what I suggest above would be very useful. I recently came across the Thrust library, for example, which looks like it contains some useful operators (including some routines which might be useful for sum reductions).

Rob.

Hi Rob,

My question then is this - when can we expect texture memory to be made available in PGI cuda-fortran?

I asked engineering, and texture memory is currently second on their list of new features, behind shared memory using automatic arrays that overlay the dynamic area. While priorities and timelines can and do change, the hope is to have texture memory support in the November 2012 (12.0) release.

Thanks,
Mat

Great. My personal view is that texture memory is certainly one of the most useful features missing from CUDA Fortran when compared with CUDA C. I have met a number of developers who have seen very dramatic speed increases by using texture memory.

I’ll look forward to the November release and keep my fingers crossed that it’s included.

Thanks for the update,

Rob.