CUDA arrays vs pitch linear texturing

Hi all,

Looking for your opinions on this:

I’m writing an image processing pipeline and would like to use CUDA’s texturing features. The question is whether I should use cudaArrays or pitch linear texturing.
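In CUDA terms, the two options differ mainly in what the texture is created over: a pitched buffer from cudaMallocPitch, or a cudaArray. The sketch below uses the texture object API, which assumes compute capability 3.0+ and so would not run on a 9800 GT. On that card the same choice is made with cudaBindTexture2D versus cudaBindTextureToArray on a texture reference (an API later removed in CUDA 12). Float images and the helper names are assumptions for illustration:

```cuda
#include <cuda_runtime.h>

// Same sampling state for both layouts so the comparison is fair:
// unnormalized pixel coordinates, point sampling, clamp at the borders.
static cudaTextureDesc pointSampling()
{
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    return td;
}

// Option 1 (hypothetical helper): pitch linear. d_img comes from
// cudaMallocPitch, so any kernel can write into it and a later kernel
// can sample it through the texture.
cudaError_t makePitchLinearTexture(float* d_img, size_t pitchBytes, int width,
                                   int height, cudaTextureObject_t* tex)
{
    cudaResourceDesc rd = {};
    rd.resType = cudaResourceTypePitch2D;
    rd.res.pitch2D.devPtr = d_img;
    rd.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    rd.res.pitch2D.width = width;
    rd.res.pitch2D.height = height;
    rd.res.pitch2D.pitchInBytes = pitchBytes;
    cudaTextureDesc td = pointSampling();
    return cudaCreateTextureObject(tex, &rd, &td, nullptr);
}

// Option 2 (hypothetical helper): cudaArray. Its layout is opaque and
// optimized for texture fetching, so on pre-Fermi GPUs it can only be
// filled by a copy (e.g. cudaMemcpy2DToArray), not written by a kernel.
cudaError_t makeArrayTexture(cudaArray_t arr, cudaTextureObject_t* tex)
{
    cudaResourceDesc rd = {};
    rd.resType = cudaResourceTypeArray;
    rd.res.array.array = arr;
    cudaTextureDesc td = pointSampling();
    return cudaCreateTextureObject(tex, &rd, &td, nullptr);
}
```

Kernels sample either texture with the same tex2D<float>(tex, x + 0.5f, y + 0.5f) call, so switching layouts does not change the kernel code, only how the data gets into the texture's backing memory.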

Running the simplePitchLinearTexture SDK sample on my 9800 GT gives the following results, which suggest that cudaArrays are faster:

[simplePitchLinearTexture.exe] starting...

Bandwidth (GB/s) for pitch linear: 4.06e+001; for array: 4.29e+001

Texture fetch rate (Mpix/s) for pitch linear: 5.07e+003; for array: 5.36e+003
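The bandwidth and fetch-rate lines are consistent with each other if the sample counts 8 bytes of traffic per pixel, i.e. one 4-byte float read through the texture plus one 4-byte float write. That byte count is my inference from the numbers, not something the output states. With an $n_x \times n_y$ image processed in time $t$:

```latex
\begin{aligned}
\mathrm{BW} &= \frac{(4 + 4)\,\mathrm{B} \cdot n_x n_y}{t} = 8\,\mathrm{B} \times R_{\text{fetch}},
  \qquad R_{\text{fetch}} = \frac{n_x n_y}{t} \\
\text{pitch linear: } 8\,\mathrm{B} \times 5.07\times10^{3}\,\mathrm{Mpix/s} &\approx 40.6\,\mathrm{GB/s} \\
\text{array: } 8\,\mathrm{B} \times 5.36\times10^{3}\,\mathrm{Mpix/s} &\approx 42.9\,\mathrm{GB/s}
\end{aligned}
```

So the two metrics carry the same information; the array path is about 6% faster in both.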

[simplePitchLinearTexture.exe] test results...

PASSED

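I cannot write to cudaArrays from kernels on my older hardware (that needs surface writes, which require compute capability 2.0; the 9800 GT is 1.1). Since I want later kernels in the pipeline to read their input through textures, I have to copy each kernel's output into a cudaArray with cudaMemcpy2DToArray to prepare the next texture. Modifying simplePitchLinearTexture to include the time of this copy changes the numbers significantly: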
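Per pipeline stage, the array path looks roughly like this: the kernel samples the current cudaArray through a texture, writes to an ordinary pitched scratch buffer, and then a device-to-device cudaMemcpy2DToArray refills an array for the next stage. A sketch assuming float images and the texture object API (CC 3.0+); stageKernel and runArrayStage are hypothetical names, and the doubling is placeholder processing:

```cuda
#include <cuda_runtime.h>

// Hypothetical stage: reads the input through a texture and writes the result
// to plain pitched global memory (pre-Fermi kernels cannot write cudaArrays).
__global__ void stageKernel(cudaTextureObject_t in, float* out, size_t outPitch,
                            int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    // +0.5f addresses the texel centre with unnormalized coordinates
    float v = tex2D<float>(in, x + 0.5f, y + 0.5f);
    float* row = (float*)((char*)out + y * outPitch);
    row[x] = 2.0f * v; // placeholder processing
}

// One step of the cudaArray path: kernel into scratch, then a device-to-device
// copy so that the next stage can texture from an array again.
cudaError_t runArrayStage(cudaTextureObject_t inTex, float* scratch,
                          size_t scratchPitch, cudaArray_t nextInput,
                          int width, int height)
{
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    stageKernel<<<grid, block>>>(inTex, scratch, scratchPitch, width, height);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    // This extra copy is the cost that the pitch linear path avoids.
    return cudaMemcpy2DToArray(nextInput, 0, 0, scratch, scratchPitch,
                               width * sizeof(float), height,
                               cudaMemcpyDeviceToDevice);
}
```

The pitch linear path is the same minus the final cudaMemcpy2DToArray: the next stage creates its texture directly over the pitched buffer.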

[simplePitchLinearTexture.exe] starting...

Bandwidth (GB/s) for pitch linear: 4.08e+001; for array: 2.18e+001

Texture fetch rate (Mpix/s) for pitch linear: 5.10e+003; for array: 2.72e+003
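Assuming the modified sample times kernel plus copy over the same $N$ pixels, and that the kernel time $t_k$ itself is unchanged, the two array fetch rates give the copy time $t_c$ relative to the kernel:

```latex
\begin{aligned}
R_0 &= \frac{N}{t_k} = 5.36\times10^{3}\,\mathrm{Mpix/s}, \qquad
R_1 = \frac{N}{t_k + t_c} = 2.72\times10^{3}\,\mathrm{Mpix/s} \\
\frac{t_c}{t_k} &= \frac{R_0}{R_1} - 1 = \frac{5.36}{2.72} - 1 \approx 0.97
\end{aligned}
```

So the copy takes about as long as the kernel. That is plausible for this sample, since a device-to-device copy also reads and writes 4 bytes per pixel, the same traffic as the trivial kernel. If the kernel took ten times longer, the copy's share would drop to roughly $0.97 / 10.97 \approx 9\%$.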

So the copy is clearly expensive. Presumably, if the kernel did more work per pixel, the copy would account for a smaller fraction of the total time.
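That guess can be checked on a real kernel by timing the kernel and the copy separately with CUDA events. A sketch; busyKernel and copyFraction are hypothetical names, and the iters loop is only a knob that makes the kernel heavier per pixel:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for a pipeline kernel; `iters` scales the per-pixel work.
__global__ void busyKernel(float* img, size_t pitch, int width, int height, int iters)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    float* row = (float*)((char*)img + y * pitch);
    float v = row[x];
    for (int i = 0; i < iters; ++i) v = v * 0.999f + 0.001f;
    row[x] = v;
}

// Times the kernel and the cudaMemcpy2DToArray separately with events on the
// default stream and returns copy time / (kernel time + copy time).
float copyFraction(float* img, size_t pitch, cudaArray_t arr,
                   int width, int height, int iters)
{
    cudaEvent_t start, mid, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&mid);
    cudaEventCreate(&stop);
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

    cudaEventRecord(start);
    busyKernel<<<grid, block>>>(img, pitch, width, height, iters);
    cudaEventRecord(mid);
    cudaMemcpy2DToArray(arr, 0, 0, img, pitch, width * sizeof(float), height,
                        cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float kernelMs = 0.0f, copyMs = 0.0f;
    cudaEventElapsedTime(&kernelMs, start, mid);
    cudaEventElapsedTime(&copyMs, mid, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(mid);
    cudaEventDestroy(stop);
    return copyMs / (kernelMs + copyMs);
}
```

Calling it once first as a warm-up keeps first-launch overhead out of the measurement; with increasing iters, the returned fraction should fall as the kernel dominates.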

Am I missing something in my analysis? It seems like pitch linear texturing is the way to go.
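For comparison, a pitch linear pipeline can skip the copy entirely by alternating between two pitched buffers, each with its own texture, so every stage reads one buffer and writes the other. A sketch under the same assumptions (float images, texture object API); runPitchLinearPipeline and addOneStage are hypothetical names, and the +1 is placeholder processing:

```cuda
#include <cuda_runtime.h>

// Placeholder stage: samples the previous result, writes plain pitched memory.
__global__ void addOneStage(cudaTextureObject_t in, float* out, size_t outPitch,
                            int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    float v = tex2D<float>(in, x + 0.5f, y + 0.5f);
    float* row = (float*)((char*)out + y * outPitch);
    row[x] = v + 1.0f;
}

// Hypothetical ping-pong pipeline on pitch linear memory: no copies between
// stages, and no kernel samples the buffer it is writing.
cudaError_t runPitchLinearPipeline(const float* hostIn, float* hostOut,
                                   int width, int height, int stages)
{
    float* buf[2] = {nullptr, nullptr};
    size_t pitch[2] = {0, 0};
    cudaTextureObject_t tex[2] = {0, 0};
    size_t rowBytes = width * sizeof(float);

    for (int i = 0; i < 2; ++i) {
        cudaMallocPitch((void**)&buf[i], &pitch[i], rowBytes, height);
        cudaResourceDesc rd = {};
        rd.resType = cudaResourceTypePitch2D;
        rd.res.pitch2D.devPtr = buf[i];
        rd.res.pitch2D.desc = cudaCreateChannelDesc<float>();
        rd.res.pitch2D.width = width;
        rd.res.pitch2D.height = height;
        rd.res.pitch2D.pitchInBytes = pitch[i];
        cudaTextureDesc td = {};
        td.addressMode[0] = td.addressMode[1] = cudaAddressModeClamp;
        td.filterMode = cudaFilterModePoint;
        td.readMode = cudaReadModeElementType;
        cudaCreateTextureObject(&tex[i], &rd, &td, nullptr);
    }

    cudaMemcpy2D(buf[0], pitch[0], hostIn, rowBytes, rowBytes, height,
                 cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    for (int s = 0; s < stages; ++s) {
        int src = s % 2, dst = 1 - src;
        addOneStage<<<grid, block>>>(tex[src], buf[dst], pitch[dst], width, height);
    }

    int last = stages % 2; // buffer written by the final stage
    cudaMemcpy2D(hostOut, rowBytes, buf[last], pitch[last], rowBytes, height,
                 cudaMemcpyDeviceToHost);
    // Reports (and clears) the most recent error from any runtime call above.
    cudaError_t err = cudaGetLastError();

    for (int i = 0; i < 2; ++i) {
        cudaDestroyTextureObject(tex[i]);
        cudaFree(buf[i]);
    }
    return err;
}
```

The second buffer is needed because the texture cache is not kept coherent with global writes made during the same kernel launch, so sampling a buffer while writing it would return undefined data. Each new launch does see the previous stage's writes.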