I’m still unfamiliar with texture memory, and during my first use of it some questions came to mind:
1.:
It seems that cudaMemcpyToArray() is much slower than a plain cudaMemcpy() into linear memory (no texture involved). If this is correct, can someone explain this behaviour?
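For reference, the two copy paths being compared look roughly like this (buffer names are illustrative; `cudaMemcpyToArray()` is deprecated in recent CUDA releases):

```cuda
// Path 1: copy into plain linear device memory.
float *d_linear;
cudaMalloc((void**)&d_linear, N * sizeof(float));
cudaMemcpy(d_linear, h_data, N * sizeof(float), cudaMemcpyHostToDevice);

// Path 2: copy into a CUDA array. The driver may rearrange the data into
// the array's opaque internal layout, which can make this path slower.
cudaArray *d_array;
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
cudaMallocArray(&d_array, &desc, N, 1);
cudaMemcpyToArray(d_array, 0, 0, h_data, N * sizeof(float),
                  cudaMemcpyHostToDevice);
```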
2.:
I have 2 kernel functions which interact in the following way:
inputData -> kernel 1 -> intermediate result (resides on the device) -> kernel 2 -> final result (copied back)
Until now, inputData was stored in global memory. Now it’s stored in texture memory, and kernel 1 sped up a little. But surprisingly, kernel 2 is now slower than before. The intermediate result is a separate array in global memory, so I am not reusing the inputData array. Does anyone have an idea why this change has an impact on kernel 2?
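A minimal sketch of such a pipeline, using the texture reference API of that CUDA era (now deprecated; kernel bodies and names are made up for illustration):

```cuda
// Texture reference for the read-only input of kernel 1.
texture<float, 1, cudaReadModeElementType> inputTex;

__global__ void kernel1(float *intermediate, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        intermediate[i] = tex1Dfetch(inputTex, i) * 2.0f;  // placeholder work
}

__global__ void kernel2(const float *intermediate, float *result, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        result[i] = intermediate[i] + 1.0f;                // placeholder work
}

// Host side (error checking omitted):
//   cudaBindTexture(0, inputTex, d_input, n * sizeof(float));
//   kernel1<<<grid, block>>>(d_intermediate, n);
//   kernel2<<<grid, block>>>(d_intermediate, d_result, n);
```

Note that only kernel 1 reads through the texture; kernel 2 reads the intermediate array straight from global memory.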
3.:
What is the difference between a CUDA array (1D) and linear memory bound to a texture reference?
I tested a little, and using a CUDA array seems to be faster than using linear memory bound to a texture. But the performance gain is wiped out by cudaMemcpyToArray(), which I need to call many times.
This was true a while ago (CUDA 1.1, IIRC). It was reported as a bug and supposedly fixed; I haven’t checked recently, though. Memory in a cudaArray is stored in a special, opaque format, so presumably some translation goes on when you do the memcpy.
Are you calling cudaThreadSynchronize() before every wall-clock timing measurement?
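Kernel launches are asynchronous, so without a synchronize the wall clock only measures launch overhead, not execution time. A minimal pattern (kernel name and arguments are illustrative):

```cuda
// Launch the kernel, then force the host to wait for it to finish
// before reading the timer.
kernel1<<<grid, block>>>(d_intermediate, n);
cudaThreadSynchronize();  // deprecated; use cudaDeviceSynchronize() in newer CUDA
// ... stop the wall-clock timer here ...
```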
Functionally, if you want to use the coordinate wrapping or linear interpolation features, you must use a CUDA array. That is the only advantage CUDA arrays offer, so if you aren’t using those features, stick with global memory bound to a texture.
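A sketch of what those CUDA-array-only features look like with the (deprecated) texture reference API; the texture and kernel names are made up. Note that wrap addressing requires normalized coordinates:

```cuda
// Hardware filtering and wrapping are only available through a CUDA array
// fetched with tex1D(), not via tex1Dfetch() on linear memory.
texture<float, 1, cudaReadModeElementType> arrayTex;

// Host side, before launching:
//   arrayTex.addressMode[0] = cudaAddressModeWrap;   // wrap coordinates
//   arrayTex.filterMode     = cudaFilterModeLinear;  // hardware interpolation
//   arrayTex.normalized     = 1;                     // coordinates in [0,1)
//   cudaBindTextureToArray(arrayTex, d_array);

__global__ void sampleKernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1D(arrayTex, (i + 0.5f) / n);  // interpolated, wrapped fetch
}
```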
Today I had some time for testing. It seems that the performance difference between cudaMemcpyToArray() and cudaMemcpy() is not as big as I thought, but the transfer size matters:
Transferring more than 10,000 bytes: cudaMemcpyToArray() is faster than cudaMemcpy().
Transferring less than 10,000 bytes: cudaMemcpyToArray() is slower than cudaMemcpy().
But as I mentioned, the difference is not large: I measured at most 1–3%. The fetch times (CUDA array vs. linear memory) were nearly the same…