Basic CUDA texture issue

First, I’ll give a general description of my problem. In my application I have a 1D float texture of length n. This texture is called “depthBufferTex_d”. This texture is bound to a 1D cudaArray called “cu_array.” I also have a 1D float array in global memory of length n. This is called “depthBufferGlobal_d”. After every kernel call, I want to take the values stored in “depthBufferGlobal_d”, and put them in “depthBufferTex_d”. Does anyone know how to do this?

So far I have tried two approaches, but have had no luck.

I have my texture declared at file scope as follows:

[codebox]texture<float, 1, cudaReadModeElementType> depthBufferTex_d;[/codebox]

Here is the first way I am trying to do this. I am using a “device to device” copy.


[codebox]void swapBuffers1( float* depthBufferGlobal_d, int width, int height, cudaArray* cu_array ){
    //First, unbind the texture
    cudaUnbindTexture( depthBufferTex_d );

    //Copy data from device to device
    cudaMemcpyToArray( cu_array , 0 , 0 , gData , width * height * sizeof(float) , cudaMemcpyDeviceToDevice );

    //Now, finally, rebind the texture
    cudaBindTextureToArray(depthBufferTex_d , cu_array );
}[/codebox]


The second way I am trying to do this has me copying data from the device back to the host, and then from the host back onto the device again! It still doesn’t work though :(


[codebox]void swapBuffers2( float* depthBufferGlobal_d, int width, int height, cudaArray* cu_array ){
    //First, unbind the texture
    cudaUnbindTexture( depthBufferTex_d );

    //Copy the buffer from global mem to the texture array:
    // first, copy back to the host
    // next, copy from host to texture
    // then, deallocate host memory

    //Allocate space on host
    float* depthBuffer_temp_h;
    depthBuffer_temp_h = (float*) malloc( sizeof(float) * width * height );

    //Copy from device->host
    cudaMemcpy( depthBuffer_temp_h , depthBufferGlobal_d , sizeof(float) * width * height , cudaMemcpyDeviceToHost );

    //Copy from host->device
    cudaMemcpyToArray( cu_array , 0 , 0 , depthBuffer_temp_h , sizeof(float)*width*height , cudaMemcpyHostToDevice );

    //Free host mem
    free( depthBuffer_temp_h );

    //Now, finally, rebind the texture
    cudaBindTextureToArray(depthBufferTex_d , cu_array );
}[/codebox]


Note that I am using CUDA 2.2 on Windows XP with a GTX 280 and 4GB of RAM.

Also, I want to say thanks to the people who post on the forum regularly, this place is a great source of CUDA knowledge.


The device to device memcpy path is definitely the way to go, but I guess you meant to write

[codebox]cudaMemcpyToArray( cu_array , 0 , 0 , depthBufferGlobal_d , width * height * sizeof(float) , cudaMemcpyDeviceToDevice );[/codebox]

instead of

[codebox]cudaMemcpyToArray( cu_array , 0 , 0 , gData , width * height * sizeof(float) , cudaMemcpyDeviceToDevice );[/codebox]


Thanks for your help Nico! I had thought that the device to device thing was correct from the beginning, but after it wouldn’t work I tried the device to host to device solution.

As for the “gData” variable, that was just a mistake I made when pasting my code here and forgetting to rename variables. The code actually is:


[codebox]void swapBuffers1( float* depthBufferGlobal_d, int width, int height, cudaArray* cu_array ){
    //First, unbind the texture
    cudaUnbindTexture( depthBufferTex_d );

    //Copy data from device to device
    cudaMemcpyToArray( cu_array , 0 , 0 , depthBufferGlobal_d , width * height * sizeof(float) , cudaMemcpyDeviceToDevice );

    //Now, finally, rebind the texture
    cudaBindTextureToArray(depthBufferTex_d , cu_array );
}[/codebox]


That is what my real code does. Unfortunately, it still doesn’t seem to work :(

I’m going to try to recreate my problem in a smaller scale example so I can post the full code so that someone can help me track down this problem.



Hi again,

Did you check for errors? Also, where’s the channel descriptor for the texture? I believe you’re supposed to provide it to the cudaBindTextureToArray call.



I checked for errors at the end of the function by using the code:

[codebox]cout << cudaGetLastError( ) << endl;[/codebox]

and I didn’t get any errors.
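For what it's worth, printing cudaGetLastError() only shows an integer code (0 is cudaSuccess), and a single check at the end of the function does not tell you which call failed. Below is a rough sketch of checking each call's return value instead. The helper names cudaOk and copyDeviceToDevice are made up for illustration and are not from my real code. It uses cudaDeviceSynchronize, which is the name in current toolkits; on CUDA 2.2 the equivalent call is cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): report a failed runtime
// call with a readable message and return true only on success.
static bool cudaOk(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Example use: check each step of a device-to-device copy separately.
static bool copyDeviceToDevice(float* dst_d, const float* src_d, size_t count)
{
    if (!cudaOk(cudaMemcpy(dst_d, src_d, count * sizeof(float),
                           cudaMemcpyDeviceToDevice), "cudaMemcpy"))
        return false;

    // Wait for outstanding work so errors from earlier asynchronous
    // operations (such as kernel launches) are reported here.
    return cudaOk(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
}
```

Wrapping the unbind, copy, and rebind calls of swapBuffers1 the same way should show which of the three, if any, is returning an error.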

As for the channel descriptor, I used one when I did the original texture binding. However, I tried adding the same descriptor to the call, and it still didn’t work. I also tried using a cudaBindTexture(…) call instead of a cudaBindTextureToArray(…) call and still had no luck.

Thanks again for your help Nico!


I think it is incorrect to bind a 1D texture to a 2D array.
You should declare the texture as texture<float, 2, cudaReadModeElementType>,
bind it to a 2D array, and access it using tex2D. How the memory of a 2D array is organized is implementation dependent (it might be a zig-zag pattern for better spatial locality), whereas texture<float, 1> assumes a linear organization.
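A rough sketch of the 2D variant, in case it helps. The names depthTex2D, readBack2D, uploadTo2DTexture and roundTrip2D are made up for illustration. It uses the legacy texture reference API that this thread is about, so it assumes a toolkit older than CUDA 12, where that API was removed. The round-trip function is only there to check that the texels come back in the expected order.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Legacy 2D texture reference at file scope (hypothetical name).
texture<float, 2, cudaReadModeElementType> depthTex2D;

// Reads every texel back into linear memory so the result can be checked.
__global__ void readBack2D(float* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        // +0.5f addresses the texel center with unnormalized coordinates.
        out[y * width + x] = tex2D(depthTex2D, x + 0.5f, y + 0.5f);
}

// Allocates a 2D array, copies width*height floats from linear device
// memory into it, and binds the texture. Returns the first error seen.
cudaError_t uploadTo2DTexture(const float* src_d, int width, int height,
                              cudaArray** array_out)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaError_t err = cudaMallocArray(array_out, &desc, width, height);
    if (err != cudaSuccess) return err;

    // Source pitch and row width are in bytes; height is in rows.
    err = cudaMemcpy2DToArray(*array_out, 0, 0, src_d,
                              width * sizeof(float), width * sizeof(float),
                              height, cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) return err;

    return cudaBindTextureToArray(depthTex2D, *array_out, desc);
}

// Self-check: upload a ramp, read it back through the texture, compare.
bool roundTrip2D(int width, int height)
{
    size_t n = (size_t)width * height;
    size_t bytes = n * sizeof(float);
    std::vector<float> host(n), result(n, -1.0f);
    for (size_t i = 0; i < n; ++i) host[i] = (float)i;

    float *src_d = 0, *out_d = 0;
    cudaArray* array = 0;
    bool ok = cudaMalloc(&src_d, bytes) == cudaSuccess
           && cudaMalloc(&out_d, bytes) == cudaSuccess
           && cudaMemcpy(src_d, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess
           && uploadTo2DTexture(src_d, width, height, &array) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        dim3 grid((width + 15) / 16, (height + 15) / 16);
        readBack2D<<<grid, block>>>(out_d, width, height);
        ok = cudaMemcpy(result.data(), out_d, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && result == host;
    }
    cudaUnbindTexture(depthTex2D);
    if (array) cudaFreeArray(array);
    cudaFree(src_d);
    cudaFree(out_d);
    return ok;
}
```

In a per-frame swap you would allocate the array once and only repeat the cudaMemcpy2DToArray call, rather than allocating on every update as this sketch does.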

If you don’t need filtering or normalized coordinates, use a 1D texture bound to linear memory (allocated by cudaMalloc). Bind it directly to your depthBufferGlobal_d; then no copying is needed.
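Something like the following sketch shows the linear-memory binding. The names depthTexLinear, readBackLinear and roundTripLinear are made up for illustration, and like the rest of this thread it relies on the legacy texture reference API, so it assumes a toolkit older than CUDA 12. Textures bound to linear memory are read with tex1Dfetch and an integer index rather than tex1D. One caveat: a kernel should not write to the buffer and read it through the texture in the same launch, since the texture cache is not kept coherent with global memory writes.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Legacy 1D texture reference at file scope (hypothetical name).
texture<float, 1, cudaReadModeElementType> depthTexLinear;

// Reads each element through the texture using an integer index.
__global__ void readBackLinear(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(depthTexLinear, i);
}

// Self-check: bind the texture directly to a cudaMalloc'ed buffer,
// read it back through the texture, and compare with the input.
bool roundTripLinear(int n)
{
    size_t bytes = n * sizeof(float);
    std::vector<float> host(n), result(n, -1.0f);
    for (int i = 0; i < n; ++i) host[i] = 0.5f * i;

    float *depth_d = 0, *out_d = 0;
    size_t offset = 0;  // byte offset reported by the bind call
    bool ok = cudaMalloc(&depth_d, bytes) == cudaSuccess
           && cudaMalloc(&out_d, bytes) == cudaSuccess
           && cudaMemcpy(depth_d, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess
           // No cudaArray and no copy: the texture reads depth_d itself.
           && cudaBindTexture(&offset, depthTexLinear, depth_d,
                              bytes) == cudaSuccess
           && offset == 0;
    if (ok) {
        readBackLinear<<<(n + 255) / 256, 256>>>(out_d, n);
        ok = cudaMemcpy(result.data(), out_d, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && result == host;
    }
    cudaUnbindTexture(depthTexLinear);
    cudaFree(depth_d);
    cudaFree(out_d);
    return ok;
}
```

With this layout the texture stays bound across kernel calls, so the per-frame swapBuffers step goes away as long as the writing and reading happen in separate launches.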

I also suspect the 2D texture size is limited in some way, but I don’t see it in the reference right now…