Very slow texture reads.

redwizz · June 10, 2010, 11:18pm

Below is a code snippet with a kernel and the code that sets up the textures. The kernel is launched once for each disparity value from 0 to d. On each launch it compares every pixel in the left image with the pixel in the right image shifted left by the current disparity (j - disp).
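To check the GPU output (with or without textures), a CPU reference for one disparity level can help. Below is a minimal sketch in plain C++ (it also compiles as CUDA host code). The function name `absDiffAtDisparity` is hypothetical, and it assumes row-major images with a row pitch of `step` floats, as the kernel's `i*step + j` indexing implies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for one disparity level (hypothetical helper):
// out(i, j) = |left(i, j) - right(i, j - disp)|, or 0 where j - disp < 0,
// mirroring the kernel's branch. Row pitch is `step` floats (assumed >= width);
// padding columns beyond `width` are left at 0.
std::vector<float> absDiffAtDisparity(const std::vector<float>& left,
                                      const std::vector<float>& right,
                                      int width, int height, int step, int disp) {
    assert(step >= width && disp >= 0);
    std::vector<float> out(static_cast<std::size_t>(step) * height, 0.0f);
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            int idx = i * step + j;
            if (j - disp >= 0) {
                out[idx] = std::fabs(left[idx] - right[idx - disp]);
            }
        }
    }
    return out;
}
```

Comparing this against a copy of one plane of `integralImg` quickly shows whether the texture path fetches the right texels.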

I tried binding the data from the two subimages to textures and reading it with tex2D() in the hope of getting a speedup, but instead the texture reads seem to slow things down. Am I setting up the textures incorrectly, or is this just not a good place to use textures? (According to the CUDA Visual Profiler, most of my memory reads are not coalesced, which is why I turned to texture memory.)
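On the coalescing point: on compute capability 1.0/1.1 devices, a half-warp's reads are coalesced only when thread k reads word k of a suitably aligned segment, so a shifted read such as `dataR[sublayer-disp]` is likely split into 16 separate transactions whenever `disp` is not a multiple of 16 (assuming 64-byte-aligned row starts and a `blockDim.x` that is a multiple of 16). Compute 1.2/1.3 devices instead issue one transaction per memory segment touched. The hypothetical helper `segmentsTouched` below counts segments under a simplified model that ignores the hardware's transaction-size reduction; it is an illustration, not the exact per-architecture rule.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified model (assumption, not the exact hardware rule): count the distinct
// aligned segments of `segmentBytes` bytes touched when `threads` consecutive
// threads each load one 4-byte float, thread t reading element firstElement + t.
std::size_t segmentsTouched(std::size_t firstElement, std::size_t threads,
                            std::size_t segmentBytes) {
    assert(segmentBytes > 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < threads; ++t) {
        std::size_t byteAddr = (firstElement + t) * sizeof(float);
        segments.insert(byteAddr / segmentBytes);  // index of the segment this load hits
    }
    return segments.size();
}
```

For example, a half-warp reading `dataL[sublayer]` from an aligned start (element 16) touches one 64-byte segment, while the same half-warp reading `dataR[sublayer - 1]` (starting at element 15) touches two.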

dataL and dataR are float pointers to the image data I obtained with OpenCV; step, width, and height also come from OpenCV. layer is an offset I compute to index into integralImg, a large array that stores the per-pixel differences for every disparity value from 0 to d.
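The post does not show how `layer` is computed. Assuming, hypothetically, that integralImg packs one `step*height` plane per disparity back to back, the indexing and total size work out as in this sketch (`layerOffset`, `costIndex`, and `costVolumeBytes` are illustrative names):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout (assumption): one step*height plane per disparity, packed
// back to back, so plane `disp` starts at disp * step * height.
std::size_t layerOffset(int disp, int step, int height) {
    assert(disp >= 0 && step > 0 && height > 0);
    return static_cast<std::size_t>(disp) * step * height;
}

// Element index of pixel (i, j) at disparity disp, matching the kernel's
// integralImg[layer + sublayer] with sublayer = i * step + j.
std::size_t costIndex(int disp, int i, int j, int step, int height) {
    return layerOffset(disp, step, height) + static_cast<std::size_t>(i) * step + j;
}

// Bytes needed to hold disparities 0..maxDisp inclusive.
std::size_t costVolumeBytes(int maxDisp, int step, int height) {
    return layerOffset(maxDisp + 1, step, height) * sizeof(float);
}
```

Under this assumption, a 640x480 image (step = 640) with d = 63 needs 78,643,200 bytes (75 MiB) for integralImg.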

[codebox]// Global variables
texture<float, 2, cudaReadModeElementType> leftTex;
texture<float, 2, cudaReadModeElementType> rightTex;
cudaArray *left_array, *right_array;
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();

// One launch per disparity: writes |left(i, j) - right(i, j - disp)| into the
// plane of integralImg that starts at `layer` (0 where j - disp < 0).
__global__ void kernel_mat(float *integralImg, float *dataL, float *dataR, int win, int disp,
                           int step, int width, int height, int layer){
    int j = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;  // column
    int i = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;  // row
    float left, right;

    if(i < height && j < width){
        int sublayer = __umul24(i, step) + j;  // step = row pitch in floats

        if(j - disp < 0){
            integralImg[layer + sublayer] = 0.0f;
        }
        else{
            //left = dataL[sublayer]; //This is what I was doing before I tried using textures
            //right = dataR[sublayer - disp];
            // Default point filtering: integer coordinates fetch texel (j, i) exactly
            left = tex2D(leftTex, j, i);
            right = tex2D(rightTex, j - disp, i);
            integralImg[layer + sublayer] = fabsf(left - right);
        }
    }
}

// Set up cudaArrays and bind the textures
cudaMallocArray(&left_array, &channelDesc, width, height);
cudaMallocArray(&right_array, &channelDesc, width, height);

// Source pitch and copy width are in bytes. This assumes step is the row pitch
// in floats, as the kernel's indexing implies; if step is already a byte count
// (e.g. OpenCV's widthStep), pass it directly as the source pitch instead.
cudaMemcpy2DToArray(left_array, 0, 0, dataL, step * sizeof(float), width * sizeof(float), height, cudaMemcpyHostToDevice);
cudaMemcpy2DToArray(right_array, 0, 0, dataR, step * sizeof(float), width * sizeof(float), height, cudaMemcpyHostToDevice);

cudaBindTextureToArray(leftTex, left_array);
cudaBindTextureToArray(rightTex, right_array); [/codebox]
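The pitch and width arguments of cudaMemcpy2DToArray (like cudaMemcpy2D) are byte counts, and the width must fit within one row of the destination array. The plain C++ function below is a hypothetical emulation of that documented row-by-row behavior, included only to make the units concrete; it is not how the CUDA runtime is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative emulation (hypothetical helper) of a 2D copy: `height` rows of
// `widthBytes` bytes, with source rows `spitch` bytes apart and destination
// rows `dpitch` bytes apart. All three sizes are in BYTES.
void copy2D(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}
```

With a 4-column float image, passing 4 * sizeof(float) as the width copies each row whole, while passing the element count 4 as if it were bytes copies only the first float of each row.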

Thanks.
