2D Texture access How can I access pixels from 2D texture

now ! finally I got it ! Thanks a lot laughingrice !!, only thing which I had to figure out was to set up cudaMemcpy2D and cudaMallocPitch (). I put everywhere where I had to allocate memory 3 (for RGB image ).

But one thing what I don’t understand are references to RGB channels in this example:

text2D( texRef, idx + 0, idy )

text2D( texRef, idx + 1, idy )

text2D( texRef, idx + 2, idy )

does it mean that, I’m referring to RGB channels in pitch memory, which was allocated on device, but I’m accessing it through texture memory ? Is it right ? Is it mapping pitch -> texture ?

Thanks in advance !

ohhhh no ! … I’ve just realized that for another image it doesn’t work :/ properly. For lena.jpg ( 512x512 ) works but for example for 300x300 doesn’t (see attch.) - I getting some strange black lines across the picture - what is wrong ?


You are not handling the pitch correctly. From the diagonal black lines you are passing image width in bytes to the texture instead of image pitch in bytes

How do you load the image on the CPU side (if it’s OpenCV then you have image pitch on that side as well)

The idea is like this:

The image is width (RGB) elements by height lines. The buffer is pitch bytes width by height lines. So if I’ll manage to do it in text




width would be 4, height 3, pitch 14

The pitch idea is that you want the start of each line to be aligned to get read coalescing on each line

Look at the syntax of the functions

cudaMallocPitch(&buf, &bufPitch, width (in bytes), height)

cudaMemcpy2D(buf, bufPitch, host, hostPitch, width (in bytes), height, copy type)

where if the host doesn’t have a pitch you use hostPitch = line width in bytes. If you use OpenCV hostPitch = img->widthStep and not img->width*3

cudaBindTexture2D(NULL, &texRef, buf, &desc, width*3, height, bufPitch);

notice the use of both width*3 and bufPitch

hi laughingrice,

sorry I did’t answer so long (I’ve been dealing with global memory … ) so, I very thankful for your answers, now I got it, at least , what pitch referred. You never know I did you help me! But Unfortunately I still get this strange stripes (!) … I tried everything what could have influence on it… I did after your advices: this is what I have:

// ------------------ load image ----------------  

   loadImage( &src_img, cv_img );

// ----------------------------------------------

uchar *d_buff;

   size_t pitch;

   const unsigned int d_pitch = imgW * sizeof(uchar) * CHANNELS;

   const unsigned int img_pitch = imgW * sizeof(uchar) * CHANNELS;

	  cutilSafeCall( cudaMallocPitch( &d_buff, &pitch, img_pitch, imgH ) );

	  cutilSafeCall( cudaMemcpy2D( d_buff, pitch, src_img, cv_img->widthStep,imgW*CHANNELS*sizeof(uchar), imgH, cudaMemcpyHostToDevice ) ); 

cutilSafeCall( cudaBindTexture2D( NULL, texRef, d_buff, channelDesc, imgW*CHANNELS*type_size , imgH, pitch ) );


// allocate device momory for results 

	uchar *d_out;

	  cutilSafeCall( cudaMalloc( (void **)&d_out, mem_size ) );

	  cutilSafeCall( cudaMemset( (void *)d_out, 0, mem_size ) ); // fill with 0s

	  copy_image_kernel( d_out, pitch, imgW, imgH ); 

// allocate host memory for results 

   uchar *h_out;

   cutilSafeCall( cudaMalloc( (void **)&h_out, mem_size ) );

// get results from the device 

	  cutilSafeCall( cudaMemcpy2D( h_out, img_pitch, d_out, d_pitch, imgW * sizeof(uchar) * CHANNELS , imgH, cudaMemcpyDeviceToHost) );

Could you help me again … I’ll be very thankful …Thanks in advance

Afraid that I’m a bit short on time this week, so hope that this is good enough, if not, I’ll try something more complete in a week

In any case, I’m guessing that the input image does not have an alpha channel if it is the input of OpenCV

jpeg never has, png can have an alpha channel

I didn’t test the 4 channel output image, don’t know if OpenCV supports it

I’m also writing this ad-hoc so I hope that there are not too many compiler errors


texture<unsigned char, 2, cudaReadModeElementType> texRef;

global void CopyKernel (uchar4 *gpuOut, size_t gpuoutStride, int width, int height, int nChannels)


    int x = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;

    int y = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;

if (x >= width || y >= height)


int ix = x*nChannels;

uchar4 val = make_uchar4(0,0,0,0);

val.x = tex2D(texRef, ix, y);

if (nChannels >= 2) val.y = tex2D(texRef, ix + 1, y);

if (nChannels >= 3) val.z = tex2D(texRef, ix + 2, y);

if (nChannels >= 4) val.w = tex2D(texRef, ix + 3, y);

*((uchar4 *)((char )gpuOut + ygpuoutStride) + x) = val


IplImage* img = cvLoadImage(“img.jpg”, CV_LOAD_IMAGE_ANYCOLOR);

if (!img || img->depth != 8)


// Create an rgba output image

IplImage* oimg = cvCreateImage(cvSize(img->width, img->height), img->depth, 4);

unsigned char *gpuIn;

size_t gpuinStride;

cudaMallocPitch((void **)&gpuIn, &gpuinStride, img->width*img->nChannels, img->height);

// 4 channel output

unsigned uchar4 *gpuOut;

size_t gpuoutStride;

cudaMallocPitch((void **)&gpuOut, &gpuoutStride, oimg->width*4, oimg->height);

cudaMemcpy2D(gpuIn, gpuinStride, img->imageData, img->widthStep, img->width*img->nChannels, img->height, cudaMemcpyHostToDevice);

cudaChannelFormatDesc desc = cudaCreateChannelDesc();

cudaBindTexture2D (NULL, &texRef, gpuIn, &desc, img->width*img->nChannels, img->height, gpuinStride);

dim3 dimBlock(16, 16);

dim3 dimGrid((img->width + dimBlock.x - 1)/dimBlock.x, (img->height + dimBlock.y - 1)/dimBlock.y);

CopyKernel <<< dimGrid, dimBlock >>> (gpuOut, gpuoutStride, img->width, img->height, img->nChannels);

cudaMemcpy2D(oimg->imageData, oimg->widthStep, gpuOut, gpuoutStride, oimg->width*oimg->nChannels, oimg->height, cudaMemcpyHostToDevice);


if the input is RGBA, i.e. img->nChannels == 4, you can simplify this by using a uchar4 texture, i.e


texture<uchar4, 2, cudaReadModeElementType> texRef;

global void CopyKernel (uchar4 *gpuOut, size_t gpuoutStride, int width, int height, int nChannels)


    int x = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;

    int y = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;

if (x >= width || y >= height)


*((uchar4 *)((char )gpuOut + ygpuoutStride) + x) = tex2D(texRef, x, y);



You can post the input image are using and I’ll try to make a working example