2D Texture access How can I access pixels from 2D texture

mpwm · May 27, 2010, 4:58pm

At least according to the documentation textures work only with 1,2,4 vector types, never tried it with 3, but it’s problematic with alignment which is why it’s probably not supported. You could try though as I think that OpenGL does support it, but I may be wrong. It’s generally better to work with alpha channels, even if unused so pixels are aligned to word boundaries. This means that you should work with uchar and manage the 3 ellements yourself.

Second question, How do you load/display the image? That is, what is the expected input/output format, looks from the images like it may be rrrrr ggggg bbbbbb and not rgbrgbrgb (or more specifically since you are getting blue images, it may be bbbbbb gggggg rrrrrr).

I attached some code, note that it is BAD, as it doesn’t coalesce writes since you write in bytes and three bytes per kernel. You need to either launch three times the kernels so that eacn kenrel handles one byte in each pixel or that each kernel handles 4 pixels (12 bytes)

Next, your cudaCreateChannelDesc is wrong. It’s easiest to use using templates. For uchar images you want

[codebox]texture<uchar, 2, cudaReadModeElementType> texRef;

{
uchar *in;

size_t stride;
cudaMallocPitch(&in, &stride, …);
cudaMemcpy2D(...);
cudaChannelFormatDesc desc = cudaCreateChannelDesc ();
// Assuming RGB image and width in pixels, we need width*3 as this is width*rgb

cudaBindTexture2D(NULL, &texRef, in, &desc, width*3, height, stride);
// Each kernel copies three pixels
dim3 dimGrid((height - 1)/16 + 1, (width - 1)/16 + 1);

dim3 dimBlock(16, 16);

KernelTexture <<< dimGrid, dimBlock >>> (out, outStride, height, width);
[/codebox]

For an rgb image you probably want

[codebox]global_ void KernelTexture(char *out, size_t outSride, int width, int height)

{

usigned int x = (umul24(blockDim.x, blockIdx.x) + threadIdx.x)*3;

usigned int y = umul24(blockDim.y, blockIdx.y) + threadIdx.y;

// width is in pixels, not bytes

if (x > width*3 || y > height)
return;
out[y*outSride + x ] = tex2D(texRef, x , y);

out[y*outSride + x + 1] = tex2D(texRef, x + 1, y);

out[y*outSride + x + 2] = tex2D(texRef, x + 2, y);

}

[/codebox]

now ! finally I got it ! Thanks a lot laughingrice !!, only thing which I had to figure out was to set up cudaMemcpy2D and cudaMallocPitch (). I put everywhere where I had to allocate memory 3 (for RGB image ).

But one thing what I don’t understand are references to RGB channels in this example:

text2D( texRef, idx + 0, idy )

text2D( texRef, idx + 1, idy )

text2D( texRef, idx + 2, idy )

does it mean that, I’m referring to RGB channels in pitch memory, which was allocated on device, but I’m accessing it through texture memory ? Is it right ? Is it mapping pitch → texture ?

Thanks in advance !

mpwm · May 27, 2010, 5:23pm

ohhhh no ! … I’ve just realized that for another image it doesn’t work External Image properly. For lena.jpg ( 512x512 ) works but for example for 300x300 doesn’t (see attch.) - I getting some strange black lines across the picture - what is wrong ?

laughingrice · May 27, 2010, 7:40pm

You are not handling the pitch correctly. From the diagonal black lines you are passing image width in bytes to the texture instead of image pitch in bytes

How do you load the image on the CPU side (if it’s OpenCV then you have image pitch on that side as well)

The idea is like this:

The image is width (RGB) elements by height lines. The buffer is pitch bytes width by height lines. So if I’ll manage to do it in text

rgbrgbrgbrgb00

width would be 4, height 3, pitch 14

The pitch idea is that you want the start of each line to be aligned to get read coalescing on each line

Look at the syntax of the functions

cudaMallocPitch(&buf, &bufPitch, width (in bytes), height)

cudaMemcpy2D(buf, bufPitch, host, hostPitch, width (in bytes), height, copy type)

where if the host doesn’t have a pitch you use hostPitch = line width in bytes. If you use OpenCV hostPitch = img->widthStep and not img->width*3

cudaBindTexture2D(NULL, &texRef, buf, &desc, width*3, height, bufPitch);

notice the use of both width*3 and bufPitch

mpwm · June 5, 2010, 3:00pm

You are not handling the pitch correctly. From the diagonal black lines you are passing image width in bytes to the texture instead of image pitch in bytes

How do you load the image on the CPU side (if it’s OpenCV then you have image pitch on that side as well)

The idea is like this:

The image is width (RGB) elements by height lines. The buffer is pitch bytes width by height lines. So if I’ll manage to do it in text

rgbrgbrgbrgb00

rgbrgbrgbrgb00

rgbrgbrgbrgb00

width would be 4, height 3, pitch 14

The pitch idea is that you want the start of each line to be aligned to get read coalescing on each line

Look at the syntax of the functions

cudaMallocPitch(&buf, &bufPitch, width (in bytes), height)

cudaMemcpy2D(buf, bufPitch, host, hostPitch, width (in bytes), height, copy type)

where if the host doesn’t have a pitch you use hostPitch = line width in bytes. If you use OpenCV hostPitch = img->widthStep and not img->width*3

cudaBindTexture2D(NULL, &texRef, buf, &desc, width*3, height, bufPitch);

notice the use of both width*3 and bufPitch

hi laughingrice,

sorry I did’t answer so long (I’ve been dealing with global memory … ) so, I very thankful for your answers, now I got it, at least , what pitch referred. You never know I did you help me! But Unfortunately I still get this strange stripes (!) … I tried everything what could have influence on it… I did after your advices: this is what I have:

// ------------------ load image ----------------  

   loadImage( &src_img, cv_img );

// ----------------------------------------------

uchar *d_buff;

   size_t pitch;

   const unsigned int d_pitch = imgW * sizeof(uchar) * CHANNELS;

   const unsigned int img_pitch = imgW * sizeof(uchar) * CHANNELS;

	  cutilSafeCall( cudaMallocPitch( &d_buff, &pitch, img_pitch, imgH ) );

	  cutilSafeCall( cudaMemcpy2D( d_buff, pitch, src_img, cv_img->widthStep,imgW*CHANNELS*sizeof(uchar), imgH, cudaMemcpyHostToDevice ) ); 

cutilSafeCall( cudaBindTexture2D( NULL, texRef, d_buff, channelDesc, imgW*CHANNELS*type_size , imgH, pitch ) );

		   

// allocate device momory for results 

	uchar *d_out;

	  cutilSafeCall( cudaMalloc( (void **)&d_out, mem_size ) );

	  cutilSafeCall( cudaMemset( (void *)d_out, 0, mem_size ) ); // fill with 0s

	  copy_image_kernel( d_out, pitch, imgW, imgH ); 

// allocate host memory for results 

   uchar *h_out;

   cutilSafeCall( cudaMalloc( (void **)&h_out, mem_size ) );

// get results from the device 

	  cutilSafeCall( cudaMemcpy2D( h_out, img_pitch, d_out, d_pitch, imgW * sizeof(uchar) * CHANNELS , imgH, cudaMemcpyDeviceToHost) );

Could you help me again … I’ll be very thankful …Thanks in advance

laughingrice · June 5, 2010, 4:15pm

Afraid that I’m a bit short on time this week, so hope that this is good enough, if not, I’ll try something more complete in a week

In any case, I’m guessing that the input image does not have an alpha channel if it is the input of OpenCV

jpeg never has, png can have an alpha channel

I didn’t test the 4 channel output image, don’t know if OpenCV supports it

I’m also writing this ad-hoc so I hope that there are not too many compiler errors

[codebox]

texture<unsigned char, 2, cudaReadModeElementType> texRef;

global void CopyKernel (uchar4 *gpuOut, size_t gpuoutStride, int width, int height, int nChannels)

{

    int x = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;

    int y = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;

if (x >= width || y >= height)

            return;

int ix = x*nChannels;

uchar4 val = make_uchar4(0,0,0,0);

val.x = tex2D(texRef, ix, y);

if (nChannels >= 2) val.y = tex2D(texRef, ix + 1, y);

if (nChannels >= 3) val.z = tex2D(texRef, ix + 2, y);

if (nChannels >= 4) val.w = tex2D(texRef, ix + 3, y);

*((uchar4 *)((char )gpuOut + ygpuoutStride) + x) = val

}

IplImage* img = cvLoadImage(“img.jpg”, CV_LOAD_IMAGE_ANYCOLOR);

if (!img || img->depth != 8)

return;

// Create an rgba output image

IplImage* oimg = cvCreateImage(cvSize(img->width, img->height), img->depth, 4);

unsigned char *gpuIn;

size_t gpuinStride;

cudaMallocPitch((void **)&gpuIn, &gpuinStride, img->width*img->nChannels, img->height);

// 4 channel output

unsigned uchar4 *gpuOut;

size_t gpuoutStride;

cudaMallocPitch((void **)&gpuOut, &gpuoutStride, oimg->width*4, oimg->height);

cudaMemcpy2D(gpuIn, gpuinStride, img->imageData, img->widthStep, img->width*img->nChannels, img->height, cudaMemcpyHostToDevice);

cudaChannelFormatDesc desc = cudaCreateChannelDesc();

cudaBindTexture2D (NULL, &texRef, gpuIn, &desc, img->width*img->nChannels, img->height, gpuinStride);

dim3 dimBlock(16, 16);

dim3 dimGrid((img->width + dimBlock.x - 1)/dimBlock.x, (img->height + dimBlock.y - 1)/dimBlock.y);

CopyKernel <<< dimGrid, dimBlock >>> (gpuOut, gpuoutStride, img->width, img->height, img->nChannels);

cudaMemcpy2D(oimg->imageData, oimg->widthStep, gpuOut, gpuoutStride, oimg->width*oimg->nChannels, oimg->height, cudaMemcpyHostToDevice);

[/codebox]

if the input is RGBA, i.e. img->nChannels == 4, you can simplify this by using a uchar4 texture, i.e

[codebox]

texture<uchar4, 2, cudaReadModeElementType> texRef;

global void CopyKernel (uchar4 *gpuOut, size_t gpuoutStride, int width, int height, int nChannels)

{

    int x = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;

    int y = __umul24(blockIdx.y, blockDim.y) + threadIdx.y;

if (x >= width || y >= height)

            return;

*((uchar4 *)((char )gpuOut + ygpuoutStride) + x) = tex2D(texRef, x, y);

}

[/codebox]

You can post the input image are using and I’ll try to make a working example