resize with char texture Image returns skewed

I’m trying to copy the data from a OpenCV IPLimage which is formatted to unsigned 8bit characters. the imagedata is a 1D array of characters that i pass into a resize function in cuda.

I have a setup where i convert directly into a 1D float array and copy that float array into the gpu and this works perfectly. The problem is I’m working with extremely large pictures that will cause the CPU to run out of memory if i try to expand the character array into a float array before copying to the gpu. But if i copy the character array directly into the gpu, the resulting image is skewed with it’s right edge running diagnal through the image. I figure i’m missing something so small because i’m so close to it working correctly and getting rid of this size limitation i have.

The resize kernel is pretty much the sample image resize code :P The actual code is shown below, where the rows and columns of the original large image are passed and the float array on the gpu that i want to store my resulting small image is g_odata.

The resulting skewed images look like this… what the hell is wrong? beats me…


texture<char, 2, cudaReadModeNormalizedFloat> tex;

__global__ void

resizeKernel( float* g_odata, int cols, int rows) 


	//uses cuda's internal interpolation, change texture's tex.normalized to true for use.

    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;

    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

    float u = (float)x / (float) cols;

    float v = (float)y / (float) rows;  

    g_odata[y*cols + x] = tex2D(tex, u, v);


The texture Desc is setup as either one of these ways with no changes in the output.

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(8, 0, 0, 0, cudaChannelFormatKindFloat);


cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<unsigned char>();

In the .cu file i set the texture up as such. Changeing the address mode between clamp and wrap does not change the resized output image. Neither does the filtermode. Normalized needs to be true because of the kernel setup.

   tex.addressMode[0] = cudaAddressModeWrap;

    tex.addressMode[1] = cudaAddressModeWrap;

    tex.filterMode = cudaFilterModeLinear;

    tex.normalized = 1;


Hostimage is the 1D char array and i copy it to the GPU with this code. width and height are the width and height values returned from the opencv wrapper.

size = height*width*sizeof(char);

CUDA_SAFE_CALL( cudaMallocArray( &cu_array, &tex.channelDesc, width, height )); 

CUDA_SAFE_CALL( cudaMemcpyToArray( cu_array, 0, 0, hostimage, size, cudaMemcpyHostToDevice));

CUDA_SAFE_CALL( cudaBindTextureToArray( tex, cu_array, channelDesc));

dim3 dimBlock(16, 6, 1);

dim3 dimGrid(iDivUp(r_width,dimBlock.x),iDivUp(r_height,dimBlock.y), 1);

resizeKernel<<< dimGrid, dimBlock, 0 >>>( d_data, r_width, r_height);

CUDA_SAFE_CALL(cudaMemcpy(r_image,d_data, r_size, cudaMemcpyDeviceToHost) );

and that’s about it, but i have no idea what explains the skewing and i’m sure it’s super simple. If anyone could help me with this it’d be really appreciated since i can’t find much about the opencv char to gpu texture and letting it convert nicely beside you should do it :| .
white_out.bmp (85.1 KB)

ugh, so i after looking at the image knowing i was getting an extra pixel every row, i go back through the openCV IPL and there’s a widthstep value that is equivalent of cuda’s pitch. anyone know offhand if a 2dmemcopy through the 1D char array would work nicely since i’ll have a pitch and width and height of the array?

I think your u and v in the texture reference must be accessing locations outside of your image size. Try changing to:

tex.addressMode[0] = cudaAddressModeClamp;

   tex.addressMode[1] = cudaAddressModeClamp;

If you have a weird waterfall effect then you are definitely accessing outside the image. I would then check your block and grid size creation to make sure your getting what you are expecting.

For anyone that might come here looking for some hints as I did…

I managed to get the unsigned char array from an OpenCV IPLimage working with a 2D texture. As Lord Binky suggested, OpenCV pads the image array so it is divisible by 4. The pitch is stored in the widthStep field of the IPLimage struct.

Below are pieces of the code I used:

Read input image:

unsigned char *testImageData;

	IplImage *image;														// This is image pointer

	image = cvLoadImage(testImage.c_str(), CV_LOAD_IMAGE_GRAYSCALE);		// load the image

	if( image == 0 )


		printf( "ERROR: Unable to open test image \"%s\", aborting\n", testImage.c_str());



	testImageData = (unsigned char*) image->imageData;

Copy data to the device:

cudaArray *d_testImage;


		// Create channel descriptor for 2D CUDA Array

		cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<unsigned char>();

		// Allocate array on device

		cudaMallocArray(&d_testImage, &channelDesc, image->width, image->height);

		CHECK_ERROR( "cudaMalloc d_testImage failed" );

		cudaMemcpy2DToArray(d_testImage, 0, 0, testImageData, sizeof(unsigned char) * image->widthStep, sizeof(unsigned char) * image->width, image->height, cudaMemcpyHostToDevice);

		CHECK_ERROR( "cudaMemcpy2DToArray d_testImage failed" );

Texture reference in kernel:

texture<unsigned char, 2, cudaReadModeElementType> testImageTex;

You can then grab data from the texture using tex2D.

Now I just wonder what kind of performance unsigned char fetching will have compared to ints or floats…

thank you for your sharing ,but i did like the above the result was not correct,My topic,need your help ,thank you advance :)

with pixels, width and height you will be able to resize images c# in a way you want. and of course the image may not get skewed.