Copying OpenCV imageData to a CUDA array

I am trying to copy imageData (from an OpenCV IplImage, stored as unsigned 8-bit characters) to a CUDA array and bind it to texture memory, then process it in a kernel. But the result is wrong :( I have been struggling with it the whole day. The kernel is simple:

__global__ void undistort(float* dstdata, int width, int height)
{
	unsigned int u = threadIdx.x+blockIdx.x*blockDim.x;
	unsigned int v = threadIdx.y+blockIdx.y*blockDim.y;
	int u_coor,v_coor;
	float x,y;
	float xtmp,ytmp;
	float r;

	x = (u-u0)*pixel;
	y = (v-v0)*pixel;
	r = sqrt(pow(x-X0,2)+pow(y-Y0,2));

	xtmp = x-X0

	ytmp = y-Y0

	u_coor = xtmp/pixel+u0;
	v_coor = ytmp/pixel+v0;

	//get texture data
	dstdata[v * width + u] = tex2D(texsrc, u_coor, v_coor);
}
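One detail in the kernel that may be worth checking on the CPU: `u` and `v` are declared `unsigned int` while `u0` and `v0` are `int`, so `(u-u0)` is evaluated in unsigned arithmetic and wraps around for pixels left of the principal point. The sketch below reproduces only that one expression in plain C++. The names `kPixel`, `kU0`, `offsetUnsigned` and `offsetSigned` are made up for illustration, and a 32-bit `unsigned int` is assumed.

```cpp
// Stand-ins for the post's `pixel` and `u0` constants (names are hypothetical).
static const float kPixel = 0.0064f;
static const int kU0 = 5616 / 2;

// Mirrors the kernel: the index is unsigned, so (u - kU0) wraps around
// to a huge positive value whenever u < kU0.
float offsetUnsigned(unsigned int u) {
    return (u - kU0) * kPixel;
}

// Variant with a signed index: the difference can go negative as intended.
float offsetSigned(int u) {
    return (u - kU0) * kPixel;
}
```

Calling `offsetUnsigned(0u)` gives a very large positive number, while `offsetSigned(0)` gives a small negative offset, which is presumably what the formula wants. Declaring the indices as `int` in the kernel would be one way to avoid this.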


The host code is:


__constant__ float dis[DISTORTION_NUM];
__constant__ float pixel = 0.006400;
__constant__ float X0 = -0.024100;
__constant__ float Y0 = 0.058500;
__constant__ float f = 24.405500;
__constant__ int u0 = 5616/2;
__constant__ int v0 = 3744/2;

texture<uchar, 2, cudaReadModeElementType> texsrc;

int main (int argc, char** argv)
{
	IplImage *src_img,*dst_img;

	if((src_img = cvLoadImage(argv[1],-1)) != 0)
	{
		int src_w = src_img->width;
		int src_h = src_img->height;
		int src_size = src_h*src_w;

		dst_img = cvCloneImage(src_img);
		dst_img->origin = src_img->origin;

		int dst_h = dst_img->height;
		int dst_w = dst_img->width;
		int dst_size = dst_h*dst_w;

		float *tmp_dis = (float* )malloc(sizeof(float)*DISTORTION_NUM);
		tmp_dis[0] = -1.724000e-004; //k1
		tmp_dis[1] = 2.642000e-007;  //k2
		tmp_dis[2] = -3.814000e-006; //p1
		tmp_dis[3] = 7.154000e-006;  //p2
		cutilSafeCall(cudaMemcpyToSymbol(dis, tmp_dis, sizeof(float)*DISTORTION_NUM));

		cudaEvent_t start, stop;

		cudaChannelFormatDesc channelDes = cudaCreateChannelDesc<float>();
		cudaArray* srcArray;
		cudaMallocArray(&srcArray, &channelDes, src_w, src_h);

		cudaMemcpy2DToArray(srcArray, 0, 0, src_img->imageData, sizeof(unsigned char) * src_img->widthStep,
							sizeof(unsigned char) * src_img->width, src_img->height, cudaMemcpyHostToDevice);

		texsrc.addressMode[0] = cudaAddressModeWrap;
		texsrc.addressMode[1] = cudaAddressModeWrap;
		texsrc.filterMode = cudaFilterModeLinear;
		texsrc.normalized = 1;

		cudaBindTextureToArray(texsrc, srcArray, channelDes);

		float* dst_data;
		cudaMalloc((void**)&dst_data, dst_size*sizeof(float));

		dim3 dimBlock(16, 16, 1);
		dim3 dimGrid((src_w+dimBlock.x-1)/dimBlock.x, (src_h+dimBlock.y-1)/dimBlock.y);

		undistort<<<dimGrid, dimBlock, 0>>>(dst_data, src_w, src_h);
		cutilCheckMsg("undistort failed ");

		float elapsedtime;

		printf("Time to undistort:	%3.1f ms\n",elapsedtime);
	}

	return 0 ;
}


I want the filter mode to be linear, but if I use cudaCreateChannelDesc() I get an error message, even though the imageData is unsigned 8-bit characters (uchar *). I want to know what is wrong with my code and why running

undistort-cuda.exe input.jpg result.jpg

gives a strange result. I need your help; here is the code (4.03 KB).

Best regards


Have you tried without the texture?

Yes, and it failed again :( Texture memory seemed simpler, I think, but I don't know what is wrong. Do you think the imageData copy to the array, or the copy of the result back to the host, is not working correctly?
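One way to check the copy question without a GPU is to mimic the row-by-row copy on the CPU. In an IplImage, `widthStep` is the row stride in bytes and can be larger than the useful row length because rows are padded for alignment. The useful row length in bytes is `width * nChannels`, so passing `width` alone is only right for a single-channel image. The sketch below is a hypothetical helper (`copyPitched` is not an OpenCV or CUDA function) that takes the same kind of source pitch, row width in bytes, and height arguments as a 2D copy.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `height` rows of `rowBytes` useful bytes each
// from a padded source buffer (row stride `srcPitch` bytes) into a tightly
// packed destination, skipping the padding at the end of every source row.
std::vector<unsigned char> copyPitched(const unsigned char* src,
                                       std::size_t srcPitch,
                                       std::size_t rowBytes,
                                       std::size_t height) {
    std::vector<unsigned char> dst(rowBytes * height);
    if (rowBytes == 0) {
        return dst;  // nothing to copy
    }
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst.data() + row * rowBytes, src + row * srcPitch, rowBytes);
    }
    return dst;
}
```

Comparing the output of such a helper (called with `widthStep`, `width * nChannels`, and `height`) against the data read back from the device is a simple way to tell whether the upload or the kernel is at fault.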

Just a random thought: interpolation works well when you give floating-point coordinates…

But you are giving integer coordinates.

I hope you are looking for hardware interpolation, aren’t you?

Be aware that hardware interpolation is done with reduced precision, so its results will deviate from those of equivalent CPU code.

I think you are right. I have changed the declaration to

float u_coor,v_coor

Do you mean I should do the linear interpolation myself instead of relying on the hardware interpolation selected by

texsrc.filterMode = cudaFilterModeLinear;

I also tried

texsrc.filterMode = cudaFilterModePoint

and it is still wrong (you can try any JPG picture, the result looks the same). I think the serious problem is that the imageData is handled in the wrong way. Here is another topic with the same problem as mine.

  1. If memory serves, your channel descriptor should be unsigned char and not float, as the channel descriptor describes the input, not the output.
  2. Are you sure that your input image is grayscale (i.e. that it has only one channel)?
  3. As for filtering, CUDA always converts your indices to float (the extra type conversion can cause performance issues if you are not aware of it).
  4. Also, the interpolation is based on the center of the pixel rather than the edge, i.e. you will get the correct value for pixel (0,0) at texture location (0.5,0.5), not (0,0) as most people expect. Integral indices give the correct value for nearest-neighbor interpolation, not for linear interpolation.
  5. For linear interpolation you need float output, so you should set the last texture template parameter to cudaReadModeNormalizedFloat rather than cudaReadModeElementType (note that your output will be in the range [0,1] rather than [0,255]).
  6. You are using normalized coordinates for the texture access, which means that texture coordinates are in the range [0,1) x [0,1) rather than [0,n) x [0,m) (i.e. you need to index the texture with floating-point values between 0 and 1, not between 0 and the image width / height).
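Points 4 to 6 can be tried out on the CPU. The sketch below emulates a 1D fetch with normalized coordinates, linear filtering, clamp addressing, and normalized-float read mode, following the filtering formula given in the CUDA programming guide. It is an illustration only: `texelCenter` and `fetchLinear` are made-up names, the row is assumed to be non-empty, and the blend weight is kept in full float precision whereas the hardware uses a low-precision fixed-point weight.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Normalized coordinate of the center of texel i in a row of n texels.
float texelCenter(int i, int n) {
    return (i + 0.5f) / n;
}

// CPU emulation of a linearly filtered 1D fetch (hypothetical helper).
// xNorm is a normalized coordinate in [0,1); the result is in [0,1]
// because each 8-bit texel is divided by 255, as in normalized-float reads.
float fetchLinear(const std::vector<unsigned char>& texels, float xNorm) {
    const int n = static_cast<int>(texels.size());
    const float xb = xNorm * n - 0.5f;   // shift so texel centers fall on integers
    const float fl = std::floor(xb);
    const float alpha = xb - fl;         // weight of the right-hand neighbor
    int i0 = static_cast<int>(fl);
    int i1 = i0 + 1;
    i0 = std::min(std::max(i0, 0), n - 1);  // clamp addressing
    i1 = std::min(std::max(i1, 0), n - 1);
    return ((1.0f - alpha) * texels[i0] + alpha * texels[i1]) / 255.0f;
}
```

For a two-texel row {0, 255}, sampling at `texelCenter(0, 2)` and `texelCenter(1, 2)` returns the two texel values (scaled to 0 and 1), while sampling at 0.5, the boundary between them, returns their average.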

Thank you for your reply, you are the master :thumbup: The code above had lots of basic errors. It now works fine with grayscale, and I can do linear interpolation either with cudaFilterModeLinear or with my own code. Now, how can I work with 3- or 4-channel images? Do I need to use uchar4 instead? I cannot work out how to get the r, g, b, a values from tex2D(). In the forum there a
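On the multi-channel question: CUDA texture channel formats come with 1, 2, or 4 components, so a 3-channel image is usually padded to 4 channels on the host before it is uploaded. The texture can then be declared with `uchar4`, and `tex2D()` returns a `uchar4` (or a `float4` with cudaReadModeNormalizedFloat) whose `.x`, `.y`, `.z`, `.w` members hold the channels. Since OpenCV loads color images in BGR order, `.x` would be blue. The sketch below shows one possible host-side padding step. `expandTo4Channels` is a hypothetical helper, not an OpenCV or CUDA function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: expands an interleaved 3-channel 8-bit image whose
// rows are `srcPitch` bytes apart (widthStep in an IplImage) into a tightly
// packed 4-channel buffer. The fourth channel is filled with `alpha`.
std::vector<unsigned char> expandTo4Channels(const unsigned char* src,
                                             std::size_t srcPitch,
                                             std::size_t width,
                                             std::size_t height,
                                             unsigned char alpha = 255) {
    std::vector<unsigned char> dst(width * height * 4);
    for (std::size_t y = 0; y < height; ++y) {
        const unsigned char* row = src + y * srcPitch;
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t out = (y * width + x) * 4;
            dst[out + 0] = row[3 * x + 0];  // first channel (blue for OpenCV BGR)
            dst[out + 1] = row[3 * x + 1];  // second channel (green)
            dst[out + 2] = row[3 * x + 2];  // third channel (red)
            dst[out + 3] = alpha;           // padding channel
        }
    }
    return dst;
}
```

The resulting buffer has a row length of `width * 4` bytes, which would be both the pitch and the width in bytes for the upload to a 4-component array.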

Why do I always think that tex2D(imgTex, x-1, y) is the nearest pixel value? Am I wrong? :blink:
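Whether that assumption holds depends on the filter mode. With unnormalized coordinates and point filtering, a fetch at coordinate x returns the texel at floor(x), so an integer x-1 does return the left neighbor. With linear filtering, the same integer coordinate sits on the boundary between two texels and returns their average. A CPU sketch of both cases in 1D, with clamp addressing and a non-empty row assumed (`fetchPoint` and `fetchLinearUnnormalized` are made-up names, and the hardware's blend weight has lower precision than the float used here):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Point filtering with unnormalized coordinates: returns the texel at floor(x).
unsigned char fetchPoint(const std::vector<unsigned char>& texels, float x) {
    const int n = static_cast<int>(texels.size());
    const int i = static_cast<int>(std::floor(x));
    return texels[std::min(std::max(i, 0), n - 1)];  // clamp addressing
}

// Linear filtering with unnormalized coordinates: blends the two texels
// whose centers (at i + 0.5) surround x.
float fetchLinearUnnormalized(const std::vector<unsigned char>& texels, float x) {
    const int n = static_cast<int>(texels.size());
    const float xb = x - 0.5f;
    const float fl = std::floor(xb);
    const float alpha = xb - fl;  // weight of the right-hand neighbor
    int i0 = static_cast<int>(fl);
    int i1 = i0 + 1;
    i0 = std::min(std::max(i0, 0), n - 1);
    i1 = std::min(std::max(i1, 0), n - 1);
    return (1.0f - alpha) * texels[i0] + alpha * texels[i1];
}
```

For the row {10, 20, 30, 40} and x = 2, `fetchPoint(row, x - 1)` returns 20, the left neighbor. `fetchLinearUnnormalized(row, x - 1)` returns 15, the average of texels 0 and 1, and only `x - 1 + 0.5f` returns 20.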