How to go over an image more than once?

aviad · December 12, 2007, 6:59pm

Hi,

I’m trying to write a small program that processes an image iteratively, meaning each manipulation depends on the result of the previous one.

As a start, I’m just trying to keep the original image after a few iterations, without manipulating it, just to make sure I can handle the image correctly.

I’m trying to base my code on the provided simpleTexture sample, but I have a weird problem that I cannot solve: the image gets blurred and shifted. The more iterations I run, the worse it gets.

This is my simple kernel:

__global__ void
transformKernel( float* g_odata, int width, int height)
{
    // calculate normalized texture coordinates
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

    float u = x / (float) width;
    float v = y / (float) height;

    // read from texture and write to global memory
    g_odata[y*width + x] = (tex2D(tex,u,v));
}

And this is the code that calls the kernel:

void
runTest( int argc, char** argv)
{
    CUT_DEVICE_INIT();

    // load image from disk
    float* h_data = NULL;
    unsigned int width, height;
    char* image_path = cutFindFilePath(image_filename, argv[0]);
    if (image_path == 0)
        exit(EXIT_FAILURE);
    CUT_SAFE_CALL( cutLoadPGMf(image_path, &h_data, &width, &height));

    unsigned int size = width * height * sizeof(float);
    printf("Loaded '%s', %d x %d pixels\n", image_filename, width, height);

    // allocate device memory for result
    float* d_data = NULL;
    CUDA_SAFE_CALL( cudaMalloc( (void**) &d_data, size));

    // allocate array and copy image data
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
    cudaArray* cu_array;
    CUDA_SAFE_CALL( cudaMallocArray( &cu_array, &channelDesc, width, height ));
    CUDA_SAFE_CALL( cudaMemcpyToArray( cu_array, 0, 0, h_data, size, cudaMemcpyHostToDevice));

    // set texture parameters
    tex.addressMode[0] = cudaAddressModeWrap;
    tex.addressMode[1] = cudaAddressModeWrap;
    tex.filterMode = cudaFilterModeLinear;
    tex.normalized = true;    // access with normalized texture coordinates

    // Bind the array to the texture
    CUDA_SAFE_CALL( cudaBindTextureToArray( tex, cu_array, channelDesc));

    dim3 dimBlock(8, 8, 1);
    dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);

    for(int i=0;i<100;i++){
        CUDA_SAFE_CALL( cudaThreadSynchronize() );
        unsigned int timer = 0;
        CUT_SAFE_CALL( cutCreateTimer( &timer));
        CUT_SAFE_CALL( cutStartTimer( timer));

        // execute the kernel
        CUDA_SAFE_CALL( cudaThreadSynchronize() );
        transformKernel<<< dimGrid, dimBlock, 0 >>>( d_data, width, height);
        CUDA_SAFE_CALL( cudaThreadSynchronize() );

        CUDA_SAFE_CALL( cudaMallocArray( &cu_array, &channelDesc, width, height ));
        CUDA_SAFE_CALL(cudaMemcpyToArray(cu_array, 0, 0, d_data, size, cudaMemcpyDeviceToDevice));
        CUDA_SAFE_CALL( cudaThreadSynchronize() );
        CUDA_SAFE_CALL( cudaBindTextureToArray( tex, cu_array, channelDesc));

        // check if kernel execution generated an error
        CUT_CHECK_ERROR("Kernel execution failed");

        CUDA_SAFE_CALL( cudaThreadSynchronize() );
        CUT_SAFE_CALL( cutStopTimer( timer));
        printf("Processing time: %f (ms)\n", cutGetTimerValue( timer));
        printf("%.2f Mpixels/sec\n", (width*height / (cutGetTimerValue( timer) / 1000.0f)) / 1e6);
        CUT_SAFE_CALL( cutDeleteTimer( timer));
    }

    // allocate mem for the result on host side
    float* h_odata = (float*) malloc( size);

    // copy result from device to host
    CUDA_SAFE_CALL( cudaMemcpy( h_odata, d_data, size, cudaMemcpyDeviceToHost) );

    // write result to file
    char output_filename[1024];
    strcpy(output_filename, image_path);
    strcpy(output_filename + strlen(image_path) - 4, "_out.pgm");
    CUT_SAFE_CALL( cutSavePGMf( output_filename, h_odata, width, height));
    printf("Wrote '%s'\n", output_filename);

    // write regression file if necessary
    if( cutCheckCmdLineFlag( argc, (const char**) argv, "regression"))
    {
        // write file for regression test
        CUT_SAFE_CALL( cutWriteFilef( "./data/regression.dat", h_odata, width*height, 0.0));
    }

    // cleanup memory
    CUDA_SAFE_CALL(cudaFree(d_data));
    CUDA_SAFE_CALL(cudaFreeArray(cu_array));
    free(h_data);
    free(h_odata);
    cutFree(image_path);
}

The code is based on the simpleTexture project; I only added the loop and the update of the texture on the device, so that the changes from each iteration take effect.

I attached the blurred pgm file I get as a result of the above code.

If anyone has any clue why I get such weird results, or how to iterate over an image that is stored in a float texture, I’ll be grateful, since I haven’t got any leads…

:)

hmeck · December 12, 2007, 8:43pm

I had this problem. It turns out that when your filtering mode is linear, you need to use

float u = x / (float) width + 0.5f;
float v = y / (float) height + 0.5f;

instead of

float u = x / (float) width;
float v = y / (float) height;

or change the filtering mode to cudaFilterModePoint.

Check out Appendix F in the programming guide.

aviad · December 12, 2007, 11:55pm

Thanks a lot for your help!

Changing the filtering mode to cudaFilterModePoint did seem to solve the problem.

The weird part is that adding 0.5f to u and v while in linear mode didn’t change anything. Any idea why?

BTW - I didn’t understand from the guide, in which cases should I prefer using linear mode filter rather than cudaFilterModePoint?

Thanks again…
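One possible explanation for why adding 0.5f after the division changed nothing can be worked out on the CPU. The sketch below assumes the mapping described in the texture fetching appendix of the programming guide: a normalized coordinate is wrapped to [0, 1), multiplied by the texture width, and linear mode then interpolates around that value minus 0.5. The helper names (wrap01, to_texel_space, linear_weight) are made up for this illustration and are not CUDA API calls.

```cpp
#include <cassert>
#include <cmath>

// Fractional part in [0, 1): emulates wrap addressing on a normalized coordinate.
float wrap01(float u) { return u - std::floor(u); }

// Map a normalized coordinate u to texel space for a texture that is n texels wide.
float to_texel_space(float u, int n) { return wrap01(u) * n; }

// Weight given to the right-hand neighbour in linear mode (assumed formula:
// interpolate around the texel-space coordinate minus 0.5).
// 0.0 means "exactly one texel", 0.5 means "equal mix of two texels".
float linear_weight(float u, int n) {
    float xb = to_texel_space(u, n) - 0.5f;
    return xb - std::floor(xb);
}
```

For width 8 and x = 3, linear_weight(3.0f / 8, 8) and linear_weight(3.0f / 8 + 0.5f, 8) both come out as 0.5, so both versions still blend two neighbouring texels. The second one additionally lands on texel-space coordinate 7, i.e. x plus half the width, which with wrap addressing and an even width is a shift that cancels after an even number of passes. Only linear_weight((3 + 0.5f) / 8, 8) gives 0, meaning a single unblended texel.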

pkeir · December 13, 2007, 10:10am

cudaFilterModePoint doesn’t filter. If you specify a u,v of (1.1,34.9), the floats are truncated to integers (floor, not round-to-nearest), so you get the actual value of pixel (1,34).

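cudaFilterModeLinear interpolates between neighbouring pixels. With unnormalized coordinates, an integer (x, y) gives you the average of the four nearest pixels, while (x+0.5f, y+0.5f) lands on a pixel center and returns that pixel's value.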
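The two modes can be compared with a small 1D CPU emulation. This is a sketch under assumptions: it follows the fetch formulas from the texture fetching appendix of the programming guide as I read them (point mode takes floor(x); linear mode interpolates around x - 0.5), uses unnormalized coordinates with clamping at the borders, and ignores the reduced fixed-point precision the hardware uses for the interpolation weight. fetch_point and fetch_linear are hypothetical names, not CUDA functions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Clamp an index into [0, n-1], emulating clamp addressing at the borders.
static int clamp_index(int i, int n) {
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Point mode: return the texel whose cell contains x (floor, no blending).
float fetch_point(const std::vector<float>& t, float x) {
    int n = static_cast<int>(t.size());
    int i = static_cast<int>(std::floor(x));
    return t[clamp_index(i, n)];
}

// Linear mode: blend the two texels on either side of x - 0.5.
float fetch_linear(const std::vector<float>& t, float x) {
    int n = static_cast<int>(t.size());
    float xb = x - 0.5f;
    float fl = std::floor(xb);
    float alpha = xb - fl;              // weight of the right-hand texel
    int i = static_cast<int>(fl);
    return (1.0f - alpha) * t[clamp_index(i, n)]
         + alpha * t[clamp_index(i + 1, n)];
}
```

With t = {10, 20, 30, 40}, fetch_point returns 20 for both 1.1 and 1.9. fetch_linear returns 20 at 1.5 (the center of texel 1) and 25 at 2.0 (halfway between texels 1 and 2).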

hmeck · December 13, 2007, 6:15pm

If I understand correctly (looking at F.2 in the programming guide), when you use linear filtering, when your address (using un-normalized addressing) is an integer x, you are actually getting the interpolation of x-1 and x, i.e. you are actually looking at x - 0.5. In order to get x actual you need to add 0.5. Probably when you are using normalized textures, you have to add in the half before you divide. I don’t use normalized texture addressing, so I don’t know if this holds true in your case. So I should have said to try:

float u = (x + 0.5f) / (float) width;

It is true that filterModePoint is nearest neighbour, so is not a filter.

It seems that (x, y) gives you the average of four pixels, x-0.5, x+0.5, y-0.5, y+0.5.
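To see what this means for the original copy loop, here is a CPU sketch of one pass of the kernel with linear filtering, normalized coordinates and wrap addressing. It assumes the same fetch formula from the programming guide's texture fetching appendix; tex2d_linear and copy_pass are illustrative names, and the offset parameter is my addition so that both coordinate formulas can be tried.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Image = std::vector<float>;  // row-major, width * height values

// Wrap an integer index into [0, n), emulating wrap addressing.
static int wrap_index(int i, int n) { return ((i % n) + n) % n; }

// Bilinear fetch at normalized (u, v), assumed formula: interpolate
// around (u * w - 0.5, v * h - 0.5) between the four surrounding texels.
float tex2d_linear(const Image& img, int w, int h, float u, float v) {
    float xb = u * w - 0.5f;
    float yb = v * h - 0.5f;
    float fx = std::floor(xb), fy = std::floor(yb);
    float a = xb - fx, b = yb - fy;  // horizontal and vertical weights
    int i0 = wrap_index(static_cast<int>(fx), w);
    int i1 = wrap_index(static_cast<int>(fx) + 1, w);
    int j0 = wrap_index(static_cast<int>(fy), h);
    int j1 = wrap_index(static_cast<int>(fy) + 1, h);
    return (1 - a) * (1 - b) * img[j0 * w + i0] + a * (1 - b) * img[j0 * w + i1]
         + (1 - a) * b * img[j1 * w + i0] + a * b * img[j1 * w + i1];
}

// One pass of the copy kernel. offset = 0.0f reproduces u = x / width;
// offset = 0.5f samples at texel centers, u = (x + 0.5f) / width.
Image copy_pass(const Image& src, int w, int h, float offset) {
    Image dst(src.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst[y * w + x] =
                tex2d_linear(src, w, h, (x + offset) / w, (y + offset) / h);
    return dst;
}
```

On a 4 x 4 test image holding the values 0 to 15, copy_pass with offset 0.5f returns the image unchanged, while offset 0.0f replaces each pixel with the average of itself and its left, upper and upper-left neighbours. That is both a blur and a half-pixel shift, and repeating the pass would compound them, which matches the symptom in the first post.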
