Texture linear filter doesn't work with uchar! cudaReadModeNormalizedFloat does not work with uchar, why?

Hi!

I’m trying to undistort and filter an image by loading it as a texture, but I can’t find a way to filter it using only 1-byte values.

If I load the texture as float values, there’s no problem, it works perfectly.

(texture<float, 2, cudaReadModeElementType> tex, etc.)

The problem is when I try to load the texture as unsigned char values, as I explain in the code below.

I think all I have to do is declare the texture with cudaReadModeNormalizedFloat and enable linear filtering, but the result I get is an empty, completely black image.
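
To make the comparison concrete, here is a stripped-down sketch of the two declarations and fetches I am talking about (just an illustration, my full code follows below):

texture<float, 2, cudaReadModeElementType> tex_float;             // this version works
texture<unsigned char, 2, cudaReadModeNormalizedFloat> tex_uchar; // this one gives only black output

__global__ void sample_both(float *out_float, float *out_uchar, float x, float y)
{
    // Both fetches return float; the second one should be the uchar value mapped to [0,1].
    out_float[0] = tex2D(tex_float, x, y);
    out_uchar[0] = tex2D(tex_uchar, x, y);
}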

I’ve simplified the code to work with a black-and-white image, and I wrote everything related to the texture in blue to make it easier to read.

Can someone help me?

I’m working on a Quadro FX 3700, and Fedora 9.

Thanks!!

#include <cuda.h>

#include <libs/camera.h>

__constant__ int PITCH;

__device__ float* img_global;

texture <unsigned char, 2, cudaReadModeNormalizedFloat> tex;

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(8, 0, 0, 0, cudaChannelFormatKindUnsigned);

__device__ float2 distort_my_point(int px, int py, Tcamera cam)

{

//Adding distortion

float a1, a2, a3, delta_x, delta_y, dx,dy;

float r2 = (px-cam.K20)*(px-cam.K20)/(cam.K00*cam.K00) + (py-cam.K21)*(py-cam.K21)/(cam.K11*cam.K11);

float r4 = r2 * r2;

a1 = 2. * (px-cam.K20) * (py-cam.K21)/ (cam.K00*cam.K11);

a2 = r2 + 2. * (px-cam.K20) * (px-cam.K20)/ (cam.K00*cam.K00);

a3 = r2 + 2. * (py-cam.K21) * (py-cam.K21)/ (cam.K11*cam.K11);

delta_x = cam.dis2 * a1 + cam.dis3 * a2;

delta_y = cam.dis2 * a3 + cam.dis3 * a1;

delta_x = delta_x * cam.K00;

delta_y = delta_y * cam.K11;

dx = px + (px-cam.K20)*(cam.dis0*r2 + cam.dis1*r4) + delta_x;

dy = py + (py-cam.K21)*(cam.dis0*r2 + cam.dis1*r4) + delta_y;

return make_float2(dx, dy);

}

__global__ void undistort_image_kernel(float *out, int2 size, Tcamera cam)

{

int2 thread = make_int2(blockDim.x * blockIdx.x + threadIdx.x, blockDim.y * blockIdx.y + threadIdx.y);

if(thread.x < size.x && thread.y < size.y)

{

	int index = thread.y * PITCH + thread.x;

	

	float2 distorted_point = distort_my_point(thread.x, thread.y, cam);

	

            out[index] = tex2D(tex, distorted_point.x, distorted_point.y);

	

}

}

__host__ void undistort_image(int im_size, int2 size, dim3 dimBlock, dim3 dimGrid, Tcamera cam, unsigned char *image)

{

tex.addressMode[0] = cudaAddressModeClamp;

tex.addressMode[1] = cudaAddressModeClamp;

tex.filterMode = cudaFilterModeLinear;

tex.normalized = false;

tex.channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat); // I saw this piece of code on the NVIDIA forums, but it does not work.

cudaArray* array;

cudaMallocArray(&array, &channelDesc, size.x, size.y );

cudaMemcpyToArray(array, 0, 0, image, im_size*sizeof(unsigned char), cudaMemcpyHostToDevice);

// Until this point everything is OK: I can copy the cudaArray into a float array and save the images,

// and they are correctly undistorted.

cudaBindTextureToArray(tex, array, channelDesc);

undistort_image_kernel <<<dimGrid, dimBlock>>> (img_global, size, cam);

checkCUDAError("undistort img kernel");

cudaThreadSynchronize();

cudaUnbindTexture(tex);

cudaFreeArray(array);





////In order to get the result images

int pitch_elems = 0;

cudaMemcpyFromSymbol(&pitch_elems, PITCH, sizeof(int), 0, cudaMemcpyDeviceToHost);

size_t pitch = pitch_elems * sizeof(float);

float * host = (float*)malloc(im_size*sizeof(float));

cudaMemcpy2D(host, size.x*sizeof(float), img_global, pitch, size.x*sizeof(float), size.y, cudaMemcpyDeviceToHost);

checkCUDAError("memcpy2D");	    	

for(int i=0; i<im_size; i++)

{

	image[i] = (unsigned char)host[i];

}

free(host);

}

extern "C" void memory_allocation_p(int im_width, int im_height, Tcamera cam)

{

int2 size = make_int2(im_width, im_height);

//block and grid dimensions

dim3 dimBlock(THREADS_X, THREADS_Y);

int blocks_x = im_width / THREADS_X;

int blocks_y = im_height / THREADS_Y;

if(im_width%THREADS_X !=0) blocks_x++;

if(im_height%THREADS_Y !=0) blocks_y++;

dim3 dimGrid(blocks_x, blocks_y);

//memory allocation for global vars

size_t pitch_dependant;

cudaMallocPitch((void**)&img_global, &pitch_dependant, im_width*sizeof(float), im_height);



int img_pitch = pitch_dependant / sizeof(float);

cudaMemcpyToSymbol(PITCH, &img_pitch, sizeof(int), 0, cudaMemcpyHostToDevice);

}

extern "C" void call_undistort(unsigned char *image, int im_width, int im_height, Tcamera cam)

{

//basic declarations

int2 size = make_int2(im_width, im_height);

unsigned int im_size = im_width * im_height;

	

//block and grid dimensions

dim3 dimBlock(THREADS_X, THREADS_Y);

int blocks_x = im_width / THREADS_X;

int blocks_y = im_height / THREADS_Y;

if(im_width%THREADS_X !=0) blocks_x++;

if(im_height%THREADS_Y !=0) blocks_y++;

dim3 dimGrid(blocks_x, blocks_y);

undistort_image(im_size, size, dimBlock, dimGrid, cam, image);

}

extern "C" void memFree()

{

cudaFree(img_global);

}

Hi!

Nobody has the answer?

If the method is OK, could it be a hardware issue on the Quadro FX 3700?

I have to correct something I wrote yesterday. Where I wrote

"until this point everything is OK: I can copy the cudaArray into a float array and save the images, and they are correctly undistorted"

that is obviously wrong. At that point I can recover the image that is in the cudaArray without problems, but it has not been undistorted yet, because the code has not executed the undistort kernel.

Any idea?

Thanks!

I think what you are trying to do cannot be done.

Another source of the problem may be this line:

tex.channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

You specify a channel that holds floats (last parameter), but then you bind it to a char array. I believe that, the way you do it, your device code has no information that the underlying bit data should be interpreted as unsigned chars rather than floats. I think what it does is take a bundle of four of your unsigned chars and treat them as one float.
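
A rough sketch of what I mean (just an illustration):

// The array is filled with 8-bit data:
unsigned char pixels[4] = { 10, 20, 30, 40 };

// ...but this descriptor declares each texel to be one 32-bit float:
cudaChannelFormatDesc asFloat = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

// With that descriptor, the four bytes above would be fetched as the raw bit
// pattern of a single float, not as four separate pixel values.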

It should definitely be possible when cudaChannelFormatKindFloat is used.

This is indeed the problem: you incorrectly specify your input data as 32 bits, while it should be 8 bits (chars).

The following should work:

cudaCreateChannelDesc(8, 0, 0, 0, cudaChannelFormatKindFloat);

I agree with that explanation of the channel mismatch.

But about the suggested 8-bit descriptor: the problem is that initially I had done it that way, and the result was the same (black image).

I suppose it should work, but it doesn't. After that, I tried "cudaCreateChannelDesc(32, …)" because I found that piece of code in another NVIDIA forum post, but I got the same result.

What I’m trying to do must be possible, as it’s done in one of the SDK examples (bicubicTexture or something similar).

What I can’t understand is why what I’ve written does not work. That’s why I suspect an issue with the Quadro FX 3700…

Shouldn’t you be setting tex.normalized to true?

That’s only needed if I use normalized coordinates, but I think it’s also possible to work with unnormalized coordinates. At least that’s the way it’s done in the SDK example.
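
For reference, a sketch of the two addressing conventions as I understand them from the programming guide (tex_bw is just an illustrative name):

texture<unsigned char, 2, cudaReadModeNormalizedFloat> tex_bw;

// tex_bw.normalized = false: coordinates are given in texels, with texel centers at x + 0.5.
__global__ void fetch_unnormalized(float *out, int px, int py)
{
    out[0] = tex2D(tex_bw, px + 0.5f, py + 0.5f);
}

// tex_bw.normalized = true: coordinates are given in [0,1) relative to the texture size.
__global__ void fetch_normalized(float *out, float u, float v)
{
    out[0] = tex2D(tex_bw, u, v);
}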

This should work fine on a Quadro FX 3700; the texture hardware is all the same.

The only thing I can see that's wrong in your code is this line:

tex.channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

The channel descriptor describes the type and number of bits per channel stored in the texture (8-bit unsigned char in your case, not 32-bit float). There is actually no need to set the channelDesc on the texture reference at all; it is automatically set based on the type when you declare the texture (see "include/cuda_texture_types.h" in the CUDA toolkit).

As you point out, the bicubicTexture SDK sample uses a single-channel 8-bit image. Presumably this runs okay on your machine?

To answer your other question: the programming guide is a bit ambiguous here. Linear filtering works with floating-point textures OR with any other type when the texture is set to return floating-point data.
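
To make this concrete, here is a minimal, self-contained sketch of the setup I would expect to work, based on the points above (error checking omitted, all names are just illustrative):

#include <cuda_runtime.h>
#include <cstdio>

// 8-bit unsigned texture, promoted to a normalized float in [0,1] on each fetch.
texture<unsigned char, 2, cudaReadModeNormalizedFloat> tex8;

__global__ void copy_filtered(float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        // Linear filtering is allowed here because the fetch returns float.
        out[y * width + x] = tex2D(tex8, x + 0.5f, y + 0.5f);
}

int main()
{
    const int w = 64, h = 64;
    unsigned char img[w * h];
    for (int i = 0; i < w * h; ++i) img[i] = (unsigned char)(i & 0xFF);

    // Channel descriptor matching the element type: one 8-bit unsigned channel.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();

    cudaArray *arr;
    cudaMallocArray(&arr, &desc, w, h);
    cudaMemcpyToArray(arr, 0, 0, img, w * h * sizeof(unsigned char), cudaMemcpyHostToDevice);

    tex8.addressMode[0] = cudaAddressModeClamp;
    tex8.addressMode[1] = cudaAddressModeClamp;
    tex8.filterMode     = cudaFilterModeLinear;
    tex8.normalized     = false;
    // Note: no need to touch tex8.channelDesc, it is set from the template type.
    cudaBindTextureToArray(tex8, arr, desc);

    float *d_out;
    cudaMalloc(&d_out, w * h * sizeof(float));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    copy_filtered<<<grid, block>>>(d_out, w, h);

    float result[w * h];
    cudaMemcpy(result, d_out, w * h * sizeof(float), cudaMemcpyDeviceToHost);
    // Values come back in [0,1]; multiply by 255 to map back to the 8-bit range.
    printf("first pixel = %f\n", result[0]);

    cudaUnbindTexture(tex8);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return 0;
}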