Texture reference parameter in kernel function

Hello I have little problem when trying to pass texture references as kernel function parameters. I don’t want to write code for ushort4 textures and for uchar4 textures. I would like to use templates instead but this brings me only troubles ;(

I tried several combinations of parameter types in kernels and in other device functions.

Here are examples of function headers with errors i got from compiler:

template<class T, int dim, enum cudaTextureReadMode readMode>

global void d_DiffRay(float *d_output, texture<T, dim, readMode> &tex, uint imageW, uint imageH, uint x, uint y)

template<class T, int dim, enum cudaTextureReadMode readMode>

inline device float4 GetTex3D(texture<T, dim, readMode> tex, float3 pos)

Performing Custom Build Step

volumeRender.cu

tmpxft_000003b0_00000000-3_volumeRender.cudafe1.gpu

tmpxft_000003b0_00000000-8_volumeRender.cudafe2.gpu

Signal: caught in Code_Expansion phase.

<input>(0): Error: Signal caught in phase Code_Expansion -- processing aborted

nvopencc ERROR: C:\programs\Develop\cuda\bin/../open64/lib//be.exe returned non-zero status 3

template<class T, int dim, enum cudaTextureReadMode readMode>

global void d_DiffRay(float *d_output, texture<T, dim, readMode> tex, uint imageW, uint imageH, uint x, uint y)

template<class T, int dim, enum cudaTextureReadMode readMode>

inline device float4 GetTex3D(texture<T, dim, readMode> tex, float3 pos)

volumeRender.cu

tmpxft_000007a8_00000000-3_volumeRender.cudafe1.gpu

tmpxft_000007a8_00000000-8_volumeRender.cudafe2.gpu

Internal error

nvcc error   : 'ptxas' died with status 0xC0000005 (ACCESS_VIOLATION)

template<class T, int dim, enum cudaTextureReadMode readMode>

global void d_DiffRay(float *d_output, texture<T, dim, readMode> &tex, uint imageW, uint imageH, uint x, uint y)

template<class T, int dim, enum cudaTextureReadMode readMode>

inline device float4 GetTex3D(texture<T, dim, readMode> &tex, float3 pos)

volumeRender.cu

tmpxft_00000e80_00000000-3_volumeRender.cudafe1.gpu

tmpxft_00000e80_00000000-8_volumeRender.cudafe2.gpu

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000e80_00000000-4_volumeRender.ptx, line 392; error   : Unknown symbol '__T21234'

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000e80_00000000-4_volumeRender.ptx, line 442; error   : Unknown symbol '__T21234'

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000e80_00000000-4_volumeRender.ptx, line 845; error   : Unknown symbol '__T21326'

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000e80_00000000-4_volumeRender.ptx, line 895; error   : Unknown symbol '__T21326'

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000e80_00000000-4_volumeRender.ptx, line 1652; error   : Unknown symbol '__T21401'

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000e80_00000000-4_volumeRender.ptx, line 1768; error   : Unknown symbol '__T21401'

....

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000b04_00000000-4_volumeRender.ptx, line 9138; error   : Unknown symbol '__T21834'

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000b04_00000000-4_volumeRender.ptx, line 9254; error   : Unknown symbol '__T21834'

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000b04_00000000-4_volumeRender.ptx, line 10235; error   : Unknown symbol '__T21883'

ptxas C:\DOCUME~1\Loki\LOCALS~1\Temp/tmpxft_00000b04_00000000-4_volumeRender.ptx, line 10351; error   : Unknown symbol '__T21883'

ptxas fatal   : Ptx assembly aborted due to errors

Can enybody tell me where is the problem or what am i doing wrong?

Thanks in advance :D

It looks like templates are broken in the latest beta SDK.
It likely a linker bug with the function name mangling, but that’s a pure guess.

http://forums.nvidia.com/index.php?showtopic=69649
http://forums.nvidia.com/index.php?showtopic=70559

Thanks Worley but i dont have lates beta 2 toolkit. I am using old 2.0 beta version …

hmmm

btw anybody have any suggestions how to pass different texture reference to kernel function other way than by using templates?

i mean in host code i have to call kernel which should use specifix texture but as i want to have only one kernel function i dont know how to do this without templates…

Textures are implicit static file scope variables. You cannot pass them as parameters to a kernel. The compiler unfortunately doesn’t produce any useful warning when you use a texture as an argument.

thanks for info ;)

but is there other way to do “dynamic” kernel which can compute with different textures???

Just a guess, but you can probably just bind the texture reference to a new array in your host code.

Yes, this is what I do. Load all the arrays at startup to the device, then call the bind function to select the one you need just before calling your kernel.

Thanks guys for wolderfull idea but still in my particular case it doesn’t help :(

the problem is that the texture reference can not be intialized dynamically but i need ushort4 texture one time or uchar4 second time so to do this, threre must exist two different texture references… and so here we are at the beginning :(

One idea is to have only ushort4 textures but thre will 2x more memory allocated which is not good so let’s find another solution ;)

You cannot make 2 different kernels for each texture?

Hehe - this is what i try to avoid and make olny one kernel so my code will not duplicited … but until we can solve the problem i must stuck with this solution :(

You cannot solve that by making a macro?

#define macro_kernel_body(datatype, texturename) \

.... \

#datatype var = tex1Dfetch(#texturename, index); \

...

And then

__global__ void float_kernel()

{

macro_kernel_body(float, float_tex);

}

__global__ void int_kernel()

{

macro_kernel_body(int, int_tex);

}

I don’t have my code here at home, so am not 100% sure about the macro, but it should be not far off.

Thanks for try but this is not what i try to achieve ;)

I would like to have olny one kernel and use it with two different textures references :D

btw i have texture references bounded to cudaArrays so your tex1Dfetch will not work :whistling:

but as i wrote before thanks for your afford to help :D

Hmm… This can almost be solved using templates with type parameters (which nvcc supports), but the texture reference living at file scope makes that a bit complicated. I’ve made polymorphic kernels that operated on different data types with templates, but they only had function scope variables.

Another option might be to use an integer template parameter with an if-statement to pick the correct texture reference inside the code. The dead code optimizer will eliminate the branch not taken since the condition is known at compile time.

Yes, that is what i wanted to do - use templates with texture reference as template parameter but with no success :D

Nice (!) Hmmmm :thumbup: I should try to do this way but will it really remove dead banches? :D

Yeah, it will. You can see it at work in e.g. the reduction example from the SDK. There you also have if statements that depend on the template parameter.

The dead code optimizer generally works very well. So well, in fact, that many post to the forums with questions about how commenting out the only global memory write in their kernel makes a huge difference in performance (the dead code optimizer removed everything that was needed to calculate the final value to be written).

The only cases where I have seen the dead code optimizer not remove dead code is when that code writes to shared memory. The optimizer is obviously assuming that the value written in one thread might possible be read in another thread.

Well, by using templates you actually have two different kernels, but of course one code only. If there’s no way to use textures as template parameters, you can at least use template parameter classes reading different textures, like in this example:

texture<uchar4,3,cudaReadModeNormalizedFloat>  tex_a;

texture<ushort4,3,cudaReadModeNormalizedFloat> tex_b;

#define DECL __device__

struct tex_a_reader {

    DECL float4 operator()(float3 p) const { return tex3D(tex_a, p.x, p.y, p.z); }

};

struct tex_b_reader {

    DECL float4 operator()(float3 p) const { return tex3D(tex_b, p.x, p.y, p.z); }

};

template <typename TextureReader>

__global__ void texture_test(float * result) {

    TextureReader tex_3d;

    float3 pos = make_float3(threadIdx.x, threadIdx.y, blockIdx.x);

    float4 value = tex_3d(pos);

    result[threadIdx.x] = value.x + value.y + value.z + value.w;

}

void test(bool use_uchar, float * deviceptr) {

    if (use_uchar)

        texture_test<tex_a_reader><<<g,b>>>(deviceptr);

    else

        texture_test<tex_b_reader><<<g,b>>>(deviceptr);

}

first, if that is what you want, templates do not help any way. Templates for this case behave just as macros: You have the source code only once, but there will be multiple functions generated in the binary. Admittedly templates are a bit less messy to use though.

Secondly, since you use different types you can not use the same code to process both.

You can use only one kernel though, using the macros someone else suggested above:

void __global__ kernel(int float_variant) {

  if (float_variant) {

    macro_kernel_body(float, float_tex);

  } else {

    macro_kernel_body(int, int_tex);

  }

}

It will double the size of the kernel, though that should not matter too much (though you can probably cut down the code duplication to the texture reads if you change the textures to convert the ints to floats, if that works for your case).

:thumbsup: Good work :excl: :D

I am aware that there will be two kernels but that doesn’t matter because one source code is the main priority ;)

Thanks again to Everybody for all hints and advices you’ve provided :wink: