Hi,
I’m trying to port code written in HLSL for DirectX 9 and DirectX 10 to CUDA, since it needs to run on multiple servers with Tesla cards. I’m only targeting compute capability 2.0+.
I’d like the texture fetch in the kernel to return a float4 normalized between 0 and 1. If I understand correctly, that means I need to declare the texture with cudaReadModeNormalizedFloat. My problem is that the same code sometimes has to handle textures in A8R8G8B8, A2R10G10B10, or A16R16G16B16 format. So if I declare my texture as
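To make the 32-bit case concrete, here is roughly what I have working today (a sketch; the identifiers and dimensions are just examples):

```cuda
#include <cuda_runtime.h>

// File-scope texture reference: with cudaReadModeNormalizedFloat,
// tex2D() returns float4 components already scaled to [0, 1]
// (so a uchar value of 255 reads back as 1.0f).
texture<uchar4, 2, cudaReadModeNormalizedFloat> rgbaTex;

__global__ void readNormalized(float4* out, int w)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    // +0.5f to sample texel centers with unnormalized coordinates.
    out[y * w + x] = tex2D(rgbaTex, x + 0.5f, y + 0.5f);
}

void run32bits(const unsigned char* ptexData, int w, int h)
{
    // Stage the A8R8G8B8 data in a CUDA array and bind the texture to it.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar4>();
    cudaArray* arr;
    cudaMallocArray(&arr, &desc, w, h);
    cudaMemcpyToArray(arr, 0, 0, ptexData, w * h * 4, cudaMemcpyHostToDevice);
    cudaBindTextureToArray(rgbaTex, arr, desc);

    dim3 dimBlock(16, 16);
    dim3 dimGrid(w / 16, h / 16);
    float4* dOut;
    cudaMalloc(&dOut, w * h * sizeof(float4));
    readNormalized<<<dimGrid, dimBlock>>>(dOut, w);

    cudaFree(dOut);
    cudaUnbindTexture(rgbaTex);
    cudaFreeArray(arr);
}
```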
texture<uchar4, 2, cudaReadModeNormalizedFloat> rgbaTex;
then A16R16G16B16 won’t work. I could declare a second reference with ushort4, but then I’d end up with duplicate kernels: one for 32-bit textures and another for 64-bit textures.
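Concretely, the duplication I want to avoid looks like this (a sketch; the names are mine):

```cuda
// Two file-scope texture references, one per element type.
texture<uchar4, 2, cudaReadModeNormalizedFloat> rgbaTex32;   // A8R8G8B8
texture<ushort4, 2, cudaReadModeNormalizedFloat> rgbaTex64;  // A16R16G16B16

// The kernel bodies are identical except for which texture
// reference they read, because the reference must be named
// statically at the tex2D() call site.
__global__ void myKernel32(float4* out, int w)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    out[y * w + x] = tex2D(rgbaTex32, x + 0.5f, y + 0.5f);
}

__global__ void myKernel64(float4* out, int w)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    out[y * w + x] = tex2D(rgbaTex64, x + 0.5f, y + 0.5f);
}
```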
Is there any way I can have something like this?
texture diffuse;

void Run32bits(const unsigned char* ptexData, ...)
{
    texture<uchar4, 2, cudaReadModeNormalizedFloat> rgbaTex;
    // Load texture data into rgbaTex
    // Point the global diffuse texture at rgbaTex
    // Launch the kernel
    myKernel<<< dimGrid, dimBlock >>>();
}

void Run64bits(const unsigned short* ptexData, ...)
{
    texture<ushort4, 2, cudaReadModeNormalizedFloat> rgbaTex;
    // Load texture data into rgbaTex
    // Point the global diffuse texture at rgbaTex
    // Launch the kernel
    myKernel<<< dimGrid, dimBlock >>>();
}

__global__ void myKernel()
{
    float4 pixel = tex2D(diffuse, x, y);
}
Thank you