Fastest way to convert float4 to uchar4? Texture conversion

Hello there,

What is the fastest way to convert an array of float4 values in the range 0-1 to an array of uchar4? For now, I'm using this code:

__global__ void convertTexFloatToUChar( uchar4* _dst, const float4* _src )
{
	// getTextureIndex() is a helper (not shown) that returns this thread's element index
	const unsigned int idx = getTextureIndex();

	// Scale [0,1] to [0,255]; 255.9999f keeps 1.0f at 255 after truncation
	_dst[idx].x = (unsigned char)(_src[idx].x * 255.9999f);
	_dst[idx].y = (unsigned char)(_src[idx].y * 255.9999f);
	_dst[idx].z = (unsigned char)(_src[idx].z * 255.9999f);
	_dst[idx].w = (unsigned char)(_src[idx].w * 255.9999f);
}


Is there a better way to do it? Maybe CUDA even provides a built-in function optimized for exactly this?



Try inspecting the PTX generated for this function to see how the compiler translates it. By the way, this conversion may not be your program's bottleneck.

Given that floating-point multiplication is as fast as any bit operation on these devices, I don't think you can do better than what you've written. Appendix C says that floating-point truncation can be done in 1 instruction, so I'll assume the cast is fast as well.
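The per-channel arithmetic can be checked on the host. The sketch below is plain C++ with hypothetical function names. It contrasts the kernel's scale-and-truncate with a common alternative, scaling by 255 and rounding to nearest. Because the float-to-integer cast truncates toward zero, multiplying by 255.9999f keeps 1.0f at 255, and each output code covers an input interval of roughly equal width (about 1/256). Rounding v * 255 instead gives codes 0 and 255 only half-width intervals. The sketch assumes inputs already lie in [0, 1], because converting an out-of-range float to unsigned char is undefined behaviour in C and C++.

```cpp
// Host-side illustration of the per-channel arithmetic (hypothetical names).
// Assumes inputs are already in [0, 1]; nothing here clamps them.
#include <cmath>

// The kernel's approach: scale by 255.9999f, then let the cast truncate
// toward zero. 1.0f scales to just under 256 and truncates to 255.
inline unsigned char truncScale(float v) {
    return static_cast<unsigned char>(v * 255.9999f);
}

// Common alternative: scale by 255 and round to nearest (halves away from zero).
inline unsigned char roundScale(float v) {
    return static_cast<unsigned char>(std::lround(v * 255.0f));
}
```

With these definitions, 0.5f maps to 127 under truncation and to 128 under rounding. Both functions map 1.0f to 255.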

The compiler needs to generate single, full-width memory reads and writes. If dest is in shared memory, the four separate byte writes could turn into 4 serialized accesses.
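Following the point about memory accesses, here is a minimal sketch. Each thread reads its float4 once into registers and writes the result with a single uchar4 store built by make_uchar4. Several details are assumptions, not part of the original:

- a flat 1D launch index replaces the poster's getTextureIndex();
- the element count n used for the bounds check is hypothetical;
- both pointers are marked __restrict__.

Without __restrict__, each byte store through _dst may alias _src, since unsigned char may alias any object. The compiler may then have to reload _src after every store. The channel helper is also compiled for the host (via an HD macro), so it can be tested on the CPU.

```cpp
// Sketch: one float4 load and one uchar4 store per thread.
// Assumptions (not from the original post): a flat 1D launch, a hypothetical
// element count n, and __restrict__ pointers that never alias.

#if defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Clamp one channel to [0, 1] (NaN fails the first comparison and becomes 0),
// then scale and truncate exactly like the original kernel.
HD inline unsigned char toByte(float v) {
    v = (v > 0.0f) ? v : 0.0f;
    v = (v < 1.0f) ? v : 1.0f;
    return static_cast<unsigned char>(v * 255.9999f);
}

#if defined(__CUDACC__)
__global__ void convertFloat4ToUChar4(uchar4* __restrict__ dst,
                                      const float4* __restrict__ src,
                                      unsigned int n)
{
    const unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;                // hypothetical bounds check

    const float4 c = src[idx];           // read all four channels once
    dst[idx] = make_uchar4(toByte(c.x), toByte(c.y),
                           toByte(c.z), toByte(c.w));  // one uchar4 store instead of four byte stores
}
#endif
```

You could launch it with something like `convertFloat4ToUChar4<<<(n + 255) / 256, 256>>>(d_dst, d_src, n);`, where `d_dst` and `d_src` are hypothetical device pointers. Running `nvcc -ptx` on the file writes the PTX. There you can count the global load and store instructions per element and compare them with the original kernel.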
By the way, why is the function __global__?
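One reading of the __global__ question is that the conversion need not be a separate kernel. If another kernel produces the float4 colours, it can convert them itself through a __device__ helper. That saves a launch and keeps the float4 data out of global memory. The sketch below works under that assumption. gradientValue() and shadeToUChar4() are hypothetical stand-ins for the real producer. __saturatef is the CUDA device intrinsic that clamps its argument to [0, 1].

```cpp
// Sketch: convert inside the producing kernel via a __device__ helper,
// instead of writing float4 to global memory and launching a second kernel.

#if defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for whatever actually computes the colour:
// a horizontal grey ramp from 0 at x = 0 to 1 at x = width - 1.
HD inline float gradientValue(unsigned int x, unsigned int width) {
    return (width > 1u)
        ? static_cast<float>(x) / static_cast<float>(width - 1u)
        : 0.0f;
}

#if defined(__CUDACC__)
// __device__ helper callable from any kernel; __saturatef clamps to [0, 1].
__device__ inline uchar4 float4ToUChar4(float4 c) {
    return make_uchar4(
        static_cast<unsigned char>(__saturatef(c.x) * 255.9999f),
        static_cast<unsigned char>(__saturatef(c.y) * 255.9999f),
        static_cast<unsigned char>(__saturatef(c.z) * 255.9999f),
        static_cast<unsigned char>(__saturatef(c.w) * 255.9999f));
}

// Hypothetical producer kernel: the float4 lives only in registers.
__global__ void shadeToUChar4(uchar4* out, unsigned int width, unsigned int n) {
    const unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n || width == 0u) return;
    const float g = gradientValue(idx % width, width);
    out[idx] = float4ToUChar4(make_float4(g, g, g, 1.0f));
}
#endif
```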