In my CUDA sum function I perform a computation in double precision, like
res = A * B - C.
The final result is stored in the array declared as
double* res.
Now I need to convert the data in res to unsigned short. How can I do that, and is it possible to do it inside a CUDA kernel function? Thanks!
An unsigned short has a limited range (0-65535).
Do you need to quantize the double into said range, or is it enough to round the double value to the nearest integer value between 0-65535?
For rounding a positive double in the range 0-65535 to the nearest integer this works:
short_value = (unsigned short)(double_value+0.5);
You could also clamp the double into the allowed range first to prevent an integer over/underflow.
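To make the clamp-then-round idea concrete, here is a minimal sketch. The names double_to_ushort_sat, convert_kernel and HOST_DEVICE are made up for illustration, and mapping NaN to 0 is an arbitrary assumption on my part, not something the thread or the standard prescribes. The helper is plain C++ that also compiles as device code under nvcc; the kernel is only compiled when nvcc is used.

```cpp
// HOST_DEVICE is a hypothetical convenience macro so the helper
// compiles both with a plain C++ compiler and with nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Round to the nearest integer (halves round up) and saturate to [0, 65535].
// NaN is mapped to 0 here; that choice is an assumption for this sketch.
HOST_DEVICE inline unsigned short double_to_ushort_sat(double x)
{
    if (!(x > 0.0)) {
        return 0;          // zero, negative values and NaN
    }
    if (x >= 65535.0) {
        return 65535;      // clamp at the top of the range
    }
    // Here 0 < x < 65535, so x + 0.5 truncates to a value in [0, 65535].
    return static_cast<unsigned short>(x + 0.5);
}

#ifdef __CUDACC__
// One thread per element: reads the double results and writes the
// converted values to a separate unsigned short array.
__global__ void convert_kernel(const double* res, unsigned short* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = double_to_ushort_sat(res[i]);
    }
}
#endif
```

The conversion can also be done directly at the end of the kernel that computes res, by calling the helper on the double result before storing it, which avoids a second pass over the data.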
Make sure that whatever you are converting is within the representable range of an unsigned short, as converting a value outside this range is undefined behavior. I mention this because I recall other people being tripped up by it and then wondering why they got different results on different platforms. This is nothing CUDA-specific; it comes straight from the language standard (the text quoted below is from the ISO C standard, and C++ has an equivalent rule), but I realize that not all CUDA programmers have extensive prior experience with C++.
6.3.1.4 Real floating and integer
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.⁶¹⁾
[…]
⁶¹⁾ The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX + 1).
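As a small illustration of the footnote, the portable range for unsigned short is the open interval (−1, USHRT_MAX + 1). The check below is only a sketch; fits_unsigned_short is a hypothetical name, not a standard or CUDA function.

```cpp
#include <climits>

// True when truncating x toward zero yields a value representable as
// unsigned short, i.e. when -1 < x < USHRT_MAX + 1.
// NaN fails both comparisons, so it is reported as not fitting.
inline bool fits_unsigned_short(double x)
{
    return x > -1.0 && x < static_cast<double>(USHRT_MAX) + 1.0;
}
```

For example, -0.5 passes the check because it truncates to 0, while -1.0 and 65536.0 do not, assuming the usual 16-bit unsigned short.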
Thank you. I only used double to extend the range of the decimal representation, and I ended up rounding the result the way my data requires. Thanks again.
Thank you.
I currently convert double to unsigned short as follows; I will try your suggestion:
short_value = (unsigned short)(double_value);