jafioti
September 29, 2023, 7:17pm
1
I have two CUDA kernels:
Kernel A:

```
__half sum = __float2half(0.0);
for (int c_ = 0; c_ < 2100; c_++) {
sum = sum + __float2half(1.0);
}
out = sum;
```

Kernel B:

```
float sum = 0.0;
for (int c_ = 0; c_ < 2100; c_++) {
sum = sum + 1.0;
}
out = __float2half(sum);
```

The first kernel results in an output of 2048, which is incorrect. The second kernel results in 2100, which is correct. Do you know why that might be?

My card is an RTX 2080 Super, running CUDA 12.2 with driver version 535.104.05.

2048 as a 16-bit float has the bit representation `0 11010 0000000000`. At that magnitude the mantissa step is 2, so the next representable value is `0 11010 0000000001`, which is 2050. The exact result 2048 + 1 = 2049 lies exactly halfway between the two, and round-to-nearest-even sends it back down to 2048, so the sum stops growing there.

jafioti
September 29, 2023, 8:01pm
3
I see. What would be the recommended way of summing past 2048, then?

Use a wider datatype if you need a contiguous range of integers greater than 2048: accumulate in float (as Kernel B does) or in an integer type, and convert to half only at the end.


system
Closed
October 13, 2023, 8:03pm
5
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.