The difference between **half2** and **half** is not clear to me.

Does using **half2** instead of **half** help improve the speed of computation? Is that true in all cases or only for specific kinds of operations?

Can I replace **half** with **half2** without any other modification and get better performance?

`half` is a single 16-bit floating-point quantity/type.

`half2` is a *vector type*, consisting of two 16-bit floating-point quantities packed into a single 32-bit type.
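A minimal CUDA sketch of that packing, assuming CUDA's `cuda_fp16.h` header (where the types are spelled `__half` and `__half2`; the header also exposes them as `half` and `half2`) and a GPU to run the kernel on. The `static_assert`s check the 16-bit and 32-bit sizes, and the kernel builds a `__half2` from two `__half` values with `__halves2half2`, then takes it apart again with `__low2half` and `__high2half`. The names `pack_unpack` and `pack_unpack_on_device` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// One half is a 16-bit float; one half2 packs two of them into 32 bits.
static_assert(sizeof(__half) == 2, "half should be 16 bits");
static_assert(sizeof(__half2) == 4, "half2 should be 32 bits");

// Hypothetical kernel: packs two values into a half2, then unpacks them.
__global__ void pack_unpack(float lo_in, float hi_in, float *out)
{
    __half lo = __float2half(lo_in);
    __half hi = __float2half(hi_in);
    __half2 v = __halves2half2(lo, hi);     // lo -> low half, hi -> high half
    out[0] = __half2float(__low2half(v));   // first packed value
    out[1] = __half2float(__high2half(v));  // second packed value
}

// Hypothetical host helper: runs the kernel once and copies both results back.
void pack_unpack_on_device(float lo, float hi, float result[2])
{
    float *d_out = nullptr;
    cudaMalloc(&d_out, 2 * sizeof(float));
    pack_unpack<<<1, 1>>>(lo, hi, d_out);
    cudaMemcpy(result, d_out, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

Usage: `float r[2]; pack_unpack_on_device(1.5f, -2.25f, r);` leaves `r` holding 1.5 and -2.25, since both values are exactly representable in 16 bits.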

They are not interchangeable. You cannot expect to simply replace `half` by `half2` with no other modifications. In most cases, if you did that, your code would not compile properly.

The hardware has a number of resources that provide 16-bit floating-point processing (e.g. add, subtract, multiply) by operating on the `half2` type. This generally results in the fastest/most efficient use of the machine.
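As an illustration (a sketch, not a tuned implementation), the hypothetical kernel below multiplies pairs of values with the `__hmul2` intrinsic from `cuda_fp16.h`, so a single call produces two 16-bit products. It assumes a device of compute capability 5.3 or higher, compiled with a matching `-arch` (for example `-arch=sm_60`). Inputs stay in `float` on the host and are packed on the device with `__floats2half2_rn`; `mul_pairs` and `mul_pairs_on_device` are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread handles one pair of values (2*i, 2*i + 1).
// The pair is packed into a half2, multiplied with one __hmul2, and unpacked.
__global__ void mul_pairs(const float *a, const float *b, float *out, int n_pairs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pairs) return;
    __half2 ha = __floats2half2_rn(a[2 * i], a[2 * i + 1]);
    __half2 hb = __floats2half2_rn(b[2 * i], b[2 * i + 1]);
    __half2 prod = __hmul2(ha, hb);          // both 16-bit products at once
    out[2 * i]     = __low2float(prod);
    out[2 * i + 1] = __high2float(prod);
}

// Hypothetical host helper: n must be even and positive; fills out[0..n).
void mul_pairs_on_device(const float *a, const float *b, float *out, int n)
{
    int n_pairs = n / 2;
    size_t bytes = n * sizeof(float);
    float *d_a, *d_b, *d_out;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    int threads = 128;
    int blocks = (n_pairs + threads - 1) / threads;
    mul_pairs<<<blocks, threads>>>(d_a, d_b, d_out, n_pairs);
    cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
}
```

Each thread issues one `__hmul2` for two results, which is the kind of `half2` operation the paragraph above describes.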

From a machine storage standpoint, there is no distinction (in terms of how the data appears in memory) between a properly aligned array of `half` and an equivalent, half-as-long array of `half2`.
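A sketch of what that equivalence allows, assuming CUDA's `cuda_fp16.h` and a device of compute capability 5.3 or higher: the host fills an ordinary array of `__half` using `__float2half` (which is callable from host code), copies it to the device, and the kernel simply reinterprets the pointer as `__half2*`, handling two elements per thread with `__hadd2`. The reinterpretation relies on the buffer being at least 4-byte aligned, which `cudaMalloc` allocations are. `add_halves` and `add_on_device` are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: x and y each hold 2*n_pairs halves. Viewed as half2,
// each thread adds one pair with a single __hadd2.
__global__ void add_halves(const __half *x, const __half *y, __half *out, int n_pairs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_pairs) return;
    // Same bytes, different view: element 2*i is the low half (x member),
    // element 2*i + 1 is the high half (y member).
    const __half2 *x2 = reinterpret_cast<const __half2 *>(x);
    const __half2 *y2 = reinterpret_cast<const __half2 *>(y);
    __half2 *out2 = reinterpret_cast<__half2 *>(out);
    out2[i] = __hadd2(x2[i], y2[i]);
}

// Hypothetical host helper: x and y must have the same even, nonzero length.
std::vector<float> add_on_device(const std::vector<float> &x, const std::vector<float> &y)
{
    int n = static_cast<int>(x.size());
    std::vector<__half> hx(n), hy(n), hout(n);
    for (int k = 0; k < n; ++k) {            // host-side conversion to 16-bit
        hx[k] = __float2half(x[k]);
        hy[k] = __float2half(y[k]);
    }
    size_t bytes = n * sizeof(__half);
    __half *d_x, *d_y, *d_out;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_x, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, hy.data(), bytes, cudaMemcpyHostToDevice);
    int n_pairs = n / 2;
    int threads = 128;
    add_halves<<<(n_pairs + threads - 1) / threads, threads>>>(d_x, d_y, d_out, n_pairs);
    cudaMemcpy(hout.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_out);
    std::vector<float> result(n);
    for (int k = 0; k < n; ++k) result[k] = __half2float(hout[k]);
    return result;
}
```

Nothing on the host side knows about `__half2`: it copies a plain `__half` array, and only the kernel chooses to view those bytes two at a time.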

Thanks, Robert. So, for example, for this function:

`__device__ __half2 __h2div ( const __half2 a, const __half2 b )`

Does **a** by itself refer to two half variables? Or, put another way, are we working with a vector of length 2 that can be split into two parts? How does the driver on the CPU side load the half2 data?

Can I say that, to divide **a** by a scalar in half2 precision, I only have to work with the odd indices of **a[i]**?

You may need to get familiar with vector types.

A “half2 variable” is going to look like this:

```
|   1.326   |   1.544   |
|< 16 bits >|< 16 bits >|   two half quantities take 16 bits each
|<       32 bits       >|   a single half2 quantity takes 32 bits
```

`a` would refer to data stored in memory just as I have depicted. It would contain two quantities, such as 1.326 and 1.544. So I would say yes, `a` refers to two half variables “by itself”.

As for how the driver on the CPU side loads the half2 data: I have no idea what that means. I'm not sure what driver on the CPU you are referring to.

If you want to divide `a` by a scalar, that could mean dividing one or the other of the two quantities, or it could mean dividing both quantities. It depends on how you define the function.
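Both readings can be sketched in CUDA, assuming `cuda_fp16.h` and a device of compute capability 5.3 or higher. To divide both quantities, broadcast the scalar into both halves with `__half2half2` and use `__h2div`. To divide only one of them (here the low half, which holds the even-indexed element when an array of `__half` is viewed as `__half2`), split the pair, use the scalar `__hdiv`, and repack with `__halves2half2`. The helpers `div_both`, `div_low_only`, `div_demo`, and `div_demo_on_device` are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Divide both elements of a by s: broadcast s into a half2, then one __h2div.
__device__ __half2 div_both(__half2 a, __half s)
{
    return __h2div(a, __half2half2(s));
}

// Divide only the low element of a by s; the high element passes through.
__device__ __half2 div_low_only(__half2 a, __half s)
{
    __half lo = __hdiv(__low2half(a), s);
    return __halves2half2(lo, __high2half(a));
}

// Hypothetical kernel: applies both versions to the pair (a0, a1) and scalar s.
// out[0..1] receive the div_both result, out[2..3] the div_low_only result.
__global__ void div_demo(float a0, float a1, float s, float *out)
{
    __half2 a = __floats2half2_rn(a0, a1);
    __half hs = __float2half(s);
    __half2 both = div_both(a, hs);
    __half2 low = div_low_only(a, hs);
    out[0] = __low2float(both);
    out[1] = __high2float(both);
    out[2] = __low2float(low);
    out[3] = __high2float(low);
}

// Hypothetical host helper: runs div_demo once and copies the four results back.
void div_demo_on_device(float a0, float a1, float s, float result[4])
{
    float *d_out = nullptr;
    cudaMalloc(&d_out, 4 * sizeof(float));
    div_demo<<<1, 1>>>(a0, a1, s, d_out);
    cudaMemcpy(result, d_out, 4 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

For example, `div_demo_on_device(3.0f, 5.0f, 2.0f, r)` gives 1.5 and 2.5 for the "divide both" version and 1.5 and 5.0 for the "low only" version. Neither version is "the" definition; which one you want depends on how you define the function.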

The `__h2div` prototype you have shown takes two half2 quantities and returns a half2 quantity, so a sensible realization would be *elementwise* division:

```
| 1.326 | 1.544 | a
| 1.100 | 1.200 | b
| 1.205 | 1.287 | elementwise result of a/b
```
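A minimal sketch that reproduces the diagram with `__h2div`, assuming `cuda_fp16.h` and a device of compute capability 5.3 or higher (the names `h2div_demo` and `h2div_on_device` are hypothetical, and error checking is omitted). Because each input is first rounded to 16 bits, which keeps only about three significant decimal digits, expect results close to, but not necessarily exactly, 1.205 and 1.287.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical kernel: one __h2div divides the low halves and the high halves.
__global__ void h2div_demo(float a0, float a1, float b0, float b1, float *out)
{
    __half2 a = __floats2half2_rn(a0, a1);   // | a0 | a1 |
    __half2 b = __floats2half2_rn(b0, b1);   // | b0 | b1 |
    __half2 q = __h2div(a, b);               // | a0/b0 | a1/b1 |
    out[0] = __low2float(q);
    out[1] = __high2float(q);
}

// Hypothetical host helper: runs the kernel once and copies both quotients back.
void h2div_on_device(float a0, float a1, float b0, float b1, float result[2])
{
    float *d_out = nullptr;
    cudaMalloc(&d_out, 2 * sizeof(float));
    h2div_demo<<<1, 1>>>(a0, a1, b0, b1, d_out);
    cudaMemcpy(result, d_out, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

Usage: `float r[2]; h2div_on_device(1.326f, 1.544f, 1.100f, 1.200f, r);` fills `r` with the two elementwise quotients from the diagram, to half precision.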

This type of “half2 arithmetic” is what the device is optimized to do. On devices that support half2 arithmetic, you can add, subtract, or multiply two half2 quantities elementwise in the same amount of time it takes to do that operation on two half quantities.