Register in kernel

If I write the following code:

__device__ void f() {
    half A[64];
    // ......
}

will the compiler automatically pack these 64 half values into 32 registers? If not, how can I do that?

I need these registers for an MMA calculation, so I need to declare the pointer with half* type.

I would just use half2 A[32].
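
To make the half2 suggestion concrete, here is a minimal sketch, assuming a recent CUDA toolkit with cuda_fp16.h. The names packed_sum_kernel and run_packed_sum are made up for illustration. It stores 64 half values as 32 half2 elements and reads them back. Note that a local array can only live in registers when every index is known at compile time, hence the unrolled loops.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>

// Each half2 holds two 16-bit values in one 32-bit word.
static_assert(sizeof(half2) == 4, "half2 is expected to be 32 bits wide");

// Hypothetical demo kernel: stores 64 half values as 32 half2 and sums them.
__global__ void packed_sum_kernel(float* out) {
    half2 A[32];  // 64 half values in 32 packed elements
    #pragma unroll
    for (int i = 0; i < 32; ++i) {
        // Value 2*i goes into the low half, value 2*i+1 into the high half.
        A[i] = __floats2half2_rn(float(2 * i), float(2 * i + 1));
    }
    float sum = 0.0f;
    #pragma unroll
    for (int i = 0; i < 32; ++i) {
        sum += __low2float(A[i]) + __high2float(A[i]);
    }
    *out = sum;
}

// Hypothetical host helper: launches one thread and returns its result.
float run_packed_sum() {
    float* d_out = nullptr;
    float h_out = -1.0f;
    cudaError_t err = cudaMalloc(&d_out, sizeof(float));
    assert(err == cudaSuccess);
    packed_sum_kernel<<<1, 1>>>(d_out);
    err = cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;
    cudaFree(d_out);
    return h_out;
}
```

If you still need to touch a single half, A[i].x and A[i].y (or __low2half / __high2half) give you the two elements of each pair.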

If you just feed the data into MMA (and compute it elsewhere), you could even use unsigned int.
It is not fully standard C++ (reading a union member other than the one last written is UB), but it seems to be mostly accepted in device code: you could also use a

union {
    half h[2];
    unsigned int ui;
    half2 h2;
};
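
A rough sketch of how such a union could be used to get the 32-bit register value out of a half2. The names HalfPack, pack_via_union, pack_via_memcpy, pack_kernel and run_pack_demo are hypothetical, and the constructor is my addition so that the union stays default-constructible regardless of how half and half2 declare their constructors in a given toolkit version. For comparison it also shows a memcpy-based variant that avoids the union. That one is a swapped-in alternative, not something from the post above.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cstring>

// The union from the post with a hypothetical name and an added constructor.
union HalfPack {
    half h[2];
    unsigned int ui;
    half2 h2;
    __host__ __device__ HalfPack() : ui(0u) {}
};

// Type punning through the union: write h2, read ui (formally UB in C++).
__device__ unsigned int pack_via_union(half2 v) {
    HalfPack p;
    p.h2 = v;
    return p.ui;
}

// Alternative without the union: copy the 4 bytes into an unsigned int.
__device__ unsigned int pack_via_memcpy(half2 v) {
    unsigned int bits;
    memcpy(&bits, &v, sizeof(bits));
    return bits;
}

__global__ void pack_kernel(unsigned int* out) {
    half2 v = __floats2half2_rn(1.0f, 2.0f);  // low half = 1.0, high half = 2.0
    out[0] = pack_via_union(v);
    out[1] = pack_via_memcpy(v);
}

// Hypothetical host helper: runs the kernel and checks both variants agree.
unsigned int run_pack_demo() {
    unsigned int* d_out = nullptr;
    unsigned int h_out[2] = {0u, 1u};
    cudaError_t err = cudaMalloc(&d_out, sizeof(h_out));
    assert(err == cudaSuccess);
    pack_kernel<<<1, 1>>>(d_out);
    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;
    cudaFree(d_out);
    assert(h_out[0] == h_out[1]);
    return h_out[0];
}
```

With 1.0 in the low half (0x3C00) and 2.0 in the high half (0x4000), the packed word comes out as 0x40003C00. An unsigned int like this is the kind of 32-bit operand you would then pass on to the MMA code.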