If I write the following code:

```cuda
#include <cuda_fp16.h>

__device__ void f() {
    half A[64];
    // ......
}
```

will the compiler automatically pack these 64 `half` values into 32 registers? If not, how can I do that?
I need these registers for an mma computation, so I need to declare the pointer with type `half*`.
I would just use `half2 A[32]`.
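Below is a minimal host-side C++ sketch of the layout that `half2 A[32]` gives you. It assumes `uint16_t` stands in for the raw bits of one `half` and `uint32_t` for one `half2` (one 32-bit register). Logical element `i` of the original 64-element array lives in word `i / 2`: in the low 16 bits for even `i` (the `.x` component) and in the high 16 bits for odd `i` (`.y`). The helper names `get_half_bits` and `set_half_bits` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Host-side illustration: uint16_t models the bits of one CUDA `half`,
// uint32_t models one `half2` (a single 32-bit register).
constexpr int kHalves = 64;          // logical size of `half A[64]`
constexpr int kWords = kHalves / 2;  // physical size of `half2 A[32]`

// Read logical element i: even i -> low 16 bits (.x), odd i -> high 16 bits (.y).
uint16_t get_half_bits(const uint32_t (&words)[kWords], int i) {
    assert(i >= 0 && i < kHalves);
    uint32_t w = words[i / 2];
    return (i % 2 == 0) ? static_cast<uint16_t>(w & 0xFFFFu)
                        : static_cast<uint16_t>(w >> 16);
}

// Write logical element i without disturbing its neighbour in the same word.
void set_half_bits(uint32_t (&words)[kWords], int i, uint16_t bits) {
    assert(i >= 0 && i < kHalves);
    uint32_t &w = words[i / 2];
    if (i % 2 == 0) {
        w = (w & 0xFFFF0000u) | bits;
    } else {
        w = (w & 0x0000FFFFu) | (static_cast<uint32_t>(bits) << 16);
    }
}
```

In real device code you would not mask by hand. You would write `A[i / 2].x` / `A[i / 2].y` or use the `__low2half` / `__high2half` intrinsics from `cuda_fp16.h`. The sketch only makes the packing explicit.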
If you just feed the data into the MMA (and calculate it elsewhere), you could even use `unsigned int`.
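When the values only pass through to the MMA, the 32-bit words can be filled directly from memory that already holds `half` data, two halves per word, with no per-element packing. Here is a host-side sketch that again models `half` bits as `uint16_t`; `load_packed` is a hypothetical name. It uses `std::memcpy`, which copies bytes without violating aliasing rules. On a little-endian machine (x86 hosts and NVIDIA GPUs are both little-endian), each resulting word holds the lower-addressed half in its low 16 bits, the same layout a `half2` has.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy 2 * n_words consecutive half bit patterns into n_words 32-bit words.
// On little-endian targets, dst[k] holds src[2k] in its low 16 bits and
// src[2k + 1] in its high 16 bits.
void load_packed(const uint16_t *src, uint32_t *dst, std::size_t n_words) {
    assert(n_words == 0 || (src != nullptr && dst != nullptr));
    std::memcpy(dst, src, n_words * sizeof(uint32_t));
}
```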
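It is not fully standard C++ (reading a union member other than the one most recently written is undefined behavior), but it seems to be mostly accepted for device code. You could also use a union:

```cuda
#include <cuda_fp16.h>

union HalfPack {       // type name chosen for illustration
    half h[2];         // two separate halves
    unsigned int ui;   // the same 32 bits, e.g. as an MMA operand register
    half2 h2;          // the same 32 bits as a half2
};
```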
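Reading `ui` after writing `h` is undefined behavior in standard C++ (C allows it). A well-defined way to do the same reinterpretation is `std::memcpy`, or `std::bit_cast` in C++20. Optimizing compilers typically reduce a small fixed-size `memcpy` like this to a plain register move. Below is a host-side sketch, again with `uint16_t` standing in for `half`; `halves_to_u32` and `u32_to_halves` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Same storage as the union above (two halves / one unsigned int),
// but reinterpreted via memcpy instead of reading an inactive union member.
static_assert(sizeof(uint16_t[2]) == sizeof(uint32_t),
              "two halves must fill exactly one 32-bit word");

uint32_t halves_to_u32(const uint16_t (&h)[2]) {
    uint32_t ui;
    std::memcpy(&ui, h, sizeof ui);
    return ui;
}

void u32_to_halves(uint32_t ui, uint16_t (&h)[2]) {
    std::memcpy(h, &ui, sizeof ui);
}
```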