In MMA coding, we usually define the matrix A like this:
uint MultiA[2] = { 0 };
half* test1 = reinterpret_cast<half*>(MultiA);
The test1 pointer then refers to 4 consecutive half values.
If I want to read 4 consecutive values directly from shared memory (smem) and store them into uint MultiA, how can I perform this conversion correctly?
What is the exact problem you are facing? Do you have conversion code which does not work? Can you show it?
I am loading data from smem, and I have it working now. This is what I wanted to do (I am posting it because I think it is a general question):
half test1[4];
uint* MultiA = reinterpret_cast<uint*>(test1);
// load test1[0], test1[1], test1[2], test1[3] from smem here
asm volatile("mma.sync.aligned.m8n8k4.col.col.f16.f16.f16.f16 "
"{ %0, %1, %2, %3 },"
"{ %4, %5 },"
"{ %6, %7 },"
"{ %8, %9, %10, %11 };\n"
: "=r"(out[0]), "=r"(out[1]), "=r"(out[2]), "=r"(out[3])
: "r"(MultiA[0]), "r"(MultiA[1]),
"r"(MultiB[0]), "r"(MultiB[1]),
"r"(out[0]), "r"(out[1]), "r"(out[2]), "r"(out[3]));
extern __shared__ half smem[]; // an unsized shared array must be declared extern
// if the smem pointer is aligned to 8 bytes
unsigned int multiA_uint[2];
uint2 tmp = *reinterpret_cast<uint2*>(&smem[0]);
memcpy(&multiA_uint[0], &tmp, sizeof(uint2));
// alternatively, if the smem pointer is only aligned to 2 bytes (use this instead of the uint2 load above)
half tmp[4];
tmp[0] = smem[0];
tmp[1] = smem[1];
tmp[2] = smem[2];
tmp[3] = smem[3];
memcpy(&multiA_uint[0], &tmp, sizeof(uint2));
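The two paths above can also be tried on the host. The following is only a sketch under stated assumptions: `half_bits` is a hypothetical `uint16_t` stand-in for `half`, and `pack_aligned` / `pack_unaligned` are illustrative names, not part of any CUDA API. It shows that both paths fill the two 32-bit registers with the same bytes, as long as the fast path is only used on an 8-byte-aligned pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for CUDA's 16-bit `half`, so the sketch runs without a GPU.
using half_bits = std::uint16_t;

// Fast path: the source must be 8-byte aligned, so all 8 bytes can be moved at once.
// On the GPU this corresponds to the single uint2 load from shared memory.
inline void pack_aligned(const half_bits* src, std::uint32_t (&dst)[2]) {
    assert(reinterpret_cast<std::uintptr_t>(src) % 8 == 0);
    std::memcpy(dst, src, sizeof(dst));
}

// Fallback: the source is only guaranteed 2-byte alignment, so read the four
// values one by one into a local buffer and then copy the bytes across.
inline void pack_unaligned(const half_bits* src, std::uint32_t (&dst)[2]) {
    half_bits tmp[4];
    for (int i = 0; i < 4; ++i) {
        tmp[i] = src[i];
    }
    std::memcpy(dst, tmp, sizeof(dst));
}
```

On a little-endian machine (which includes NVIDIA GPUs), element 0 ends up in the low 16 bits of the first register and element 1 in the high 16 bits, which matches how a packed pair of half values is laid out in one 32-bit register.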