Dear experts,
I have a few questions about the machine code (SASS) generated for a double-precision atomicAdd on a Pascal GPU. The code:
__global__ void Test(double* biubiu)
{
    double x = 1.23;
    atomicAdd(biubiu, x);
}
is compiled into:
code for sm_61
Function : _Z4TestPd
.headerflags @"EF_CUDA_SM61 EF_CUDA_PTX_SM(EF_CUDA_SM61)"
/* 0x001fc400fe2007f6 */
/*0008*/ MOV R1, c[0x0][0x20]; /* 0x4c98078000870001 */
/*0010*/ MOV R2, c[0x0][0x140]; /* 0x4c98078005070002 */
/*0018*/ MOV R3, c[0x0][0x144]; /* 0x4c98078005170003 */
/* 0x0003c400fe4007f1 */
/*0028*/ MOV32I R4, 0x7ae147ae; /* 0x0107ae147ae7f004 */
/*0030*/ MOV32I R5, 0x3ff3ae14; /* 0x0103ff3ae147f005 */
/*0038*/ RED.E.ADD.F64.RN [R2], R4; /* 0xebf9000000670204 */
/* 0x001f9c00fde007ef */
/*0048*/ NOP; /* 0x50b0000000070f00 */
/*0050*/ NOP; /* 0x50b0000000070f00 */
/*0058*/ NOP; /* 0x50b0000000070f00 */
/* 0x001f8000ffe007ff */
/*0068*/ EXIT; /* 0xe30000000007000f */
/*0070*/ BRA 0x70; /* 0xe2400fffff87000f */
/*0078*/ NOP; /* 0x50b0000000070f00 */
My questions are:
- According to the CUDA Binary Utilities manual, MOV32I has existed only since Volta, but I'm using a Pascal GPU. Is this simply an error in the manual?
- Does "E" in RED.E refer to "exclusive" in the cache protocol?
- What does MOV32I mean exactly? 32: 32-bit, I: integer? I am using a double-precision atomic add, so why does it move only 32 bits of data into R4 and then atomically add R4 to the address in R2?
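On the last point, here is a small host-side check I put together. It is only a sketch built on a guess: I am assuming that R4 and R5 form a 64-bit register pair, with R4 holding the low word and R5 the high word. The helper names double_from_halves and immediates_encode_x are my own and are not part of CUDA.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a CUDA API): rebuilds a double from two 32-bit
// words, ASSUMING `lo` holds bits 0..31 and `hi` holds bits 32..63.
double double_from_halves(std::uint32_t lo, std::uint32_t hi)
{
    std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    double value;
    static_assert(sizeof(value) == sizeof(bits), "expects a 64-bit double");
    // Bit-for-bit copy; no numeric conversion takes place.
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Compares the two MOV32I immediates from the dump (R4 = 0x7ae147ae,
// R5 = 0x3ff3ae14) against the kernel's constant x = 1.23.
bool immediates_encode_x()
{
    return double_from_halves(0x7ae147aeu, 0x3ff3ae14u) == 1.23;
}
```

If immediates_encode_x() returns true when called from a host program, that would suggest the two MOV32I instructions together load x = 1.23 into R4 and R5. Whether RED.E.ADD.F64 then reads R4 and R5 as a pair is still part of my question.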
Thanks for your time!