[Solved] SASS Code Analysis

NVIDIA doesn’t officially disclose SASS details, and moreover, the instruction set changes with every major SM generation. The way I learned SASS was:

  1. the PTX manual: http://docs.nvidia.com/cuda/parallel-thread-execution/
  2. the instruction set reference in the CUDA Binary Utilities docs: http://docs.nvidia.com/cuda/cuda-binary-utilities/#instruction-set-ref
  3. the wiki of the asfermi project (now in the Google Code Archive)
  4. the Kepler SASS assembler user guide: https://hpc.aliyun.com/doc/keplerAssemblerUserGuide
  5. there is also maxas, but its docs don’t describe the individual instructions

In your code, R6.CC means “write the carry to the 1-bit CC register”, and mad.hi.x computes the high 32 bits of the product, adds its third operand, and adds the carry from the CC register. LD.E is a load from global memory using the 64-bit address held in the register pair R6,R7. The entire code is:

R6 = lo32(R3*R5) + c[0x0][0x20], saving the carry to CC
R7 = hi32(R3*R5) + c[0x0][0x24] + CC
R2 = *((R7<<32) + R6)

The first two instructions multiply two 32-bit values (R3 and R5) into a 64-bit product and add the 64-bit value (c[0x0][0x24]<<32) + c[0x0][0x20], leaving the 64-bit result in the register pair R6 (low word), R7 (high word).
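A minimal C sketch of what that instruction pair computes, assuming the semantics described above: a full 32x32-bit product, the low half added with the carry-out saved to CC, and the high half added with that carry as carry-in. The function name is hypothetical, and `param_lo`/`param_hi` simply stand for the words at c[0x0][0x20] and c[0x0][0x24].

```c
#include <stdint.h>

/* Hypothetical model of the pair
     R6.CC = lo32(R3*R5) + c[0x0][0x20]        (carry out -> CC)
     R7    = hi32(R3*R5) + c[0x0][0x24] + CC   (high half, carry in)
   param_lo / param_hi stand for the two constant-memory words. */
static uint64_t mad_wide_via_carry(uint32_t r3, uint32_t r5,
                                   uint32_t param_lo, uint32_t param_hi) {
    uint64_t prod = (uint64_t)r3 * r5;        /* full 64-bit product */
    uint32_t r6 = (uint32_t)prod + param_lo;  /* low word, wraps mod 2^32 */
    uint32_t cc = r6 < param_lo;              /* 1 if the low add carried out */
    uint32_t r7 = (uint32_t)(prod >> 32) + param_hi + cc; /* high word + carry */
    return (uint64_t)r7 << 32 | r6;           /* R7:R6, the address LD.E uses */
}
```

Because the carry links the two halves, the result always equals the plain 64-bit expression `(((uint64_t)param_hi << 32) | param_lo) + (uint64_t)r3 * r5` (modulo 2^64), which is why two 32-bit instructions are enough for 64-bit address arithmetic.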

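c[BANK][ADDR] is a reference into constant memory (bank BANK, byte offset ADDR), and c[0x0][0x20] is the first kernel parameter in this code (the parameter offset differs between SM generations), so the entire code is: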

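__global__ void f(unsigned int *x)  // x: 64-bit pointer, the first kernel parameter
{
    // R3, R5: 32-bit values computed earlier in the kernel (not shown)
    unsigned int R2 = *(unsigned int *)((char *)x + (unsigned long long)R3 * R5);
    // the product is a byte offset, so this is x[i] when R3*R5 == 4*i
}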
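To make the constant-bank layout and the byte-offset addressing concrete, here is a hedged host-side C model. Everything in it is hypothetical: bank 0 is modeled as a plain byte array, the 0x20 offset is taken from this particular disassembly, the bank is assumed little-endian (which is why c[0x0][0x20] holds the low word of x and c[0x0][0x24] the high word), and the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical: first kernel parameter at byte 0x20 of bank 0, as in this
   disassembly; other SM generations use other offsets. */
#define PARAM0_OFFSET 0x20u

/* Store a 64-bit value little-endian, so the word at addr is the low half
   and the word at addr + 4 is the high half. */
static void cbank_store_u64(uint8_t *bank0, uint32_t addr, uint64_t v) {
    for (int i = 0; i < 8; i++)
        bank0[addr + i] = (uint8_t)(v >> (8 * i));
}

/* Read the 32-bit word c[0x0][addr] (little-endian). */
static uint32_t cbank_read32(const uint8_t *bank0, uint32_t addr) {
    return (uint32_t)bank0[addr]
         | (uint32_t)bank0[addr + 1] << 8
         | (uint32_t)bank0[addr + 2] << 16
         | (uint32_t)bank0[addr + 3] << 24;
}

/* Host analogue of the kernel body: rebuild x from the two constant words,
   add the 64-bit product R3*R5 as a BYTE offset, then load 32 bits. */
static uint32_t kernel_body(const uint8_t *bank0, uint32_t r3, uint32_t r5) {
    uint64_t x = (uint64_t)cbank_read32(bank0, PARAM0_OFFSET + 4) << 32
               | cbank_read32(bank0, PARAM0_OFFSET);
    uint64_t addr = x + (uint64_t)r3 * r5;
    return *(const uint32_t *)(uintptr_t)addr;
}
```

With x pointing at a `uint32_t` array, R3 = 3 and R5 = 4 read x[3], while R3 = 2 and R5 = 8 read x[4]: only the byte product matters, not how it is split between the two registers.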

Unfortunately, there aren’t many books covering low-level GPU details. The best I have seen is The CUDA Handbook (http://www.cudahandbook.com/); in particular, it describes those c[] references:

8.1.4 CONSTANT MEMORY
Constant memory resides in device memory, but it is backed by a different,
read-only cache that is optimized to broadcast the results of read requests to
threads that all reference the same memory location. Each SM contains a small,
latency-optimized cache for purposes of servicing these read requests. Making
the memory (and the cache) read-only simplifies cache management, since the
hardware has no need to implement write-back policies to deal with memory
that has been updated.

Two more books going into low-level details are:
Shane Cook “CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs”
Rob Farber “CUDA Application Design and Development”
