GPU assembly

Are there any tutorials, manuals or other material about programming GPUs in assembly?
I’d like to get an idea of how it works.
How does communication between the GPU and the host system work?

I understand that the assembly and architecture of different GPUs are quite different, but
I’d still like to see how it looks from the bare-metal programmer’s point of view.
I guess any common GPU would do.

The only GPGPU-capable (CUDA compute capability 1.1) NVIDIA hardware I own at the moment is an old
NVIDIA GeForce 9300 GE (Lumenex).
It would be nice if there were something about a GPU at least somewhat close to that, so
I could maybe play around with it a little, but other GPUs are fine too.

CUDA, OpenCL, etc. material is easy to find, but I haven’t found anything about the assembly.


Funny that there seems to be pretty much nothing about the GeForce 9300 GE in English, but in some other languages there is (too bad I only read Finnish, English and Swedish well enough).

The closest that you can easily get to assembly on NVIDIA GPUs is PTX, which is a virtual assembly language that is compiled by the CUDA driver to the machine code of your GPU before execution. There is a manual in the CUDA toolkit about PTX.

OK. Thanks.

The CUDA Binary Utilities document has a list of the assembly instructions for Compute Capability 1.2 and above.

The Parallel Thread Execution ISA Version 3.2 (PTX) document describes the PTX intermediate language, which maps very closely to the final assembly instructions.

The best way to learn how the GPU works is to single-step the assembly of different programs in the Nsight VSE CUDA debugger or cuda-gdb. If you are not set up to debug, simply writing small sample programs and using cuobjdump or nvdisasm to list the PTX and SASS (assembly) is a fairly easy way to learn.
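
As a rough illustration of the "small sample program" approach, something like the following could serve as a starting point. It is only a sketch: the file name saxpy.cu, the kernel saxpy and the helper saxpy_on_gpu are made up for this example, and sm_XX in the commands is a placeholder for your GPU's architecture. The host helper also shows the usual runtime-API view of host/GPU communication (allocate device memory, copy in, launch, copy back). Note that recent toolkits no longer target compute capability 1.x, so a card like the 9300 GE would need an older toolkit release.

```cuda
// saxpy.cu: a tiny kernel to disassemble (all names here are illustrative).
//
// Listing the code, assuming the CUDA toolkit is on your PATH:
//   nvcc -ptx saxpy.cu -o saxpy.ptx                    (PTX as text)
//   nvcc -cubin -arch=sm_XX saxpy.cu -o saxpy.cubin    (machine code for one GPU)
//   cuobjdump -sass saxpy.cubin                        (SASS listing)
//   nvdisasm saxpy.cubin                               (SASS listing, alternative)
#include <cuda_runtime.h>

// extern "C" keeps the kernel name unmangled, so it is easy to find in the listings.
extern "C" __global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Host side: copy inputs to the GPU, launch the kernel, copy the result back.
// Returns the first CUDA error encountered, or cudaSuccess. Expects n > 0.
cudaError_t saxpy_on_gpu(int n, float a, const float *x, float *y)
{
    float *dx = 0, *dy = 0;
    size_t bytes = (size_t)n * sizeof(float);
    cudaError_t err;

    if ((err = cudaMalloc((void **)&dx, bytes)) != cudaSuccess) return err;
    if ((err = cudaMalloc((void **)&dy, bytes)) != cudaSuccess) { cudaFree(dx); return err; }

    err = cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
        saxpy<<<blocks, threads>>>(n, a, dx, dy);
        err = cudaGetLastError();                  // reports launch failures
    }
    // This copy waits for the kernel to finish before reading the result.
    if (err == cudaSuccess) err = cudaMemcpy(y, dy, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dx);
    cudaFree(dy);
    return err;
}
```

The -ptx and -cubin steps do not need a main(), so the file can be disassembled as it stands. To single-step it in a debugger, add a small main() that calls saxpy_on_gpu. Comparing the PTX with the SASS for the same kernel is a quick way to see how closely the two correspond.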