Methods to prevent reverse engineering / hacking of CUDA kernels in application?

Are there any recommended best practices to harden kernels so they are resistant to RE or hacking?
With OpenCL, the kernels could be compiled and stripped of IR.

You can certainly strip CUDA kernels of “IR”, which would be PTX. There are numerous topics on forums that describe this, here is one.

Thanks. From this SO question it appears that nvcc -arch=compute_XX -code=sm_XX will prevent any PTX from making its way into the binaries, where XX is a target compute capability version.

Yes, there are various compile switch combinations that will do it. That is one of them I believe. You can verify whether a particular object has PTX in it or not with the cuobjdump utility.
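To make that concrete, here is a sketch (file names and the sm_80 architecture are arbitrary choices for illustration) of compiling for a single specific architecture so no PTX is embedded, then checking the result with cuobjdump:

```shell
# Compile for one specific architecture only; the fat binary then
# contains SASS for sm_80 and no PTX.
nvcc -arch=compute_80 -code=sm_80 -o app app.cu

# Verify: this should print no PTX sections for the embedded kernels.
cuobjdump --dump-ptx app

# For comparison, the machine code (SASS) sections are still present:
cuobjdump --dump-sass app
```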

Also, note that PTX is the primary forward-compatibility mechanism in CUDA for device code. Removing PTX from your binaries restricts you to running on device architectures you have specifically called out via compile switches. You can get some additional examples here by studying the Makefiles in the CUDA sample code projects.
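In the style of those sample Makefiles, a multi-architecture build without PTX might look like the following sketch (the specific architectures are chosen for illustration):

```shell
# Embed SASS for several specific GPU architectures, but no PTX.
# The resulting binary runs only on these architectures; there is
# no forward compatibility with newer GPUs.
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_90,code=sm_90 \
     -o app app.cu

# Adding an entry like -gencode arch=compute_90,code=compute_90 would
# restore forward compatibility, at the cost of shipping the PTX.
```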

Two thoughts (omitting relevant war stories):

(1) Stripping symbols and removing IR is at best an annoyance to any determined attacker. It is the equivalent of putting a 4 ft chain-link fence around one’s property in the hope that this will deter potential burglars.

(2) The “secret sauce” in your application is quite likely already in use by most of the leading companies of whatever industry you are in.

Do you mean that the SASS can be used instead of PTX to RE the code?

Reverse engineering code from machine instructions has been the normal way to reverse engineer for decades, and there are some pretty powerful tools for common processor architectures to help with that. People have even been able to reverse engineer microcode inside processors despite a near total lack of publicly available information.

The embedding of an intermediate representation into the executable binary (to overcome the lack of binary compatibility between processor architectures) is one of the more unusual features of CUDA.

An intermediate representation itself is not that unusual, but embedding it in the binary is.

Intermediate representations are a standard feature of modern toolchains, which is why I specifically focused on the CUDA feature that the IR is deposited into a binary executable in addition to a classical fat binary consisting of multiple types of machine language.

In terms of reverse engineering, the way code is represented may be the difference between the 4 ft chain-link fence I mentioned and a solid 7 ft wall.

Thanks, can you elaborate on that?

The detailed behavior of machine instructions for the various GPU architectures is not publicly documented by NVIDIA, whereas PTX’s virtual ISA is documented in great detail in official documentation.
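For the curious, the SASS for a given architecture can be inspected with the toolkit’s own binary utilities; a sketch (file names and sm_80 are arbitrary choices):

```shell
# Disassemble the SASS embedded in an executable or object file:
cuobjdump --dump-sass app

# Alternatively, compile a kernel to a cubin and disassemble it with
# nvdisasm, which provides more detailed output:
nvcc -arch=sm_80 -cubin -o kernel.cubin kernel.cu
nvdisasm kernel.cubin
```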

While many SASS instructions are readily understood by anyone practiced in reading code at the assembly-language level, others might require some reverse-engineering effort to understand in full detail. Generally speaking, this is not that hard to do. In the past century many processors had so-called undocumented opcodes, and once these were found, people loved to work out what exactly they did. Some were quirky artifacts, others quite useful.

One of the biggest annoyances when reading and manually tracing the SASS for modern GPUs is the LOP3 instruction, because the mapping of logical operations to one of the 256 possible LOP3 modes is many-to-one. So it is a great obfuscator (which is obviously not why the instruction was added; it is great for performance). Any book on security will advise that security by obscurity is a flawed concept; trying to hide any “secret sauce” behind SASS falls into that category.
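The many-to-one mapping is easy to demonstrate on the host side. An LOP3 mode is just an 8-bit truth table, which can be obtained by evaluating a boolean expression bitwise on the canonical constants 0xF0, 0xCC, 0xAA; the sketch below (helper names are my own) shows two different-looking source expressions collapsing to the same mode:

```python
# Canonical inputs: across the 8 bit positions, these enumerate all
# combinations of (a, b, c), so f(A, B, C) yields f's 8-bit truth table,
# which is exactly the immediate LOP3 would use for that function.
A, B, C = 0xF0, 0xCC, 0xAA

def lop3_lut(f):
    """Return the 8-bit lookup-table byte (LOP3 mode) for a 3-input boolean function."""
    return f(A, B, C) & 0xFF

# Two different-looking formulations of the 2:1 multiplexer "a ? b : c":
mux1 = lop3_lut(lambda a, b, c: (a & b) | (~a & c))
mux2 = lop3_lut(lambda a, b, c: c ^ (a & (b ^ c)))

print(hex(mux1), hex(mux2))  # both 0xca: distinct expressions, one LOP3 mode
print(hex(lop3_lut(lambda a, b, c: a ^ b ^ c)))  # 0x96, the 3-input XOR table
```

When reading SASS, the decompiler sees only the mode byte (e.g. 0xCA) and must pick one of the many source-level expressions that produce it, which is what makes LOP3 such an effective incidental obfuscator.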

While reverse engineering at SASS level is harder (7 ft solid wall; requires either a ladder to cross or a sledgehammer to break through) than reverse engineering at PTX level (4 ft chain-link fence; jump over it or use wire cutters), the difference in degree of difficulty is not dramatic. Back when I actively worked with the CUDA compiler team on code-generation bugs and performance optimizations for a couple of years, I studied so much SASS code that I could read it and translate it back to C quite fluently.


Thank you very much for these details !

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.