How to Hide OpenCL Source Code

Hi,
Currently, there is no way to hide OpenCL kernel source code when distributing apps, right?
Thanks,
Val.

You can encrypt the sources in any way you like, provided you can decrypt them back in the app. You can also obfuscate.
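For example, a minimal sketch of that approach - the XOR "encryption", key, and embedded blob here are purely illustrative placeholders, not a real scheme:

```c
// Sketch: de-obfuscate an XOR-scrambled kernel string at runtime and hand
// it to clCreateProgramWithSource. Key and blob are hypothetical.
#include <CL/cl.h>
#include <stdlib.h>
#include <string.h>

static const unsigned char obfuscated_kernel[] = { 0x31, 0x3F /* ... bytes produced at build time ... */ };
static const size_t obfuscated_len = sizeof(obfuscated_kernel);
static const unsigned char key = 0x5A; /* hypothetical single-byte key */

static char *decrypt_kernel(void)
{
    char *src = malloc(obfuscated_len + 1);
    if (!src) return NULL;
    for (size_t i = 0; i < obfuscated_len; ++i)
        src[i] = (char)(obfuscated_kernel[i] ^ key);
    src[obfuscated_len] = '\0';
    return src;
}

cl_program create_program(cl_context ctx)
{
    cl_int err;
    char *src = decrypt_kernel();
    const char *srcs[] = { src };
    cl_program prog = clCreateProgramWithSource(ctx, 1, srcs, NULL, &err);
    /* Wipe and free the plaintext as soon as the runtime has copied it. */
    memset(src, 0, obfuscated_len);
    free(src);
    return prog;
}
```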

Or build the code into many device-specific binaries and attempt to load the proper ones at runtime by querying device name and/or vendor.
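A rough sketch of that selection step - the file-naming convention is just an assumption for illustration:

```c
// Query the device's name and vendor and map them to a pre-built binary
// file. Error checks omitted for brevity.
#include <CL/cl.h>
#include <stdio.h>
#include <string.h>

void pick_binary_file(cl_device_id dev, char *path, size_t path_len)
{
    char name[256] = {0}, vendor[256] = {0};
    clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
    clGetDeviceInfo(dev, CL_DEVICE_VENDOR, sizeof(vendor), vendor, NULL);

    if (strstr(vendor, "NVIDIA"))
        snprintf(path, path_len, "kernels.nvidia.bin");
    else if (strstr(vendor, "Advanced Micro Devices") || strstr(vendor, "AMD"))
        snprintf(path, path_len, "kernels.amd.bin");
    else
        snprintf(path, path_len, "kernels.%s.bin", name); /* per-device fallback */
}
```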

The “binary” returned from NVIDIA’s implementation is a PTX assembly text file. I’m not a PTX expert, but this PTX assembly should then compile fine for any NVIDIA device. The Apple OpenCL implementation returns something similar to PTX for NVIDIA devices as well. I haven’t looked at the binary from a CPU implementation or from AMD’s implementation, though, so I’m not sure about those.
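For reference, that binary can be pulled out with the standard clGetProgramInfo queries after a build; a single-device sketch, error checks omitted:

```c
// Extract the compiled "binary" (PTX text on NVIDIA) and write it to disk.
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

void dump_program_binary(cl_program prog, const char *path)
{
    size_t size = 0;
    clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, NULL);

    unsigned char *binary = malloc(size);
    unsigned char *binaries[] = { binary };  /* one pointer per device */
    clGetProgramInfo(prog, CL_PROGRAM_BINARIES, sizeof(binaries), binaries, NULL);

    FILE *f = fopen(path, "wb");
    fwrite(binary, 1, size, f);
    fclose(f);
    free(binary);
}
```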

The encrypt/decrypt method would be the most robust for cross-platform support, though only marginally better at securing your source code. To crack it, I would simply load the application in a debugger with a breakpoint at clCreateProgramWithSource and look at the strings argument being passed to that function.

I’m curious what you decide to do, since we’re going to face the same problem before we can ship our application.

Thanks,

Brian

I tried PTX as well, but the PTX file also encodes the target compute capability: you can’t load PTX built for 1.3 on a 1.2 GPU. So using “binaries” would create a maintenance headache, but that could be acceptable, since this way you can control which platforms your app runs on and so avoid under-performing implementations.
Jan

It’s probably even simpler to write a wrapper DLL that forwards all calls to the real opencl.dll and intercepts the clCreateProgramWithSource call :P
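Something along these lines - a Windows-flavoured sketch, where the path to the real DLL is an assumption:

```c
// Export a function with the same name from a wrapper OpenCL.dll, log the
// source strings, then forward to the real implementation.
#include <windows.h>
#include <stdio.h>
#include <CL/cl.h>

typedef cl_program (CL_API_CALL *pfn_create)(cl_context, cl_uint,
                                             const char **, const size_t *, cl_int *);

CL_API_ENTRY cl_program CL_API_CALL
clCreateProgramWithSource(cl_context ctx, cl_uint count,
                          const char **strings, const size_t *lengths,
                          cl_int *errcode_ret)
{
    static pfn_create real = NULL;
    if (!real) {
        /* assumed location of the real runtime */
        HMODULE h = LoadLibraryA("C:\\Windows\\System32\\OpenCL.dll");
        real = (pfn_create)GetProcAddress(h, "clCreateProgramWithSource");
    }
    for (cl_uint i = 0; i < count; ++i)   /* log every source string */
        fprintf(stderr, "--- kernel source %u ---\n%s\n", i, strings[i]);
    return real(ctx, count, strings, lengths, errcode_ret);
}
```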

I think some sort of source code scrambler may help here, although it won’t achieve the same effect as using real binaries. There is also always a risk that a scrambler will change the program’s behavior.

I know this is an old topic, but the problem is still there. I would like to make a closed application for my “game engine” (that’s too grand a name for it, because it’s not even in alpha stage) - a radiosity lightmap generator (a global illumination lightmap compiler, if you like) for static lights in game levels. Using CUDA isn’t a very good idea, because I don’t know what device will be used for all that computation - maybe an NVIDIA GPU, an ATI GPU, or just an x86 CPU with SSE3 support (OpenCL is more cross-platform in this case).

So how can I hide the source code of my kernels? Encryption is one option, but it could simply be hooked at the call that creates a program from source. Another way is to build a lot of PTX binary files (or whatever they’re called) and just load those. But how should I decide which PTX files to create? One per vendor and device type (e.g. “rad.nvidia.gpu.ptx”, “rad.ati.gpu.ptx”, “rad.amd.cpu.ptx”), or one per device (e.g. “rad.GeForceGTS250.ptx”, “rad.GeForce9400GT.ptx”, “rad.GeForceGTX470.ptx”, …, “rad.RadeonHD4600.ptx”, “rad.RadeonHD5770.ptx”, …, “rad.CPUwSSE3.ptx”, etc.)?

Or is there another, easier way to classify PTX per architecture? Say “rad.sm_10.ptx”, “rad.sm_11.ptx”, “rad.sm_12.ptx”, “rad.sm_13.ptx”, “rad.sm_20.ptx”, and so on. I think this way is “more right” than the previous ones. But where can I get architecture information (e.g. sm_11 or sm_20) for NVIDIA and ATI GPUs? And how are architectures labeled on ATI cards - the same “sm_$x$y” way or something else - and what about AMD CPUs?

P.S. I know that I talk too much, and my English could be better, so… don’t judge me, just help if you can… :">

PTX is NVIDIA’s format; what you want is a more general binary format. SM versions are also NVIDIA-specific.
Creating dozens of binaries is the safest way: there’s no way for the user to get the source code, because it isn’t there. Any kind of encryption can be circumvented by a determined hacker - the unencrypted code must at some point be fed to the compiler, which sits inside the OpenCL runtime. All it takes is a mock OpenCL DLL that logs whatever is passed to clCreateProgramWithSource.

On the other hand, you have to figure out how to pick the right binary for the device. This may not be trivial: the device information that can be queried from OpenCL doesn’t include the binary format. Making it portable across future devices may be a pain.
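A sketch of the loading side, using clCreateProgramWithBinary and checking binary_status so a mismatched binary (new driver, new device) can be detected and handled instead of failing silently:

```c
// Load a pre-built binary for one device; the caller could fall back to
// another binary, or to source, if this one is rejected.
#include <CL/cl.h>
#include <stdio.h>

cl_program load_binary_program(cl_context ctx, cl_device_id dev,
                               const unsigned char *binary, size_t size)
{
    cl_int err, binary_status;
    const unsigned char *binaries[] = { binary };
    cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &size,
                                                binaries, &binary_status, &err);
    if (err != CL_SUCCESS || binary_status != CL_SUCCESS) {
        fprintf(stderr, "binary rejected for this device (%d)\n", binary_status);
        return NULL;
    }
    err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    return (err == CL_SUCCESS) ? prog : NULL;
}
```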

I know some games compile shaders at first launch, so they must ship with sources. They probably decided the risk of someone hooking the shader compiler and stealing their precious code isn’t worth the cost of shipping binaries and supporting them.

I think you’d be better off using CUDA for NVIDIA cards and binary OpenCL for the Radeon 57xx series; older Radeons won’t perform well anyway. Plus a CPU implementation. Or, better, use CUDA for NVIDIA and the CPU for AMD, because I doubt you’ll get a good speedup on this task on Radeons. And consider DirectCompute.

Without the code of the graphics engine, the shader code is pretty useless anyway. And I think the same holds for OpenCL in most cases.