OpenCL Binary Reuse: At what level can binaries be reused?

Hi All,

I’m working on a project with the ultimate goal of having OpenCL code run on a variety of devices and platforms. Currently, I’ve got it set up to create binary files for kernels as kind of a cache for future runs of the project. Now that this works, I am considering using this functionality to avoid distributing the kernel source altogether. To do so, I would need to create multiple binary files which I would then distribute in place of the OpenCL source file.

My question is, at what level can I expect to be able to reuse the binaries that I create? That is, do I have to create individual binaries for each device the project supports? Or can I do it at the platform level? Could I create a binary for, say, the NVIDIA platform and expect it to run on any device supported by the NVIDIA platform? I currently have my code producing binaries at the platform level. It works on my computer, but that doesn’t say a whole lot, as none of the platforms on my computer has more than one device available.

As the OpenCL specification states, the binary can be a device-specific executable, an implementation-specific intermediate representation (IR), or both. The latter means that, in the worst case, a new driver version for the same device might use a different IR that is incompatible with previous ones.
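Given that, one reasonable (but hypothetical) way to key a binary cache is by platform, device, and driver version together, so that a driver update simply causes a cache miss instead of a failed load. The sketch below only builds the key string. It assumes the three inputs were obtained with clGetPlatformInfo(CL_PLATFORM_NAME) and clGetDeviceInfo(CL_DEVICE_NAME) / clGetDeviceInfo(CL_DRIVER_VERSION); the function name and key format are illustrative only.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Build a file-name-friendly cache key such as
   "NVIDIA_CUDA__SomeDevice__1.2.3" from the strings that decide whether a
   cached binary can be reused. Key format is hypothetical. */
void make_cache_key(char *out, size_t out_size, const char *platform,
                    const char *device, const char *driver_version)
{
    snprintf(out, out_size, "%s__%s__%s", platform, device, driver_version);
    /* Replace anything awkward in a file name (spaces, slashes, ...) with '_'. */
    for (char *p = out; *p; ++p)
        if (!isalnum((unsigned char)*p) && *p != '.' && *p != '-')
            *p = '_';
}
```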

The clCreateProgramWithBinary() API is not primarily meant to protect IP by only distributing binary code, but to “[…] be queried and cached by the application. Future instances of the application launching will no longer need to compile and build the program executables. The cached executables can be read and loaded by the application, which can help significantly reduce the application initialization time”.

Thanks eyebex.

After posing the question, I decided to try it out on another machine that was similar but not identical. The NVIDIA kernel worked fine on a different device, but even though both computers have Intel Xeon processors, the Intel binary did not work and had to be recompiled. So platform-based compiling is clearly not reliable enough for this purpose.

I have to wonder though. I can’t be the only one interested in protecting the kernel source. How else can this be done? I’ve got some ideas, but they’re frankly pretty convoluted.

I also spent a little time looking into one of the questions you’ve raised: specifically, how one would compile and load binaries for OpenCL, and how well that works.

There is a tool called clcc, available at http://clcc.sourceforge.net/ that uses clCreateProgramWithSource() and clBuildProgram() to compile the source. The developer never implemented code to save the binaries to a file, but I extended it to do that.

Just as you found, binaries produced by clBuildProgram() are platform dependent. In fact, in the worst case they are probably specific to the platform, device, vendor, and driver version all at once. AMD OpenCL devices produce binaries that are ELF object modules; NVIDIA OpenCL devices produce binaries that are PTX assembly code.
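A cached binary can be sanity-checked before it is handed to clCreateProgramWithBinary(). The following heuristic, a sketch rather than anything the OpenCL API provides, recognizes the two formats mentioned: the 4-byte ELF magic (0x7F 'E' 'L' 'F') and PTX text, detected here by a `.version` directive near the start. The enum and function names are hypothetical, and binaries from other vendors would simply come back as unknown.

```c
#include <stddef.h>
#include <string.h>

typedef enum { BIN_UNKNOWN, BIN_ELF, BIN_PTX } binary_kind;

/* Heuristic classification of an OpenCL program binary. */
binary_kind classify_binary(const unsigned char *bin, size_t size)
{
    static const unsigned char elf_magic[4] = {0x7F, 'E', 'L', 'F'};
    if (size >= 4 && memcmp(bin, elf_magic, 4) == 0)
        return BIN_ELF;

    /* PTX is plain text; only scan the first 512 bytes for ".version". */
    size_t n = size < 512 ? size : 512;
    for (size_t i = 0; i + 8 <= n; ++i)
        if (memcmp(bin + i, ".version", 8) == 0)
            return BIN_PTX;
    return BIN_UNKNOWN;
}
```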

NVIDIA binaries generally work on different NVIDIA devices because they are PTX, which is JIT compiled to the actual GPU instructions when the program is run. However, that won’t always work, because some GPUs can’t execute newer targets such as sm_20, which is specified in the “.target” declaration in the PTX.
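To catch the `.target` problem ahead of time, a sketch like the following could extract the SM number from the PTX and compare it with the device's compute capability. It assumes the major/minor values come from NVIDIA's cl_nv_device_attribute_query extension (CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV / CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV), and it uses the simplified rule that PTX for sm_XY can be JIT compiled for a device whose compute capability is at least X.Y. Both helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return XY from the first ".target sm_XY" directive, or -1 if absent. */
int ptx_target_sm(const char *ptx)
{
    const char *p = strstr(ptx, ".target");
    int sm = -1;
    if (p && sscanf(p, ".target sm_%d", &sm) == 1)
        return sm;
    return -1;
}

/* Simplified rule: compute capability major.minor can run sm_XY PTX
   if 10*major + minor >= XY. Ignores any architecture-specific exceptions. */
int device_can_run(int cc_major, int cc_minor, int target_sm)
{
    return target_sm >= 0 && 10 * cc_major + cc_minor >= target_sm;
}
```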

BTW, have you considered embedding an encrypted version of the kernel source in your program? You can then decrypt it and pass that to clCreateProgramWithSource(). That could help protect the kernel source code.
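As an illustration of the embedding idea, here is a toy sketch that XORs the source with a repeating key. This is obfuscation, not encryption: it mainly keeps the kernel text from showing up in a plain strings dump of the executable. The key, the function names, and embedding the result as a byte array are all hypothetical. The decoded copy is NUL-terminated so it can be passed directly to clCreateProgramWithSource().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* XOR every byte with a repeating key; applying it twice restores the data. */
void xor_obfuscate(unsigned char *data, size_t len,
                   const unsigned char *key, size_t key_len)
{
    for (size_t i = 0; i < len; ++i)
        data[i] ^= key[i % key_len];
}

/* Return a malloc'd, NUL-terminated decoded copy of an embedded blob,
   ready for clCreateProgramWithSource(); the caller must free() it. */
char *decode_kernel_source(const unsigned char *blob, size_t len,
                           const unsigned char *key, size_t key_len)
{
    char *out = malloc(len + 1);
    if (!out) return NULL;
    memcpy(out, blob, len);
    xor_obfuscate((unsigned char *)out, len, key, key_len);
    out[len] = '\0';
    return out;
}
```

In practice the encoded blob would be produced once at build time with xor_obfuscate() and compiled into the program as a byte array. As noted further down the thread, anyone hooking the OpenCL API can still see the decoded source.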

(If you want a copy of the program and reader, send me email. But, it needs more work to handle multiple different OpenCL compiler options. That’s because different platforms allow different options, e.g., -cl-nv-arch is not valid for AMD.)

Ken

I was about to ask whether you have contributed your patch upstream, but it seems someone (else?) has already done that:

http://sourceforge.net/tracker/?func=detail&aid=3065218&group_id=289330&atid=1225116

Or is that actually your alter ego? :-)

We’ve basically had the same issue with shaders for years, and there is still no real solution. Encrypting your shader / kernel in the binary and decrypting it just before passing it to the OpenGL / OpenCL API is also trivial to bypass with any of the freely available API hooking / debugging tools.

What we’d actually need is a secure / encrypted API, but that’s hard to design if you know that the communication channel is not secure, so any key exchange or the like before the actual data transfer can also be hooked.

Ugh. No, not me. Thanks for pointing it out. But, that patch doesn’t work very well because:

  • it hard codes the platform to the first one available;

  • it can’t work if there is more than one device in that platform, as it overwrites the .o file for the previous device;

  • it can’t work for me, because there is no mechanism to pass in compiler options, and the NVIDIA compiler has a bug that I have to work around by compiling with -cl-opt-disable.
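Since accepted options differ by vendor, a build tool needs some way to pick them per platform. A minimal sketch, assuming the vendor string comes from clGetPlatformInfo(CL_PLATFORM_VENDOR): returning -cl-opt-disable for NVIDIA mirrors the workaround described in the last point, while the function name and the empty default are hypothetical.

```c
#include <string.h>

/* Pick build options per platform vendor, since options accepted by one
   vendor's compiler may be rejected by another's. Table is hypothetical
   apart from the NVIDIA -cl-opt-disable workaround. */
const char *build_options_for_vendor(const char *vendor)
{
    if (strstr(vendor, "NVIDIA"))
        return "-cl-opt-disable";  /* standard option, used as a bug workaround */
    return "";                     /* default: no extra options */
}
```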

Encoding multiple binaries needs some thought. Either you write each binary to its own file and have the file name or contents encode the platform/device for that binary (so you can pick the right one for your device), or you write one file whose contents encode the platform/device information for each binary. I chose the latter, but who knows.
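The single-file option could look roughly like this. The sketch below is not Ken's format but a hypothetical container: the magic "CLB1", a uint32 entry count, then for each entry a uint32 key length, the key (for example a platform|device|driver string), a uint64 binary length, and the binary bytes. Integers are written in host byte order, so the file is only portable between machines with the same endianness.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key;            /* e.g. "platform|device|driver" */
    const unsigned char *data;  /* program binary */
    uint64_t size;
} bin_entry;

int write_bundle(FILE *f, const bin_entry *e, uint32_t count)
{
    if (fwrite("CLB1", 1, 4, f) != 4 || fwrite(&count, sizeof count, 1, f) != 1)
        return -1;
    for (uint32_t i = 0; i < count; ++i) {
        uint32_t klen = (uint32_t)strlen(e[i].key);
        if (fwrite(&klen, sizeof klen, 1, f) != 1 ||
            fwrite(e[i].key, 1, klen, f) != klen ||
            fwrite(&e[i].size, sizeof e[i].size, 1, f) != 1 ||
            fwrite(e[i].data, 1, (size_t)e[i].size, f) != e[i].size)
            return -1;
    }
    return 0;
}

/* Scan from the start of the bundle for `key`. On success return a malloc'd
   copy of the binary and store its length in *size; otherwise return NULL. */
unsigned char *find_in_bundle(FILE *f, const char *key, uint64_t *size)
{
    char magic[4];
    uint32_t count;
    if (fread(magic, 1, 4, f) != 4 || memcmp(magic, "CLB1", 4) != 0 ||
        fread(&count, sizeof count, 1, f) != 1)
        return NULL;
    size_t want = strlen(key);
    for (uint32_t i = 0; i < count; ++i) {
        uint32_t klen;
        uint64_t blen;
        char kbuf[256];
        if (fread(&klen, sizeof klen, 1, f) != 1 || klen >= sizeof kbuf ||
            fread(kbuf, 1, klen, f) != klen ||
            fread(&blen, sizeof blen, 1, f) != 1)
            return NULL;
        if (klen == want && memcmp(kbuf, key, klen) == 0) {
            unsigned char *buf = malloc(blen ? (size_t)blen : 1);
            if (!buf || fread(buf, 1, (size_t)blen, f) != blen) { free(buf); return NULL; }
            *size = blen;
            return buf;
        }
        if (fseek(f, (long)blen, SEEK_CUR) != 0) return NULL;  /* skip binary */
    }
    return NULL;
}
```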

Yep, hooking is an issue.

It just seems that OpenCL needs more work. E.g., how do I do a cross compile?

Would you mind posting your patch to the tracker at SourceForge then, too, so it’s publicly available?

Thanks in advance!

Thanks guys. It’s at least a little comforting that I’m not the only one running into this problem.

I am surprised that this is not something that was considered in the OpenCL spec. I feel like this will be a huge deterrent for commercial use of OpenCL.

Anyway, I’ve come up with an (admittedly very simple) encryption method for .cl files. I figure embedding the encrypted source in the binary will deter most inquisitive users. Of course, someone truly motivated will be able to figure it out without too much hassle. However, the same could be said about decompiling standard binaries.