How to compile OpenCL code into binary for a GPU I do not physically have?

We are a small company developing image processing software. We use OpenCL for GPU acceleration (single code base for Nvidia, AMD and Intel GPUs), and we deploy our OpenCL kernels as compiled binaries, one for each line of GPUs that requires separate compilation. In practice this means that with every new major line of Nvidia GPUs we have to buy one model from that line in order to obtain a binary for it. I think it’s quite understandable that we don’t want to include the source code for our proprietary algorithms in our commercial product.

So my question is: does Nvidia offer any way to install the latest driver and obtain the binary for a GPU that the driver knows about, but that I don’t physically have in my system? E.g. I have an RTX 3090, but I want to compile for an RTX 4090. AMD supports this via the CL_CONTEXT_OFFLINE_DEVICES_AMD extension.

I’m not aware of any method to compile for a non-existent device using the standard OpenCL toolchain, nor of any extension provided by NVIDIA to do that.

You can always file a bug (a feature request, basically). I don’t know if the development team would tackle such a project, or not.

It depends on what you call/define as a “compiled binary”.

It seems to be possible to use the nvcc compiler to compile OpenCL C code.

If this is indeed so, then perhaps the following command-line parameters will compile the kernel to PTX for a given machine bitness and a given architecture.

Here is an example:
-ptx --machine=64 -arch=sm_53

Basically, each GPU belongs to a certain architecture.

So compiling for each architecture should allow you to do what you want.

(PTX is a virtual instruction set and could be considered the binary you seek.)

The CUDA driver can load PTX and compile it further, just in time, into GPU-specific instructions… SASS, if I remember correctly.

Good luck!

I don’t think that is possible with any recent version of CUDA.

Any attempt to compile any code that includes OpenCL specific syntax like __kernel results in syntax errors.

If you have a counter-example, please provide it.

Have you looked at the CL_PROGRAM_BINARIES query of clGetProgramInfo?
You should be able to run your app with NVIDIA OpenCL on any supported NVIDIA GPU and use the query above to get the binary for the kernel.
You can later pass this binary to clCreateProgramWithBinary.
Hope this solves your problem.
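
For reference, here is a minimal sketch of the disk side of that save-and-reload workflow, written in plain C so it builds without the OpenCL SDK. The OpenCL calls themselves are only indicated in comments. The file naming scheme and the helper names (binary_cache_name, save_binary, load_binary) are hypothetical, and keying the file by compute capability is an assumption; on NVIDIA the two numbers could be read with clGetDeviceInfo and the cl_nv_device_attribute_query tokens CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV / CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV. Note that this still needs a physical GPU of each architecture to produce the binary in the first place.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical naming scheme: one cached binary per compute capability,
 * e.g. (8, 6) -> "kernels_sm_86.bin". Returns 0 on success, -1 if the
 * output buffer is too small. */
int binary_cache_name(int cc_major, int cc_minor, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "kernels_sm_%d%d.bin", cc_major, cc_minor);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Write a program binary to disk. The bytes would come from
 * clGetProgramInfo(program, CL_PROGRAM_BINARIES, ...) after a successful
 * clBuildProgram on the target GPU. Returns 0 on success, -1 on error. */
int save_binary(const char *path, const unsigned char *data, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t written = fwrite(data, 1, size, f);
    int close_err = fclose(f);
    return (written == size && close_err == 0) ? 0 : -1;
}

/* Read a whole binary back into memory. The result would be handed to
 * clCreateProgramWithBinary on the customer's machine. The caller frees
 * the returned buffer; NULL is returned on any error. */
unsigned char *load_binary(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    unsigned char *buf = (unsigned char *)malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *size_out = got;
    return buf;
}
```

At run time the application would build the file name for the detected device, call load_binary, and fall back to an error message (or another distribution path) when no binary for that architecture has been shipped.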

That is exactly what we’ve been doing for many years. The problem is that you get the binary for your selected GPU (the one you created the OpenCL context for), and you cannot select a GPU that’s not physically present. Or can you?

A quick Google search shows some examples.

CUDA 8.0.61 is mentioned.

This is different. It compiles a C/C++ program that uses OpenCL, and the program, in turn, compiles the OpenCL code at runtime via clCreateProgramWithSource. It’s not clear to me why nvcc is even needed for that instead of regular gcc, maybe a Linux thing. I only need it to work on Windows, btw.

nvcc isn’t compiling OpenCL device code in that example.

That example is irrelevant for the discussion here.

What I do know about OpenCL is that it can be compiled to PTX.

Here is a link describing it; it seems to use GCC:

https://arrayfire.com/blog/generating-ptx-files-from-opencl-code/

Perhaps GCC has compile directives to target certain architectures of NVIDIA graphics cards?