Nvcc compiler and OpenCL kernel

I read that the nvcc compiler can understand and compile an OpenCL kernel when given the parameter -lOpenCL.

Despite this, I get an error when I run nvcc -arch=sm_50 -o sad.bin sad.cl -lOpenCL:

nvcc fatal : Don't know what to do with 'sad.cl'

So, does the nvcc compiler understand OpenCL kernel code?

I’ve never read or heard that (but there may be something I don’t know, of course).

nvcc can compile modules that “contain” OpenCL kernel code, but the only way I’ve ever seen or heard of that working is the same way that g++ (or cl.exe on Windows) can do so: the kernel code is handed to an OpenCL library call at run time, and that call “handles” it.

Here is an example of what I mean. (I’m pretty sure that is not what you mean.) In that case the filename extension is .cpp, not .cl. Here is a similar example showing how you can “force” nvcc to “look in” a .cl file. But again, without knowing what you expect the contents of the .cl file to look like, that may not be what you had in mind. I would not say in either of these cases that the nvcc compiler “understands” OpenCL kernel code.

As the error message indicates, by default nvcc won’t look in a file with an extension of .cl for anything (neither would g++). Furthermore, the -l switch is a directive to the linker (just as it is for g++).
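To make the “contain” case concrete, here is a minimal sketch in C of a host file that carries kernel code. The kernel `sad`, the variable `kernel_src`, and the helper `get_kernel_source` are hypothetical names used only for illustration. The point is that the host compiler (nvcc, g++, cl.exe) sees the kernel as an ordinary string; it is the OpenCL library that compiles it at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel, stored as plain text. nvcc or g++ never parses
 * the OpenCL C inside this literal; to them it is just a char array. */
static const char *kernel_src =
    "__kernel void sad(__global const uchar *a,\n"
    "                  __global const uchar *b,\n"
    "                  __global uint *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = abs_diff(a[i], b[i]);\n"
    "}\n";

/* Returns the source text and its length, in the shape that
 * clCreateProgramWithSource() takes for its `strings` and `lengths`
 * arguments. The actual kernel compile would happen later, inside
 * clBuildProgram(), which lives in the OpenCL library (-lOpenCL). */
const char *get_kernel_source(size_t *length_out)
{
    assert(length_out != NULL);
    *length_out = strlen(kernel_src);
    return kernel_src;
}
```

A file like this can be named .c or .cpp and built with any host compiler. Nothing in it requires the compiler itself to understand OpenCL C.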

In the early days NVIDIA explicitly stated that it did not provide a standalone compiler for OpenCL code. Some things have changed since that description (for example now I am pretty sure that NVIDIA’s OpenCL library maintains a JIT cache) but I’ve never heard of NVIDIA providing a standalone compiler for OpenCL kernel code.

AFAIK people who want to inspect PTX associated with NVIDIA OpenCL implementation do something like this (which isn’t using/doesn’t need nvcc).

Thanks for the reply. I already know how to compile the CL code just-in-time (JIT), but I want to find out how to use the offline method.

According to the post below, this is possible with nvcc, but the article is about 4 years old, so I assumed this was an option that (if it ever existed) is now deprecated.

Offline compilation as an idea is real, but the question is what tool I should use for it. I don’t have any other OpenCL compiler.

Page 68 of The OpenCL Programming Book says the following (it also includes an offline example):

In “offline-compilation”, the kernel is pre-built using an OpenCL compiler, and the generated binary is what gets loaded using the OpenCL API. Since the kernel binary is already built, the time lag between starting the host code and the kernel getting executed is negligible. The problem with this method is that in order to execute the program on various platforms, multiple kernel binaries must be included, thus increasing the size of the executable file.

List 4.7: Offline compilation - Reading the kernel binary

fp = fopen(fileName, "r");
if (!fp) {
    fprintf(stderr, "Failed to load kernel.\n");
    exit(1);
}
binary_buf = (char *)malloc(MAX_BINARY_SIZE);
binary_size = fread(binary_buf, 1, MAX_BINARY_SIZE, fp);
fclose(fp);

That’s not what that post/answer says. That post (i.e. the answer) is not doing anything like “offline compilation” and no functionality in that post is deprecated AFAIK. The answer there should continue to work fine with today’s nvcc, but it is not doing offline compilation as described in chapter 4 of the book you reference, which seems to be available here. Furthermore, the offline compilation example described in that chapter assumes that the kernel binary is already compiled and stored in a file on disk. Other than block diagrams, it gives no hint, that I can see, of what tools to use for the offline compilation step.

I’m not aware of any tool from NVIDIA that does offline compilation for OpenCL kernel code. The relevant quote from the chapter 4 you reference is here:

In fact, a stand-alone OpenCL compiler is not available for the OpenCL environment by NVIDIA, AMD, and Apple. Hence, in order to create a kernel binary in these environments, the built kernels has to be written to a file during runtime by the host program.

(emphasis added)

I’m not aware of any change to that statement as far as NVIDIA is concerned.

However, the suggestion there might be possible. You could use the online version code provided there, perhaps up to (but not including) the call of clCreateKernel, then find a way to write the built program to disk. Then, as in the offline version, load that from disk and call clCreateKernel and continue. I wouldn’t be able to explain how to do that fully, however. The method doesn’t seem that it would be in any way unique or specific to the NVIDIA toolchain.

You could ask a nice targeted question in an OpenCL forum like

“After the call to clBuildProgram, how do I write the built program to a disk file?”

Again, doesn’t seem to be in any way specific to NVIDIA. But once you have that info, you should be able to try doing “offline compilation” as I described.
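For what it’s worth, the disk half of that idea can be sketched in plain C. This is only a sketch under assumptions: the helper names `save_program_binary` and `load_program_binary` are hypothetical, and the OpenCL wiring is described in comments rather than compiled. As far as I know, the standard OpenCL calls involved are clGetProgramInfo() with CL_PROGRAM_BINARY_SIZES and CL_PROGRAM_BINARIES to fetch the built program, and clCreateProgramWithBinary() to load it again; none of that is specific to NVIDIA.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed wiring around these helpers (not compiled here):
 *   save: after clBuildProgram(), query clGetProgramInfo() with
 *         CL_PROGRAM_BINARY_SIZES, allocate, query CL_PROGRAM_BINARIES,
 *         then pass the buffer to save_program_binary().
 *   load: load_program_binary(), then clCreateProgramWithBinary(),
 *         clBuildProgram(), and clCreateKernel() as usual. */

/* Write `size` bytes to `path`. Returns 0 on success, -1 on failure.
 * "wb" matters on Windows, where text mode would alter the bytes. */
int save_program_binary(const char *path, const unsigned char *data,
                        size_t size)
{
    assert(path != NULL && data != NULL);
    FILE *fp = fopen(path, "wb");
    if (!fp) return -1;
    size_t written = fwrite(data, 1, size, fp);
    int close_failed = fclose(fp);
    return (written == size && close_failed == 0) ? 0 : -1;
}

/* Read a whole file into a malloc'd buffer sized from the file itself,
 * instead of a fixed maximum. Caller frees. Returns NULL on failure. */
unsigned char *load_program_binary(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);
    FILE *fp = fopen(path, "rb");
    if (!fp) return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long len = ftell(fp);
    if (len < 0) { fclose(fp); return NULL; }
    rewind(fp);
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(fp); return NULL; }
    size_t got = fread(buf, 1, (size_t)len, fp);
    fclose(fp);
    if (got != (size_t)len) { free(buf); return NULL; }
    *size_out = got;
    return buf;
}
```

Whatever the implementation returns as the “binary” is tied to that platform and device, so a file saved this way should only be expected to load on a matching setup.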

Thank you Robert,

I am pretty sure that you are right. To clarify, the post I mentioned was not the book itself but the Stack Overflow link, which is your own answer, as I just now realized from the name.

Somewhere in the middle of the accepted answer, you can find the statement below. Please notice the last parameter, -lOpenCL. That’s why I thought nvcc could compile a CL kernel program. So what does this parameter stand for?

$ nvcc -arch=sm_35 -o t4 t4.cpp -lOpenCL

The -l switch for nvcc behaves in the same way as the -l switch for g++. It names a library to include in the link phase. When you compile an OpenCL “program”, you may find calls to the OpenCL API in that program. For example, clBuildProgram() is a call to the OpenCL API. The library that provides this API is indicated by -lOpenCL, which instructs the linker to link against a library named libOpenCL.so on Linux.

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#file-and-path-specifications-library

It doesn’t have anything to do with an ability to compile OpenCL device code, nor does it have anything to do with compilation; it is a link-time specification.
