Attempting to write my own OpenCL ICD loader: libcuda.so doesn't export OpenCL API

This is just a re-post of my thread on the nvnews forums, after I was told I’d probably get better help if I asked here instead.

This isn’t a user question, it’s a developer question. I already know that OpenCL works fine on my system because I’ve successfully run some simple test apps. I’m just having trouble writing my own OpenCL code using nVidia’s drivers. Dunno if this is the right place to go for developer help, but I’ll ask here anyway so I don’t get flamed if I ask on the Khronos forums.

I am trying to implement my own OpenCL ICD Loader (implementing this spec.) The use case is that it will be statically linked with an application and will detect any OpenCL ICDs installed on the system.

According to the spec (at least, as I understand it), /etc/OpenCL/vendors/*.icd is a file containing the name of a .so; the .so provides an OpenCL implementation which the ICD Loader will load. In this case, I have /etc/OpenCL/vendors/nvidia.icd, which references “libcuda.so,” which I can open just fine with dlopen(). I can get clGetExtensionFunctionAddress and clGetPlatformInfo from libcuda.so using dlsym(), and I can then use clGetExtensionFunctionAddress to retrieve clIcdGetPlatformIDsKHR. I can then use clIcdGetPlatformIDsKHR and clGetPlatformInfo to query all platforms provided by the driver (all one of them) and verify that they all support the cl_khr_icd extension. So far it’s going exactly like the spec says it should.
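
To make the first step concrete, here is a minimal sketch of pulling the library name out of an .icd file's contents. It assumes the file holds just the library name on one line, possibly with stray whitespace. The helper name icd_library_name is made up for illustration and is not part of any OpenCL API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the library name found on the first non-blank line of an .icd
 * file's contents into out (capacity cap, including the terminator).
 * Returns 0 on success, -1 if no name is found or it does not fit.
 * Hypothetical helper; assumes one library name per file. */
int icd_library_name(const char *contents, char *out, size_t cap)
{
    const char *start = contents;
    const char *end;
    size_t len;

    /* Skip leading whitespace, including blank lines. */
    while (*start != '\0' && isspace((unsigned char)*start))
        start++;

    /* The name runs to the end of that line. */
    end = start + strcspn(start, "\r\n");

    /* Trim trailing spaces and tabs. */
    while (end > start && isspace((unsigned char)end[-1]))
        end--;

    len = (size_t)(end - start);
    if (len == 0 || len >= cap)
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

The string this produces (for example “libcuda.so”) is what would then be handed to dlopen().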

However, libcuda.so does not seem to export any of the rest of the OpenCL API (clGetDeviceIDs, clGetDeviceInfo, and so on); dlsym() fails on all of them. I cannot get any of them with clGetExtensionFunctionAddress either; I’ve tried.

Am I misunderstanding the spec? Should I be looking elsewhere for these functions? If any nVidia developers are reading this, would they mind disclosing how nVidia’s ICD loader does this?

For now, I’m just having it fall-back to libOpenCL.so if it detects an invalid ICD, and I have it working correctly that way, but I’d like to get it working this way as well (it is an ICD loader after all, not just a wrapper.)

Ceremonial information:

    * 32-bit Linux Mint 9 (Ubuntu 10.04)
    * Driver 195.36.24 from nvidia-current package

Browse source code:

https://github.com/Max-E/libclicd

Zipped format download:

https://github.com/Max-E/libclicd/zipball/master

The part you are probably most interested in is clicd_locateicd_unix.c, which contains all the code being discussed here.

Pure ANSI C, includes README, Makefile, and test applications, no dependencies. “make testapps” to compile it, then you can “cd testapps” and run the three tests:

    * list_platforms
    * part1.x
    * part2.x

It should also be trivially easy to compile on OS X, but I have not tested it there yet. For Windows, you would need to add code to look for registry keys to find the ICDs, as described by the spec.

The core function pointers come attached to various objects that are passed to the ICD loader through the OpenCL API as parameters rather than being exported by name from the ICD. The only symbols that are exported are those necessary to bootstrap or support extensions.

See the “Sample Code” section of the spec where an implementation of clCreateCommandQueue is shown (uses the dispatch table from the context parameter). Unfortunately, the spec does not contain enough information to implement an ICD loader. You also need a complete definition of the _cl_icd_dispatch struct which defines the order of the function pointers.
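
A rough sketch of that idea, for anyone following along. This is NOT the real _cl_icd_dispatch: the table below has two illustrative entries in a made-up order, the loader_ and vendor_ names are hypothetical, and the typedefs are stand-ins for what CL/cl.h would normally provide. It only shows the shape of the mechanism: the object carries the table pointer as its first member, and the loader forwards through it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for types normally provided by CL/cl.h. */
typedef int32_t cl_int;
typedef struct _cl_context *cl_context;
#define CL_SUCCESS 0

/* ILLUSTRATIVE ONLY: two entries in a made-up order. The real table has
 * many more entries in a fixed order that is not reproduced here. */
struct _cl_icd_dispatch {
    cl_int (*clRetainContext)(cl_context context);
    cl_int (*clReleaseContext)(cl_context context);
};

/* The dispatch pointer is the first member of the vendor's object. */
struct _cl_context {
    struct _cl_icd_dispatch *dispatch;
    int refcount; /* vendor-private data; the loader never touches it */
};

/* --- hypothetical vendor side (would live inside the ICD) --- */
cl_int vendor_retain_context(cl_context context)
{
    context->refcount++;
    return CL_SUCCESS;
}

cl_int vendor_release_context(cl_context context)
{
    context->refcount--;
    return CL_SUCCESS;
}

struct _cl_icd_dispatch vendor_table = {
    vendor_retain_context,
    vendor_release_context
};

/* --- loader side: forward through the table found in the object --- */
cl_int loader_clRetainContext(cl_context context)
{
    return context->dispatch->clRetainContext(context);
}

cl_int loader_clReleaseContext(cl_context context)
{
    return context->dispatch->clReleaseContext(context);
}
```

Note that the loader never needs dlsym() for these calls; it only needs the object the vendor handed back and agreement on the table layout.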

Ah. I thought the ICD Loader was to provide its own containers around the “opaque” types, like this:

    struct _cl_mem {
        struct _cl_icd_dispatch *dispatch; /* table of vendor entry points */
        opaque_cl_mem data;                /* the vendor's real object */
        int refcount;
    };

    typedef struct _cl_mem *cl_mem;

Given enough time, I could probably reverse-engineer the order of the struct in libcuda.so using strace/ltrace/latrace/possibly some custom code using ptrace. However, from what you are saying, this may be a waste of time, because there’s no way to guarantee that it will load everyone’s ICD (i.e. the order in libcuda.so might be different from what it is in AMD’s driver.) The spec mentions that Khronos members have access to the source code of an ICD loader. Is it reasonable to assume that all ICD loaders “in the wild” are derived from this codebase? Can anyone actually confirm that the ICD loaders from different vendors can interoperate? (I only have access to nVidia hardware at the moment…)

Using GDB, I’ve determined that the opaque pointers I’m getting out of clIcdGetPlatformIDsKHR actually point to vast swaths of zeroed memory, and I need to go 1022 bytes or so before there is anything there…

I use OpenCL on a machine with both an AMD ICD and an NVIDIA ICD, and two platforms are presented, so a single ICD loader works for both. I know some early driver versions had interoperability problems, but I believe those issues have been resolved. I don’t remember which versions were affected, but it works using current drivers from NVIDIA and AMD on Windows.

The order of the functions is the same for all ICDs; otherwise an ICD loader would not be able to call the functions from an arbitrary ICD (which is the entire purpose of the ICD mechanism). As you say, it wouldn’t be too hard to determine the order by watching the calls through a working ICD loader.

The pointers that come out of an ICD through clIcdGetPlatformIDsKHR should point to a _cl_icd_dispatch pointer. I can’t explain the behavior if that is not what you see.
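
One way to check this from code rather than from GDB is to read the first pointer-sized word of the handle, which is where the dispatch pointer would be expected to sit. The sketch below assumes exactly that layout; handle_dispatch_ptr is a hypothetical helper, not an OpenCL function.

```c
#include <stddef.h>
#include <string.h>

/* Return the first pointer-sized word stored in an ICD object, which is
 * where the dispatch-table pointer is expected to live (an assumption
 * about the layout). A NULL handle or a zeroed first word yields NULL. */
const void *handle_dispatch_ptr(const void *handle)
{
    const void *first_word = NULL;

    if (handle != NULL)
        memcpy(&first_word, handle, sizeof first_word); /* avoids aliasing issues */

    return first_word;
}
```

A NULL result for a platform returned by clIcdGetPlatformIDsKHR would match the zeroed memory described earlier in the thread, and would mean the handle is not usable for dispatch as-is.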