OpenCL example code doesn't compile (CUDA 6.0 + Ubuntu 12.04.5)

I downloaded OpenCL example code for Linux from

The Makefile doesn’t work. I figured I’d have to build it manually, producing common/ and shared/ first. But when I tried to put together shared/, cmd_arg_reader.h references a customised exception.h which doesn’t exist anywhere. I couldn’t make it work.

Would it be possible for someone to fix the OpenCL example code for Linux?

The last CUDA toolkit that included OpenCL samples was 4.2. They should work on that version, and it should contain the missing header file. You might be able to fix the example code you downloaded using the missing files from the 4.2 OpenCL samples.

@vacaloca I looked in targets/x86_64-linux/include and targets/x86_64-linux/include/CL, the closest file to exception.h was cl_ext.h under include/CL.

The CUDA Toolkit itself is great, but the OpenCL example code is hard to get running on Linux.

I have a CUDA 6.5RC setup on Linux (RHEL 5.5).

I downloaded the CUDA 3.2 SDK installer:

i.e. I downloaded this file:

to a local directory.

I then unzipped the file:


When prompted, I selected a local directory to unzip the files to, and it found the default path (/usr/local/cuda) to my CUDA 6.5RC install for me.

After unzipping was complete, I went into the OpenCL directory and typed

make -k

The OpenCL samples were built successfully, and stored in the OpenCL/bin/linux/release directory.

From there (OpenCL/bin/linux/release), I could for example run



I think this should work in a similar fashion with the CUDA 4.2 SDK as well.

The toolkit archive page is here:

Thanks @txbob for the detailed instructions.

I expect the OpenCL example code bundled with previous versions of the CUDA Toolkit does work. However, I stayed with CUDA 6.0 (for which the OpenCL include/ and lib/ both work) and the example code for Linux on

The thing is that shared/src/cmd_arg_reader.cpp uses a macro, RUNTIME_EXCEPTION, which should be declared in exception.h. RUNTIME_EXCEPTION may be part of the Microsoft C++ AMP (C++ Accelerated Massive Parallelism) library. But could you double-check, @txbob: do you have RUNTIME_EXCEPTION referenced under shared/?

Also, the Makefile under shared/ is missing. I wrote one modelled on all the others, but it still doesn’t work (linking fails somewhere).

I’m sure we could prune away the small fraction of code that doesn’t compile/link and make the rest work. It’s just that it has already taken me hours…

I do hope NVIDIA double-checks everything before putting it online.

The OpenCL sample codes on the webpage you are pointing to are the same as what is in the SDK download that I suggested.

Those sample codes on the webpage are basically the individual source directories of the SDK. As you’re discovering, they depend on a framework of libraries, utilities, and shared include files that are part of the SDK. There is no intent that they be buildable by themselves, which you’re also discovering. Those codes are provided to make the SDK easily browsable on the web, and so that you can replace one of your codes if you modify it (within an existing SDK framework). They are not standalone examples.

You’re welcome to do whatever you wish, of course, but if you want to avoid spending hours, I assure you the easy way to build those codes is to download the SDK using the instructions I provided. The pre-CUDA5 SDK is independent of whatever toolkit you have installed.

@txbob well understood, and thanks for your instructions. The only point is: if you work for NVIDIA, could you put this disclaimer on

If I google for GPU Computing SDK, I get to

It’s classified under Home > CUDA ZONE > Tools & Ecosystem > Language & APIs > GPU Computing SDK

But if I go up one level, to Home > CUDA ZONE > Tools & Ecosystem > Language & APIs, the GPU Computing SDK is not listed there, which makes one think the GPU Computing SDK has been phased out.

However, OpenCL is listed there, and the code downloaded from under OpenCL doesn’t work standalone.

==> All this suggests a certain lack of care in the website and code organisation.

@txbob I have CUDA 7 on my system. Would that be a problem? Because I did everything you mentioned above, yet I keep getting “undefined reference” errors for some variables…

I’m pretty sure you can get it to work with CUDA 7.

As I have just come across this page, I thought I’d add some
info that may be useful for others who find it, as a couple
of the questions raised don’t have explicit answers here.

If you have the NVIDIA_GPU_Computing_SDK that @vacaloca and @txbob
refer to, i.e.

then the “missing” exception.h file exists in three places:

5035 Aug 16 14:26 ./shared/inc/exception.h
5035 Aug 16 14:26 ./CUDALibraries/common/inc/exception.h
5035 Aug 16 14:26 ./C/common/inc/exception.h
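For anyone patching a standalone download rather than building inside the SDK tree, the copies above can be located and carried over with something like the following sketch (both paths are placeholders for your own layout, not from this thread):

```shell
# Locate every copy of exception.h in an unpacked GPU Computing SDK tree,
# then copy one into the standalone download's include directory.
# Both paths below are placeholders -- adjust to your own setup.
cd "$HOME/NVIDIA_GPU_Computing_SDK"
find . -name exception.h
cp shared/inc/exception.h /path/to/standalone/shared/inc/
```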

FWIW, we have CUDA version 8.0.44 (wrapping GCC 5.4.0)
running on ArchLinux systems here, and I could get, for example,
the oclMarchingCubes sample to build and run by doing the
following from the top level of the SDK
(the first step might not actually be needed, but is included for completeness):

cd C/common
make

cd ../../shared
make
  BUILDS lib/libshrutil_x86_64.a

cd ../OpenCL/common/
make
  BUILDS lib/liboclUtil_x86_64.a

cd ../src/oclMarchingCubes/
make verbose=1 2>&1 | tee /tmp/make.out

Where I did see an issue was when running, say, the Nbody example,
where attempts to build the PTX file fail because the compiler
cannot recognise the correct calling signature of parts
of some macros, giving this error message at run time:

Build Log:
<kernel>:86:44: error: call to 'mul24' is ambiguous
        accel = bodyBodyInteraction(accel, SX(i++), myPos, softeningSquared); 
<kernel>:28:29: note: expanded from macro 'SX'
#define SX(i) sharedPos[i + mul24(get_local_size(0), get_local_id(1))]

The fix for this appears to be in the code, in that, if one looks at

// Macros to simplify shared memory addressing
#define SX(i) sharedPos[i + mul24(get_local_size(0), get_local_id(1))]

// This macro is only used the multithreadBodies (MT) versions of kernel code below
#define SX_SUM(i,j) sharedPos[i + mul24((uint)get_local_size(0), (uint)j)]    // i + blockDimx * j

then the SX_SUM macro has the explicit casts that, if applied to the SX macro above it,
allow the sample to run as expected.
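If you would rather patch than edit by hand, applying SX_SUM’s casts to the SX macro can be done with a one-liner along these lines (the kernel file path is a guess on my part; locate the real file in your SDK tree first, e.g. by grepping for `define SX(`):

```shell
# Add the explicit (uint) casts from SX_SUM to the SX macro as well.
# The path below is an assumed example, not confirmed from this thread.
sed -i 's/mul24(get_local_size(0), get_local_id(1))/mul24((uint)get_local_size(0), (uint)get_local_id(1))/' \
    OpenCL/src/oclNbody/oclNbody_kernel.cl
```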

One final note:

be wary of doing a

make clean

in one of the sample directories, as this appears to be
somewhat overzealous and ends up doing a

rm -f ../../..//shared/lib//*.a

as well as removing all of the sample’s local objects and binaries
and so you have to keep rebuilding libshrutil_x86_64.
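One defensive workaround, assuming the SDK layout quoted above, is to stash the shared utility libraries before running make clean and restore them afterwards:

```shell
# Back up the shared libs that 'make clean' would delete, then restore them.
# The SDK path is a placeholder for your own install location.
SDK="$HOME/NVIDIA_GPU_Computing_SDK"
BACKUP="$(mktemp -d)"
cp "$SDK"/shared/lib/*.a "$BACKUP"/
make clean
cp "$BACKUP"/*.a "$SDK"/shared/lib/
```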

Kevin M. Buckley

eScience Consultant
School of Engineering and Computer Science
Victoria University of Wellington
New Zealand