As I have just come across this page, I thought I’d add some
info that may be useful to others who find it, as a couple
of the questions raised don’t have explicit answers here.
If you have the NVIDIA_GPU_Computing_SDK that @vacaloca and @txbob
refer to, i.e.
gpucomputingsdk_4.2.9_linux.run
then the “missing” exception.h file exists in three places
5035 Aug 16 14:26 ./shared/inc/exception.h
5035 Aug 16 14:26 ./CUDALibraries/common/inc/exception.h
5035 Aug 16 14:26 ./C/common/inc/exception.h
FWIW, we have CUDA Version 8.0.44 (wrapping GCC 5.4.0)
running on ArchLinux systems here and I could get, for example,
the oclMarchingCubes sample to build and run if I did the
following from the top-level of the SDK
(Might not actually need the first one, but for completeness)
cd C/common
make
cd ../../shared
make
BUILDS lib/libshrutil_x86_64.a
cd ../OpenCL/common/
make
BUILDS lib/liboclUtil_x86_64.a
cd ../src/oclMarchingCubes/
make verbose=1 2>&1 | tee /tmp/make.out
Where I did see an issue was when running, say, the Nbody example,
where attempts to build the PTX file fail because the compiler
cannot resolve which overload of mul24 is meant inside one of
the macros, giving this error message at run time:
...
Build Log:
<kernel>:86:44: error: call to 'mul24' is ambiguous
accel = bodyBodyInteraction(accel, SX(i++), myPos, softeningSquared);
^~~~~~~
<kernel>:28:29: note: expanded from macro 'SX'
#define SX(i) sharedPos[i + mul24(get_local_size(0), get_local_id(1))]
The fix for this appears to be in the code, in that, if one looks at
oclNbodyKernel.cl
...
// Macros to simplify shared memory addressing
#define SX(i) sharedPos[i + mul24(get_local_size(0), get_local_id(1))]
// This macro is only used the multithreadBodies (MT) versions of kernel code below
#define SX_SUM(i,j) sharedPos[i + mul24((uint)get_local_size(0), (uint)j)] // i + blockDimx * j
...
then the SX_SUM macro has the explicit (uint) casts that, if applied to the SX macro above it,
allow the sample to run as expected.
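For anyone who would rather script that edit than make it by hand, here is a minimal sketch using sed. The helper name patch_sx_macro is my own invention, not part of the SDK, and it assumes the SX line in your copy of the kernel file matches the one quoted above character for character. My guess at the cause is that get_local_size() and get_local_id() return size_t, which does not match the int or uint overloads of mul24 exactly, but I have not confirmed that.

```shell
# Sketch only: patch_sx_macro is a hypothetical helper, not part of the SDK.
# It adds the same (uint) casts that SX_SUM already has to the SX macro.
patch_sx_macro() {
    file="$1"
    # Keep a backup of the original, then rewrite only the SX macro line.
    cp "$file" "$file.orig" &&
    sed 's/^#define SX(i) sharedPos\[i + mul24(get_local_size(0), get_local_id(1))\]/#define SX(i) sharedPos[i + mul24((uint)get_local_size(0), (uint)get_local_id(1))]/' \
        "$file.orig" > "$file"
}

# Example call (the path is an assumption, adjust to wherever the file lives):
# patch_sx_macro OpenCL/src/oclNbody/oclNbodyKernel.cl
```

The untouched file is left alongside as oclNbodyKernel.cl.orig, so the change is easy to back out.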
One final note:
be wary of doing a
make clean
in one of the sample directories, as this appears to be
somewhat overzealous and ends up doing a
rm -f ../../..//shared/lib//*.a
as well as removing all of the sample’s local objects and binaries,
so you have to rebuild libshrutil_x86_64.a after every clean.
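To take some of the pain out of that, a small check along these lines can be run before building a sample. It is only a sketch: ensure_shrutil is a made-up helper name, its argument is assumed to be the top-level directory of the unpacked SDK, and the library path is the one reported by the shared build above.

```shell
# Sketch only: ensure_shrutil is a hypothetical helper, not part of the SDK.
# Argument: the top-level directory of the unpacked SDK (an assumption).
ensure_shrutil() {
    sdk_top="$1"
    lib="$sdk_top/shared/lib/libshrutil_x86_64.a"
    if [ -f "$lib" ]; then
        echo "libshrutil present"
    else
        # A "make clean" in a sample directory has probably removed it.
        echo "libshrutil missing, rebuilding"
        (cd "$sdk_top/shared" && make)
    fi
}

# Example call from the top level of the SDK:
# ensure_shrutil .
```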
Kevin M. Buckley
eScience Consultant
School of Engineering and Computer Science
Victoria University of Wellington
New Zealand