OpenCL compiler ignores inline keyword

michalj · October 8, 2009, 8:17pm

It looks like the current NVIDIA OpenCL compiler completely ignores the ‘inline’ keyword. I have recently been investigating a large (~2x) performance drop of my OpenCL code on a Tesla C1060 device. It turned out that using a function in two places in a .cl file caused the OpenCL compiler not to inline it, even though it was declared with the ‘inline’ keyword. There have been no such problems with the CUDA version of the code. As a temporary workaround, I was forced to ‘inline’ the function manually [1], which is ugly and which I would like to avoid in the future.

Has anyone had similar experiences with their OpenCL code? Is there some other way to force the compiler to inline selected functions?

[1] http://gitorious.org/sailfish/sailfish/commit/cf4be78c01a36d2f6c506974aa82cdf2b28797a1
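One way to get inlining that does not depend on how the compiler treats ‘inline’ is a function-like macro, since macros are expanded by the preprocessor before compilation (OpenCL C has a C-style preprocessor). The sketch below is plain C rather than OpenCL C so it can be compiled on the host. get_dist(), GET_DIST() and the [direction][node] array layout are hypothetical stand-ins, because the real getDist() signature is not shown in this thread. Whether this recovers the lost performance in this particular case is untested.

```c
#include <stddef.h>

/* Hypothetical helper standing in for a function like getDist(); the real
 * signature is not shown in this thread. It reads one value from a flat
 * array assumed to be laid out as [direction][node]. */
static inline float get_dist(const float *dist, size_t dir,
                             size_t node, size_t num_nodes)
{
    return dist[dir * num_nodes + node];
}

/* Macro version of the same access. The preprocessor pastes the expression
 * into every use site, so no function call remains for the compiler to
 * inline or not. Arguments are parenthesized and should be free of side
 * effects, as with any function-like macro. */
#define GET_DIST(dist, dir, node, num_nodes) \
    ((dist)[(size_t)(dir) * (size_t)(num_nodes) + (size_t)(node)])

/* Sums all directions at one node, going through the function. */
float sum_node_fn(const float *dist, size_t node,
                  size_t num_nodes, size_t num_dirs)
{
    float sum = 0.0f;
    for (size_t d = 0; d < num_dirs; ++d)
        sum += get_dist(dist, d, node, num_nodes);
    return sum;
}

/* Same computation, going through the macro. */
float sum_node_macro(const float *dist, size_t node,
                     size_t num_nodes, size_t num_dirs)
{
    float sum = 0.0f;
    for (size_t d = 0; d < num_dirs; ++d)
        sum += GET_DIST(dist, d, node, num_nodes);
    return sum;
}
```

Both variants compute the same result. The macro keeps the access in a single place in the source, which avoids copying the function body by hand into each caller, at the cost of losing type checking on the arguments.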

Simon_Green · October 9, 2009, 4:04pm

I’m pretty sure everything is inlined by the OpenCL compiler (as in CUDA), so I’m surprised this makes such a difference. If you can provide a full repro case we can file a bug.

michalj · October 11, 2009, 1:39pm

You can reproduce the problem using the publicly available code of my sailfish project, which is hosted at Gitorious. Here is a sample command line session illustrating the steps (run on a GTX 280):

    $ git clone git://gitorious.org/sailfish/sailfish.git
    $ ./lbm_ldc.py --benchmark --backend=opencl --lat_w=512 --lat_h=512
    Using the "opencl" backend.
    # iters mlups_avg mlups_curr
    1000 532.18 532.18
    2000 529.43 526.68
    ^C
    $ git show cf4be78c01a36d2f6c506974aa82cdf2b28797a1 | patch -R
    $ ./lbm_ldc.py --benchmark --backend=opencl --lat_w=512 --lat_h=512
    Using the "opencl" backend.
    # iters mlups_avg mlups_curr
    1000 298.72 298.72
    2000 299.75 300.78

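The second column (mlups_avg; MLUPS = Million Lattice site Updates Per Second) shows the performance decrease: about 530 MLUPS with getDist() inlined manually, versus about 300 MLUPS after reverting that change. The reverted patch is the one I linked to in my previous post; it manually inlines the getDist() function inside the caller, so reverting it restores the version that relies on the ‘inline’ keyword.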
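For reference, the arithmetic behind these figures can be sketched as follows. The mlups() helper assumes every site of the lat_w x lat_h lattice is updated once per iteration, and the elapsed time is a hypothetical input, since the benchmark's own timing code is not shown in this thread. The slowdown() helper only divides the two reported averages.

```c
/* MLUPS = (lattice sites * iterations) / (elapsed seconds * 1e6).
 * Assumes each of the width * height sites is updated once per iteration. */
double mlups(long width, long height, long iters, double seconds)
{
    return ((double)width * (double)height * (double)iters) / (seconds * 1e6);
}

/* Ratio between the faster and the slower MLUPS figure. */
double slowdown(double fast_mlups, double slow_mlups)
{
    return fast_mlups / slow_mlups;
}
```

With the averages reported after 2000 iterations, slowdown(529.43, 299.75) comes out at about 1.77, which is in line with the roughly 2x drop described in the first post (measured there on a different device).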
