templated functions problem with CUDA

I have the following code (abstracted from the real thing). I can make it crash cudafe1 or give unresolved references, but can’t make it work.

// templated kernel
template <typename T>
__global__ void prim_kernel(float *dest, int n_elts,
                            float *a0, float *a1, T a2i);

__global__ void prim_kernel(float *dest, int n_elts,
                            float *a0, float *a1, float a2i)
{ /* body omitted */ }

__global__ void prim_kernel(float *dest, int n_elts,
                            float *a0, float *a1, float *a2)
{ /* body omitted */ }

// templated host side
template <typename T2>
void prim(float *d, float *a0, float *a1, T2 a2)
{
    int n_elts = 1000000;
    int block_size = 128;
    int n_blocks = (n_elts / block_size) + ((n_elts % block_size) ? 1 : 0);
    prim_kernel<<<n_blocks, block_size, 0>>>(d, n_elts, a0, a1, a2);
}

#ifdef USE_TEMPLATE // correct, but crashes cudafe1
template void prim(float *d, float *a0, float *a1, float *a2);
template void prim(float *d, float *a0, float *a1, float a2);
#else // gives unresolved externals because it generates non-templated versions
void prim(float *d, float *a0, float *a1, float *a2);
void prim(float *d, float *a0, float *a1, float a2);
#endif

int main()
{
    float f;
    prim(&f, &f, &f, &f);
    prim(&f, &f, &f, f);
}

If I compile it as is, it crashes cudafe1:

C:\tmp>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2007 NVIDIA Corporation
Built on Wed_Jul_16_12:57:50_PDT_2008
Cuda compilation tools, release 2.0, V0.2.1221

C:\tmp>nvcc -v cudabug.cu
#$ _HERE_=C:\CUDA\bin
#$ _THERE_=C:\CUDA\bin
#$ TOP=C:\CUDA\bin/..
#$ PATH=C:\CUDA\bin/../extools/bin;C:\CUDA\bin/../open64/bin;C:\CUDA\bin/../bin;
C:\CUDA\bin/…/lib;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\ID
E;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN;C:\Program Files\Mi
crosoft Visual Studio .NET 2003\Common7\Tools;C:\Program Files\Microsoft Visual
Studio .NET 2003\Common7\Tools\bin\prerelease;C:\Program Files\Microsoft Visual
Studio .NET 2003\Common7\Tools\bin;C:\Program Files\Microsoft Visual Studio .NET
erl\bin;c:\cygwin\bin;C:\Program Files\Intel\MKL\9.0\ia32\bin;C:\WINDOWS\system3
2;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files\HTMLDoc;C:\Program Files
Common Files\Adobe\AGL;C:\Program Files\Common Files\Avid;C:\Program Files\Ratio
nal\common;c:\Program Files\Debugging Tools for Windows;C:\Program Files\GNU\Gnu
PG\pub;C:\PROGRA~1\DISKEE~1\DISKEE~1;C:\Program Files\TortoiseSVN\bin;C:\Progra
m Files\QuickTime\QTSystem;C:\CUDA\bin;C:\Program Files\Bitvise Tunnelier;c:\Pr
ogram Files\PuTTY;C:\Program Files\Lua\5.1;C:\Program Files\Lua\5.1\clibs;C:\Pro
gram Files\GTK2-Runtime\lib;
#$ INCLUDES="-IC:\CUDA\bin/../include" "-IC:\CUDA\bin/../include/cudart"
#$ LIBRARIES= "/LIBPATH:C:\CUDA\bin/../lib" cudart.lib
#$ PTXAS_FLAGS=
#$ VSINSTALLDIR=C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../..
#$ VCINSTALLDIR=C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../..
#$ C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../../common7/Tool

C:\tmp>call C:\DOCUME~1\garyo\LOCALS~1\Temp/tmpxft_00001328_00000000-2.bat
Setting environment for using Microsoft Visual Studio .NET 2003 tools.
(If you have another version of Visual Studio or Visual C++ installed and wish
to use its tools from the command line, run vcvars32.bat for that version.)

C:\tmp>set 1>C:\DOCUME~1\garyo\LOCALS~1\Temp/tmpxft_00001328_00000000-1
# DevEnvDir=C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../.. # INCLUDE=C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/…/…\VC7
ATLMFC\INCLUDE;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/…/…
VC7\INCLUDE;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/…/…\VC7
\PlatformSDK\include\prerelease;C:\Program Files\Microsoft Visual Studio .NET 20
03\VC7\BIN/…/…\VC7\PlatformSDK\include;C:\Program Files\Microsoft Visual Studi
o .NET 2003\SDK\v1.1\include;C:\Program Files\Microsoft Visual Studio .NET 2003
VC7\ATLMFC\INCLUDE;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\INCLUD
E;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\PlatformSDK\include\pre
release;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\PlatformSDK\inclu
de;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\include;C:\Progra
m Files\Intel\MKL\9.0\include;C:\Program Files\Microsoft Visual Studio .NET 2003
# LIB=C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../..\VC7\ATLM FC\LIB;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../..\VC7\LIB; C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../..\VC7\PlatformSDK \lib\prerelease;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../.. \VC7\PlatformSDK\lib;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1 \lib;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\ATLMFC\LIB;C:\Progra m Files\Microsoft Visual Studio .NET 2003\VC7\LIB;C:\Program Files\Microsoft Vis ual Studio .NET 2003\VC7\PlatformSDK\lib\prerelease;C:\Program Files\Microsoft V isual Studio .NET 2003\VC7\PlatformSDK\lib;C:\Program Files\Microsoft Visual Stu dio .NET 2003\SDK\v1.1\lib;C:\Program Files\Intel\MKL\9.0\ia32\lib;C:\Program Fi les\Microsoft Visual Studio .NET 2003\SDK\v1.1\Lib\ # MSVCDir=C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/…/…\VC7
# Path=C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../..;C:\Prog ram Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../..\VC7\BIN;C:\Program Fil es\Microsoft Visual Studio .NET 2003\VC7\BIN/../..\Common7\Tools;C:\Program File s\Microsoft Visual Studio .NET 2003\VC7\BIN/../..\Common7\Tools\bin\prerelease;C :\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN/../..\Common7\Tools\bi n;C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\bin;C:\WINDOWS\Mic rosoft.NET\Framework\v1.1.4322;C:\CUDA\bin/../extools/bin;C:\CUDA\bin/../open64/ bin;C:\CUDA\bin/../bin;C:\CUDA\bin/../lib;C:\Program Files\Microsoft Visual Stud io .NET 2003\Common7\IDE;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\ BIN;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\Tools;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\Tools\bin\prerelease;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\Tools\bin;C:\Program Files\Micro soft Visual Studio .NET 2003\SDK\v1.1\bin;C:\WINDOWS\Microsoft.NET\Framework\v1. 1.4322;c:\ruby\bin;C:\Perl\bin;c:\cygwin\bin;C:\Program Files\Intel\MKL\9.0\ia32 \bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files\HT MLDoc;C:\Program Files\Common Files\Adobe\AGL;C:\Program Files\Common Files\Avid ;C:\Program Files\Rational\common;c:\Program Files\Debugging Tools for Windows;C :\Program Files\GNU\GnuPG\pub;C:\PROGRA~1\DISKEE~1\DISKEE~1\;C:\Program Files\To rtoiseSVN\bin;C:\Program Files\QuickTime\QTSystem\;C:\CUDA\bin;C:\Program Files\ Bitvise Tunnelier;c:\Program Files\PuTTY;C:\Program Files\Lua\5.1;C:\Program Fil es\Lua\5.1\clibs;C:\Program Files\GTK2-Runtime\lib;; # PATH=C:/Program Files/Microsoft Visual Studio .NET 2003/VC7/BIN;C:\Program Fi
les\Microsoft Visual Studio .NET 2003\VC7\BIN/…/…;C:\Program Files\Microsoft V
isual Studio .NET 2003\VC7\BIN/…/…\VC7\BIN;C:\Program Files\Microsoft Visual S
tudio .NET 2003\VC7\BIN/…/…\Common7\Tools;C:\Program Files\Microsoft Visual St
udio .NET 2003\VC7\BIN/…/…\Common7\Tools\bin\prerelease;C:\Program Files\Micro
soft Visual Studio .NET 2003\VC7\BIN/…/…\Common7\Tools\bin;C:\Program Files\Mi
crosoft Visual Studio .NET 2003\SDK\v1.1\bin;C:\WINDOWS\Microsoft.NET\Framework
n;C:\CUDA\bin/…/lib;C:\Program Files\Microsoft Visual Studio .NET 2003\Common7
IDE;C:\Program Files\Microsoft Visual Studio .NET 2003\VC7\BIN;C:\Program Files
Microsoft Visual Studio .NET 2003\Common7\Tools;C:\Program Files\Microsoft Visua
l Studio .NET 2003\Common7\Tools\bin\prerelease;C:\Program Files\Microsoft Visua
l Studio .NET 2003\Common7\Tools\bin;C:\Program Files\Microsoft Visual Studio .N
ET 2003\SDK\v1.1\bin;C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322;c:\ruby\bin;C:
\Perl\bin;c:\cygwin\bin;C:\Program Files\Intel\MKL\9.0\ia32\bin;C:\WINDOWS\syste
m32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\Program Files\HTMLDoc;C:\Program File
s\Common Files\Adobe\AGL;C:\Program Files\Common Files\Avid;C:\Program Files\Rat
ional\common;c:\Program Files\Debugging Tools for Windows;C:\Program Files\GNU\G
nuPG\pub;C:\PROGRA~1\DISKEE~1\DISKEE~1;C:\Program Files\TortoiseSVN\bin;C:\Prog
ram Files\QuickTime\QTSystem;C:\CUDA\bin;C:\Program Files\Bitvise Tunnelier;c:
Program Files\PuTTY;C:\Program Files\Lua\5.1;C:\Program Files\Lua\5.1\clibs;C:\P
rogram Files\GTK2-Runtime\lib;;
# cl -D__CUDA_ARCH__=100 -nologo -E -TP -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_S M_11_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS -DCUDA_NO_SM_12_ATOMIC_ INTRINSICS "-IC:\CUDA\bin/../include" "-IC:\CUDA\bin/../include/cudart" -I. - D__CUDACC__ -C -FI "cuda_runtime.h" > "C:\DOCUME~1\garyo\LOCALS~1\Temp/tmpxft_0 0001328_00000000-6_cudabug.cpp1.ii" "cudabug.cu" cudabug.cu # cudafe --m32 --microsoft_version=1310 --msvc_target_version=1310 --compiler_b
indir “C:/Program Files/Microsoft Visual Studio .NET 2003/VC7/BIN” --diag_error=
host_device_limited_call -tused --gen_c_file_name “C:\DOCUME~1\garyo\LOCALS~1\Te
mp/tmpxft_00001328_00000000-3_cudabug.cudafe1.c” --stub_file_name “C:\DOCUME~1\g
aryo\LOCALS~1\Temp/tmpxft_00001328_00000000-3_cudabug.cudafe1.stub.c” --stub_hea
der_file_name “C:\DOCUME~1\garyo\LOCALS~1\Temp/tmpxft_00001328_00000000-3_cudabu
g.cudafe1.stub.h” --gen_device_file_name "C:\DOCUME~1\garyo\LOCALS~1\Temp/tmpxft
00001328_00000000-3_cudabug.cudafe1.gpu" --include_file_name C:\DOCUME~1\garyo
LOCALS~1\Temp/tmpxft_00001328_00000000-5_cudabug.fatbin.c “C:\DOCUME~1\garyo\LOC
cudabug.cu(34): internal error: assertion failed: get_scope_for_routine_definition: scope is NULL (D:/Bld/rel/gpgpu/toolkit/r2.0/compiler/edg/EDG_3.9/src/c_gen_be.c, line 8887)

1 catastrophic error detected in the compilation of “C:\DOCUME~1\garyo\LOCALS~1
Compilation aborted.

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application’s support team for more information.

--error 0x3 --


If I toggle the #define USE_TEMPLATE to turn off the template decls near the end, it compiles OK, but won’t link due to unresolved refs:

(just the tail of nvcc -v here, showing the invocation of cl to link):
#$ cl -D__CUDA_ARCH__=100 -nologo -Fe"a.exe" "C:\DOCUME~1\garyo\LOCALS~1\Temp/tm
pxft_00001548_00000000-13_cudabug.obj" -link -INCREMENTAL:NO "/LIBPATH:C:\CUDA
\bin/../lib" cudart.lib
tmpxft_00001548_00000000-13_cudabug.obj : error LNK2019: unresolved external sym
bol "void __cdecl prim(float *,float *,float *,float)" (?prim@@YAXPAM00M@Z) refe
renced in function _main
tmpxft_00001548_00000000-13_cudabug.obj : error LNK2019: unresolved external sym
bol "void __cdecl prim(float *,float *,float *,float *)" (?prim@@YAXPAM000@Z) re
ferenced in function _main
a.exe : fatal error LNK1120: 2 unresolved externals

--error 0x2 --

The latter error comes down to name mangling: the plain declarations introduce ordinary, non-template functions, so main references those symbols and nothing ever defines them. But I think the correct code is to use the “template” keyword anyway, and that’s the main problem: it crashes cudafe1. Is there any workaround? I need this kind of thing to work, and on a fairly large scale.

– Gary
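
For anyone following along, the difference between the two #ifdef branches can be reproduced in plain C++ without nvcc. This is only a sketch under assumptions: prim returns an int tag instead of launching a kernel, the non-template overload is given a body so the program links, and call_plain, call_template and call_pointer are hypothetical helper names that are not in the original code.

```cpp
// Stand-in for the templated host wrapper. It returns 1 so a caller can
// tell that the template was chosen (the real one would launch a kernel).
template <typename T2>
int prim(float *, float *, float *, T2) { return 1; }

// Explicit instantiation definitions, as in the USE_TEMPLATE branch.
// These emit code for prim<float *> and prim<float>.
template int prim(float *, float *, float *, float *);
template int prim(float *, float *, float *, float);

// Ordinary, non-template overload, as in the #else branch. In the post it
// is only declared, which is why the linker cannot find it; here it gets
// a body (returning 2) purely so the sketch links.
int prim(float *, float *, float *, float) { return 2; }

// Overload resolution prefers the non-template function on an equal match.
int call_plain(float &f) { return prim(&f, &f, &f, f); }

// An empty template argument list restricts the call to the template.
int call_template(float &f) { return prim<>(&f, &f, &f, f); }

// No non-template overload takes float * last, so the template is used.
int call_pointer(float &f) { return prim(&f, &f, &f, &f); }
```

With a float f, call_plain(f) picks the ordinary function while call_template(f) and call_pointer(f) pick the template. That matches the LNK2019 output above: the undefined symbols are the non-template signatures.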

Hi Gary!

I think you must place

template <>

in front of the specializations. For example:

template <>
__global__ void prim_kernel(float *dest, int n_elts,
                            float *a0, float *a1, float a2i)

That way you can specialize for T=float and T=float*.

But I'm not sure whether this will solve your problem.
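
A host-only sketch of this suggestion, assuming standard C++ explicit specialization is what is meant. The __global__ qualifier and the <<<...>>> launch are left out so it builds with an ordinary compiler, and the int return values are an illustrative stand-in for the kernel bodies. It shows the syntax only and says nothing about whether cudafe accepts it.

```cpp
// Primary template: declared but not defined, as in the original post.
template <typename T>
int prim_kernel(float *dest, int n_elts, float *a0, float *a1, T a2i);

// Explicit specialization for T = float (scalar last argument).
template <>
int prim_kernel<float>(float *, int, float *, float *, float) { return 0; }

// Explicit specialization for T = float * (array last argument).
template <>
int prim_kernel<float *>(float *, int, float *, float *, float *) { return 1; }

// Templated host wrapper: T2 is deduced from the call, and the matching
// specialization of prim_kernel is selected.
template <typename T2>
int prim(float *d, float *a0, float *a1, T2 a2)
{
    return prim_kernel(d, 1, a0, a1, a2);
}
```

Calling prim with a float as the last argument reaches the first specialization, and calling it with a float * reaches the second. Any other type would fail to link because the primary template has no definition.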



Thanks for the advice, Navier-Stokes. But it makes no difference, cudafe1 still crashes in the same way. (Actually with your change it crashes cudafe1 whether I turn on or off the #define!)

Any other ideas anyone?

It looks like you’re using CUDA 2.0. The most recent version of CUDA, 2.2, features significantly improved support for C++ templates. Perhaps the latest version of cudafe will correctly compile your code.

Yeah, it doesn’t crash, but it doesn’t compile either:

mjolnir:~ tim$ nvcc compiletest.cu
compiletest.cu: In function ‘void prim(float*, float*, float*, T2)’:
compiletest.cu:21: error: ‘prim_kernel’ was not declared in this scope
compiletest.cu:21: error: expected primary-expression before ‘>’ token
compiletest.cu:21: error: expected primary-expression before ‘)’ token

(on my mac, running 2.2)

This is a known problem and will be fixed in the 2.3 release.

A known workaround is to give the templated prototype an (empty) implementation, i.e.

template <typename T>
__global__ void prim_kernel(float *dest, int n_elts,
                            float *a0, float *a1, T a2i) {}
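
To see why a body on the primary template should be harmless, here is a host-only sketch. It assumes the float and float * versions stay plain overloads, as in the original post. The __global__ qualifier and the launch syntax are dropped, and the int return values are illustrative tags rather than anything from the real code.

```cpp
// Primary template with a placeholder body. It returns -1 so that any
// accidental use is visible; the workaround in the thread uses {}.
template <typename T>
int prim_kernel(float *, int, float *, float *, T) { return -1; }

// Plain overloads for the two argument kinds that are actually used.
int prim_kernel(float *, int, float *, float *, float) { return 0; }
int prim_kernel(float *, int, float *, float *, float *) { return 1; }

// Templated host wrapper: the overloads above are visible here, so they
// win over the primary template whenever they match exactly.
template <typename T2>
int prim(float *d, float *a0, float *a1, T2 a2)
{
    return prim_kernel(d, 1, a0, a1, a2);
}
```

For float and float * the placeholder body never runs. It is reached only for some other type, such as an int last argument, where the template is the better match.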