Dynamic parallelism in Mex interfaced MATLAB

Great to contact NVIDIA.

I have some issues compiling a CUDA file written with the MEX interface for MATLAB.
Dynamic parallelism previously gave me problems in plain C when running on Stampede, but I resolved them with the following build steps:

nvcc -arch=sm_20 -Xcompiler '-fPIC' -dc test1.cu test2.cu
nvcc -arch=sm_20 -Xcompiler '-fPIC' -dlink test1.o test2.o -o link.o
g++ -shared -o test.so test1.o test2.o link.o -L/usr/local/cuda/lib64 -lcudart
g++ -c main.cpp
g++ -o testmain main.o test.so

Now I am trying to parallelize some MATLAB code using the MEX interface. Compiling and running the MEX code worked fine with the following script before I added dynamic parallelism, which recursively launches a kernel until some condition is met.

Previous compilation code:
function nvmex(cuFileName)
%NVMEX Compiles and links a CUDA file for MATLAB usage
%NVMEX(FILENAME) will create a MEX-File (also with the name FILENAME) by
%invoking the CUDA compiler, nvcc, and then linking with the MEX

CUDA_LIB_Location = '/usr/local/cuda-5.0/lib64';
Host_Compiler_Location = '';
PIC_Option = ' --compiler-options -fPIC -arch=sm_35 -rdc=true';

[~, filename] = fileparts(cuFileName);
nvccCommandLine = [ 'nvcc --compile ' cuFileName ' ' Host_Compiler_Location ' ' ...
' -o ' filename '.o ' PIC_Option ' -I' matlabroot '/extern/include ' ];

mexCommandLine = ['mex (''' filename '.o'', ''-L' CUDA_LIB_Location ''', ''-lcudart'')'];
status = system(nvccCommandLine);
if status ~= 0 % any nonzero exit status from nvcc indicates failure
error('Error invoking nvcc');
end


When I run it, this expands to:

nvcc --compile cbsearch.cu -o cbsearch.o --compiler-options -fPIC -arch=sm_35 -rdc=true -I/usr/local/MATLAB/R2012a/extern/include
mex ('cbsearch.o', '-L/usr/local/cuda-5.0/lib64', '-lcudart')

But after adding dynamic parallelism to my CUDA code, I get the following error:


cbsearch.o: In function `__sti____cudaRegisterAll_43_tmpxft_00007743_00000000_6_cbsearch_cpp1_ii_51e07f2f()':
tmpxft_00007743_00000000-3_cbsearch.cudafe1.cpp:(.text+0x1d0b): undefined reference to `__cudaRegisterLinkedBinary_43_tmpxft_00007743_00000000_6_cbsearch_cpp1_ii_51e07f2f'

collect2: ld returned 1 exit status
mex: link of ' "cbsearch.mexa64"' failed.

Can you please help me resolve this problem? Since MATLAB is an interpreter, I am not sure how to enable dynamic parallelism in a MEX file called from MATLAB.

Link with -lcudadevrt

I believe it is needed for any code that includes dynamic parallelism.

Hi Vacaloca, thanks for that reply. Even after linking with -lcudadevrt I still get the same error.
To state it clearly:

nvcc --compile cbsearch.cu -o cbsearch.o --compiler-options -fPIC -arch=sm_35 -O3 -use_fast_math -rdc=true -lcudadevrt -I/usr/local/MATLAB/R2012a/extern/include
mex ('cbsearch.o', '-L/usr/local/cuda-5.0/lib64', '-lcudart -lcudadevrt')

cbsearch.o: In function `__sti____cudaRegisterAll_43_tmpxft_00004e50_00000000_6_cbsearch_cpp1_ii_51e07f2f()':
tmpxft_00004e50_00000000-3_cbsearch.cudafe1.cpp:(.text+0x16): undefined reference to `__cudaRegisterLinkedBinary_43_tmpxft_00004e50_00000000_6_cbsearch_cpp1_ii_51e07f2f'
collect2: ld returned 1 exit status

mex: link of ' "cbsearch.mexa64"' failed.

I think some other directive needs to be included.
Could anyone who has used MEX with dynamic parallelism please respond?


Try adding the same -arch=sm_35 to your LDFLAGS definition and see if that helps

Once again, thanks for the reply.

I couldn't work out what LDFLAGS is for my code. You can see the compiler options above (function nvmex()); please let me know where I need to make changes.

You don’t explicitly define the environment variable LDFLAGS in your nvmex code. You can do it inline or with a makefile. A makefile example is posted on my previous link. This is an example of how to define it inline:


Try adding this to your mex command line:

LDFLAGS="\$LDFLAGS -arch=sm_35"

That preserves any existing LDFLAGS and appends the extra arguments the linker should need.

To see what value(s) LDFLAGS already contains, if any, under a Linux/Unix bash shell, simply type echo $LDFLAGS

Note that sometimes MATLAB redefines CFLAGS, LDFLAGS, LD_LIBRARY_PATH, etc. variables willy-nilly, which is why I prefer compiling outside of MATLAB with a makefile.

Hi Vacaloca !!

You seem to be an expert in MEX-interfaced MATLAB, given how many ways forward you have suggested. Being a novice with MEX, I took the nvmex.m script from the MATLAB repository and tried to get the most out of a Kepler K20.

As you suggested, I added LDFLAGS to my mex command line but, surprisingly, got the same error. There may be a syntax error in how we added the extra options. To give you the picture:


!nvcc --compile cbsearch.cu -o cbsearch.o --compiler-options -fPIC -arch=sm_35 -O3 -rdc=true -lcudadevrt -I/usr/local/MATLAB/R2012a/extern/include -I …
mex cbsearch.o -L/usr/local/cuda-5.0/lib64 -lcudart LDFLAGS="$LDFLAGS -arch=sm_35" -lcudadevrt

Warning: No source files in argument list. Assuming C source
code for linking purposes. To override this
assumption use '-fortran' or '-cxx'.

cbsearch.o: In function `__sti____cudaRegisterAll_43_tmpxft_000078f4_00000000_6_cbsearch_cpp1_ii_51e07f2f()':
tmpxft_000078f4_00000000-3_cbsearch.cudafe1.cpp:(.text+0x16): undefined reference to `__cudaRegisterLinkedBinary_43_tmpxft_000078f4_00000000_6_cbsearch_cpp1_ii_51e07f2f'
collect2: ld returned 1 exit status

mex: link of ' "cbsearch.mexa64"' failed.

Could you please give us a step-by-step overview of how you usually compile CUDA MEX code in MATLAB, using any file name, with all of the above additions? That would help us understand what you are suggesting and solve the issue at hand. We may simply be getting the command syntax wrong.

Thank you for all your help…

We came across the following link, which we thought could be useful.

But when we tried to adapt it to our MEX file, the dlink.o object file was not created.

We got the error

"dlink.o does not exist or is not a normal file"

I took a look at your first post again and realize that your code is in multiple files? Or was that just an example?

The extent of what I tried with compiling mex code had a single .cu file with standard mex/cuda includes and thrust. Do you have multiple source files?

If you can post a simple skeleton of your code using dynamic parallelism, I will see if I can get it compiled as a mex under Linux, and upload the corresponding makefile/etc. I have to do the same myself at some point anyway, might as well jump through the hoops now.

If it is just one source file, try adapting the CUDA source/makefile for the cdpSimplePrint CUDA sample to compile as a MEX file. That should be pretty straightforward, and you can add your real code later once it's working.
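
For the single-file case, here is a minimal sketch of how the build might be split into three steps, mirroring the -dc / -dlink / host-link sequence from the first post. The __cudaRegisterLinkedBinary_... symbol is normally supplied by nvcc's device-link step, which a plain mex or g++ link does not perform, so that step is the likely missing piece; this has not been tested against the code in this thread. The paths and the cbsearch name are copied from the posts above, while the object name cbsearch_dlink.o and the shell variable names are hypothetical. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch only: three-step build for a single-file CUDA MEX using dynamic parallelism.
# Assumed values (copied from the thread, adjust for your system):
CUDA_LIB=/usr/local/cuda-5.0/lib64
MATLAB_INC=/usr/local/MATLAB/R2012a/extern/include
NAME=cbsearch

# 1. Compile to an object with relocatable device code.
#    -dc is shorthand for --relocatable-device-code=true --compile.
COMPILE_CMD="nvcc -arch=sm_35 -Xcompiler -fPIC -dc $NAME.cu -o $NAME.o -I$MATLAB_INC"

# 2. Device-link step: links the device code against the device runtime
#    and emits an extra host object (hypothetical name: ${NAME}_dlink.o).
DLINK_CMD="nvcc -arch=sm_35 -Xcompiler -fPIC -dlink $NAME.o -o ${NAME}_dlink.o -lcudadevrt"

# 3. Host link with mex, passing BOTH objects plus the CUDA libraries.
MEX_CMD="mex $NAME.o ${NAME}_dlink.o -L$CUDA_LIB -lcudadevrt -lcudart"

# Print each command; execute it only when RUN=1 is set in the environment.
for cmd in "$COMPILE_CMD" "$DLINK_CMD" "$MEX_CMD"; do
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        eval "$cmd" || exit 1
    fi
done
```

The same three commands can be pasted into a makefile or issued from MATLAB with the ! prefix. The key difference from the nvmex flow is the extra object file produced in step 2 and passed to mex in step 3.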