Using CMake with MATLAB GPU Coder generated CUDA code

Hey, I created a simple MATLAB function that returns the cosine of an array, and used GPU Coder to generate CUDA code from it. I have also successfully built and run an executable containing that function with GPU Coder on a Jetson Nano. But when I build the same function with the same main file using CMake, the executable builds successfully but returns an array of zeros. However, both the GPU Coder and the CMake executables work when the MATLAB function's input is a scalar instead of an array (1x100 in my case).
This is the MATLAB function I want to generate CUDA from:
image
This is the generated CUDA function:
image
This is how I call it:
image
image
When I build this on the Jetson with CMake (CUDA 10.2 installed), the output array is all zeros.
When I build this on a Windows PC with CMake, the output array is correct.
I managed to run it on the Jetson in debug mode.


Variables at the first breakpoint:
image
image
At the breakpoint on line 45, the variables are:
image
image
image
Then, after calling cudaMalloc, the variables are:
image
image
image

Hi @kubaixixel, I’m not familiar with using MATLAB GPU Coder, so maybe someone from the community can share their experiences. Or you may also want to contact MATLAB support.

One thing you can try is adding a printf inside your CUDA kernel to log the values and see if they are being generated.

Another thing you can do is add error checking to the CUDA API calls such as cudaMalloc() and cudaMemcpy(). You can also call cudaGetLastError() after your kernel. Here are some example error checking macros:

https://github.com/dusty-nv/jetson-utils/blob/1f3709f48258c2d75500c35605e8f6f4a3447afc/cuda/cudaUtility.h
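To make the two suggestions above concrete, here is a minimal sketch that combines a device-side printf with error checking around every CUDA call. The names CUDA_CHECK, cosKernel and runCos are hypothetical: the macro is not copied from cudaUtility.h, and the kernel is only a stand-in for whatever GPU Coder actually generated. If the binary was built for an architecture the GPU cannot run, I would expect the check after the launch to report an error instead of the program silently returning zeros.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical macro: report a failed CUDA call and return its error code
// from the enclosing function (which must return cudaError_t).
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,       \
                    __LINE__, #call, cudaGetErrorString(err_));       \
            return err_;                                              \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the generated kernel: out[i] = cos(in[i]).
__global__ void cosKernel(const double* in, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = cos(in[i]);
        if (i < 3)  // log only a few elements to keep the output readable
            printf("kernel: in[%d]=%f out[%d]=%f\n", i, in[i], i, out[i]);
    }
}

// Copies hIn to the device, runs the kernel, copies the result to hOut.
// For brevity, device buffers are not freed on the early-return error paths.
cudaError_t runCos(const double* hIn, double* hOut, int n)
{
    double *dIn = nullptr, *dOut = nullptr;
    size_t bytes = n * sizeof(double);
    CUDA_CHECK(cudaMalloc(&dIn, bytes));
    CUDA_CHECK(cudaMalloc(&dOut, bytes));
    CUDA_CHECK(cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    cosKernel<<<blocks, threads>>>(dIn, dOut, n);
    CUDA_CHECK(cudaGetLastError());       // catches launch failures
    CUDA_CHECK(cudaDeviceSynchronize());  // catches execution failures, flushes printf

    CUDA_CHECK(cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dIn));
    CUDA_CHECK(cudaFree(dOut));
    return cudaSuccess;
}

int main()
{
    const int n = 100;  // matches the 1x100 input from the question
    std::vector<double> in(n), out(n, 0.0);
    for (int i = 0; i < n; ++i) in[i] = 0.01 * i;

    if (runCos(in.data(), out.data(), n) != cudaSuccess) return 1;
    printf("host: out[0]=%f out[%d]=%f\n", out[0], n - 1, out[n - 1]);
    return 0;
}
```

If no "kernel:" lines appear at all, that suggests the kernel never ran, which points at the launch rather than at the math inside it.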

Thanks, I will have a look at that. I also forgot to add that when I generate the executable directly in MATLAB GPU Coder, vector input works, so I think it has something to do with how the executable is built. This is my CMakeLists.txt:

@kubaixixel if you are on Jetson Nano, you should be using -gencode arch=compute_53,code=sm_53 in your CMakeLists.txt. My guess is the CUDA kernel was failing to launch because it was compiled for the wrong GPU arch, but you didn’t see this failure because there was no error checking.

For reference, you can list all of these if you want to compile for all Jetson devices:

-gencode arch=compute_53,code=sm_53
-gencode arch=compute_62,code=sm_62
-gencode arch=compute_72,code=sm_72
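If you are unsure which of these applies to a given board, a small query program can print the compute capability the device reports. This is only a sketch: getComputeCapability is a hypothetical helper name, and it assumes the GPU of interest is device 0. On a Jetson Nano it should report 5.3, matching the compute_53/sm_53 flag above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: fills major/minor with the compute capability
// reported for the given device index.
cudaError_t getComputeCapability(int device, int* major, int* minor)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;
    *major = prop.major;
    *minor = prop.minor;
    return cudaSuccess;
}

int main()
{
    int major = 0, minor = 0;
    cudaError_t err = getComputeCapability(0, &major, &minor);  // assumes device 0
    if (err != cudaSuccess) {
        fprintf(stderr, "device query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Print the capability together with the matching nvcc flag.
    printf("compute capability %d.%d\n", major, minor);
    printf("-gencode arch=compute_%d%d,code=sm_%d%d\n", major, minor, major, minor);
    return 0;
}
```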

@dusty_nv thank you, that solves it.