How to set the device memory limit in MATLAB?

I want to write my code in MATLAB but run the computationally intensive part in CUDA. To do that, I first compile my kernel code to PTX, then call the MATLAB function parallel.gpu.CUDAKernel() to run my CUDA kernel. Inside the kernel, I have to dynamically allocate memory in a device function.
However, “A default heap of eight megabytes is allocated if any program uses malloc() without explicitly specifying the heap size.”
So in a C++ environment I would call this host function to increase the malloc heap limit on the device:
cudaDeviceSetLimit(cudaLimitMallocHeapSize, size_t size)
But this function cannot be called from inside a kernel. Is there any other way to set this limit?

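Use the mexcuda interface instead of the PTX interface in MATLAB. A MEX file built with mexcuda contains host code, so it can call cudaDeviceSetLimit() itself before launching the kernel.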
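A minimal sketch of what that could look like, not a tested implementation. The file name heap_limit_mex.cu, the kernel scratchKernel, the check() helper, the 256 MB heap size and the 1 MB per-thread allocation are all made up for illustration. As far as I know, the heap size has to be set before the first kernel that uses malloc()/free() is launched, so the sketch only raises the limit when the current one is too small; that way calling the MEX function a second time in the same MATLAB session should not trip on the limit call.

```cuda
// heap_limit_mex.cu (hypothetical file name)
// Build from the MATLAB prompt with: mexcuda heap_limit_mex.cu
#include "mex.h"
#include <cuda_runtime.h>

// Hypothetical kernel: every thread takes a scratch buffer from the device malloc heap.
__global__ void scratchKernel(int *ok, size_t bytesPerThread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    char *buf = static_cast<char *>(malloc(bytesPerThread));
    ok[tid] = (buf != NULL) ? 1 : 0;  // device malloc returns NULL when the heap is exhausted
    if (buf != NULL) {
        buf[0] = 1;                   // touch the buffer, then give it back
        free(buf);
    }
}

// Hypothetical helper: report a CUDA failure as a MATLAB error.
// (For brevity, device memory is not released on the error path.)
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        mexErrMsgIdAndTxt("heapLimit:cuda", "%s failed: %s", what, cudaGetErrorString(err));
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    const size_t wantedHeap = (size_t)256 << 20;  // assumed target: 256 MB
    const size_t perThread  = (size_t)1 << 20;    // assumed request: 1 MB per thread
    const int    nThreads   = 64;                 // 64 MB in total, more than the 8 MB default

    // Host side: raise the malloc heap limit only if it is currently too small.
    size_t currentHeap = 0;
    check(cudaDeviceGetLimit(&currentHeap, cudaLimitMallocHeapSize), "cudaDeviceGetLimit");
    if (currentHeap < wantedHeap)
        check(cudaDeviceSetLimit(cudaLimitMallocHeapSize, wantedHeap), "cudaDeviceSetLimit");

    int *dOk = NULL;
    check(cudaMalloc(&dOk, nThreads * sizeof(int)), "cudaMalloc");

    scratchKernel<<<1, nThreads>>>(dOk, perThread);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    // Return one int32 flag per thread: 1 if its in-kernel malloc succeeded.
    plhs[0] = mxCreateNumericMatrix(1, nThreads, mxINT32_CLASS, mxREAL);
    check(cudaMemcpy(mxGetData(plhs[0]), dOk, nThreads * sizeof(int),
                     cudaMemcpyDeviceToHost), "cudaMemcpy");
    cudaFree(dOk);
}
```

After building, calling ok = heap_limit_mex() in MATLAB returns the per-thread flags, so all(ok) tells you whether every in-kernel allocation fit in the heap.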

Thank you. I will try this.

Just write a MEX function in C. It gives you more flexibility, without any degradation in speed. Here is something I wrote; the CUDA kernels are defined in, and linked with, this MEX file.