How to share a data array in GPU memory between two mex functions?


Can anyone suggest how to retain data in GPU memory across two mexcuda functions? The basic steps I follow are:

 1. Initialize data array 'A' in MATLAB
 2. Transfer the data to GPU memory in mexcuda function #1 and compute the sum of the array elements --> CUDA Kernel 1
 3. Call mexcuda function #2 to compute the square of every element of array 'A' (which is already stored in GPU memory) --> CUDA Kernel 2


How can I share the data between mexcuda functions #1 and #2 so that step 3 does not need to transfer array 'A' to GPU memory again?
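One direction that might work, sketched here under assumptions rather than as a confirmed solution: if the Parallel Computing Toolbox is available, the array could be moved to the device once on the MATLAB side with gpuArray, and each mexcuda function could then accept that gpuArray through the mxGPUArray API, so the data stays in GPU memory between calls. The file name squareMex.cu, the kernel name squareKernel, and the error IDs below are hypothetical, and the sketch assumes 'A' is a real double array. It only covers function #2; function #1 would wrap its input the same way.

```cuda
// squareMex.cu (hypothetical name). Build from MATLAB with: mexcuda squareMex.cu
#include "mex.h"
#include "gpu/mxGPUArray.h"

// Each thread squares one element of the input array.
__global__ void squareKernel(const double* in, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * in[i];
    }
}

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    // Must be called before any other mxGPU function.
    mxInitGPU();

    if (nrhs != 1 || !mxIsGPUArray(prhs[0])) {
        mexErrMsgIdAndTxt("squareMex:input", "Expected one gpuArray input.");
    }

    // Wrap the gpuArray that already lives on the device.
    // No host-to-device copy happens here.
    const mxGPUArray* A = mxGPUCreateFromMxArray(prhs[0]);
    if (mxGPUGetClassID(A) != mxDOUBLE_CLASS || mxGPUGetComplexity(A) != mxREAL) {
        mxGPUDestroyGPUArray(A);
        mexErrMsgIdAndTxt("squareMex:type", "Expected a real double gpuArray.");
    }
    const double* dA = static_cast<const double*>(mxGPUGetDataReadOnly(A));
    const int n = static_cast<int>(mxGPUGetNumberOfElements(A));

    // Allocate the output on the device with the same shape as the input.
    const mwSize* dims = mxGPUGetDimensions(A);
    mxGPUArray* B = mxGPUCreateGPUArray(mxGPUGetNumberOfDimensions(A), dims,
                                        mxDOUBLE_CLASS, mxREAL,
                                        MX_GPU_DO_NOT_INITIALIZE);
    mxFree(const_cast<mwSize*>(dims));  // the dimensions array is owned by the caller
    double* dB = static_cast<double*>(mxGPUGetData(B));

    // Launch one thread per element.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (n > 0) {
        squareKernel<<<blocks, threads>>>(dA, dB, n);
    }

    // Return the result as a gpuArray, so it also stays on the device.
    plhs[0] = mxGPUCreateMxArrayOnGPU(B);

    // Release the mxGPUArray wrappers; the underlying gpuArray data
    // referenced from MATLAB is not freed by this.
    mxGPUDestroyGPUArray(A);
    mxGPUDestroyGPUArray(B);
}
```

On the MATLAB side this would presumably be used as dA = gpuArray(A); B = squareMex(dA); with the same dA passed to the sum function beforehand, and gather(B) called only when the result is needed on the host.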

P.S.: For simplicity, array 'A' is initialized with 10 elements here. My actual array has more than 1000 elements, and there are additional computations.

Thanks in advance

data array A[0] = 0.5;
data array A[1] = 1.5;
data array A[2] = 2.5;
data array A[3] = 3.5;
data array A[4] = 4.5;
data array A[5] = 5.5;
data array A[6] = 6.5;
data array A[7] = 7.5;
data array A[8] = 8.5;
data array A[9] = 9.5;
data array A[10] = 10.5;