How to share a data array in GPU memory between two mex functions?

Hi,

Can anyone suggest how to keep data in GPU memory across two mexcuda functions? The basic steps I follow are:

 1. Initialize the data array 'A' in MATLAB
 2. Transfer the data to GPU memory in mexcuda function #1 and compute the sum of the array elements --> CUDA Kernel 1
 3. Call mexcuda function #2 to compute the square of every element of array 'A' (which is already stored in GPU memory) --> CUDA Kernel 2

Clarification:

How can I share the data between mexcuda functions #1 and #2 so that step 3 does not need to transfer the data array 'A' to GPU memory again?
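For illustration, here is a minimal sketch of one possible way to do step 3 without a second transfer. It assumes 'A' is moved to the GPU once on the MATLAB side as a gpuArray (which requires Parallel Computing Toolbox) and is then passed to each MEX function through the mxGPUArray C API. The file name squareOnGpu.cu, the kernel name squareKernel, and the error identifiers are hypothetical, and the sketch assumes real double-precision data. It is untested and not a definitive implementation.

```cuda
// squareOnGpu.cu (hypothetical name): sketch of mexcuda function #2.
// Assumption: the input is a real double gpuArray that already lives on the GPU.
#include "mex.h"
#include "gpu/mxGPUArray.h"

// CUDA Kernel 2: square every element of the input array.
__global__ void squareKernel(const double *in, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * in[i];
    }
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Must be called before any other mxGPU function.
    mxInitGPU();

    if (nrhs != 1 || !mxIsGPUArray(prhs[0])) {
        mexErrMsgIdAndTxt("squareOnGpu:input", "Expected one gpuArray input.");
    }

    // Wrap the existing device data; no host-to-device copy happens here.
    const mxGPUArray *A = mxGPUCreateFromMxArray(prhs[0]);
    if (mxGPUGetClassID(A) != mxDOUBLE_CLASS || mxGPUGetComplexity(A) != mxREAL) {
        mxGPUDestroyGPUArray(A);
        mexErrMsgIdAndTxt("squareOnGpu:type", "Expected a real double gpuArray.");
    }

    const double *dA = (const double *)mxGPUGetDataReadOnly(A);
    int n = (int)mxGPUGetNumberOfElements(A);

    // Allocate the output on the GPU with the same size and type as the input.
    const mwSize *dims = mxGPUGetDimensions(A);
    mxGPUArray *B = mxGPUCreateGPUArray(mxGPUGetNumberOfDimensions(A), dims,
                                        mxDOUBLE_CLASS, mxREAL,
                                        MX_GPU_DO_NOT_INITIALIZE);
    mxFree((void *)dims);
    double *dB = (double *)mxGPUGetData(B);

    if (n > 0) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        squareKernel<<<blocks, threads>>>(dA, dB, n);
    }

    // Return the result to MATLAB as a gpuArray; the data stays in GPU memory.
    plhs[0] = mxGPUCreateMxArrayOnGPU(B);

    // Destroy only the mxGPUArray wrappers; the device data is owned by MATLAB.
    mxGPUDestroyGPUArray(A);
    mxGPUDestroyGPUArray(B);
}
```

Under these assumptions, the MATLAB side would compile with mexcuda squareOnGpu.cu, create Ag = gpuArray(A) once, and then call B = squareOnGpu(Ag). Function #1 (the sum) could take the same Ag in the same way, so 'A' is transferred to the GPU only once; gather(B) copies a result back to the host when it is needed.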

P.S. For simplicity, the array 'A' is initialized with 10 elements here. My actual array has more than 1000 elements, and there are additional computations as well.

Thanks in advance

=====================
data array A[0] = 0.5;
data array A[1] = 1.5;
data array A[2] = 2.5;
data array A[3] = 3.5;
data array A[4] = 4.5;
data array A[5] = 5.5;
data array A[6] = 6.5;
data array A[7] = 7.5;
data array A[8] = 8.5;
data array A[9] = 9.5;
data array A[10] = 10.5;