How to retain a value on the GPU for future computations

ash03e608 · January 13, 2009, 2:36am

Hi,

I’m working on speeding up a video processing program in which I apply an FFT filter to each frame of the video. I am using CUFFT to compute the filtered image on the GPU. At the moment I pass both the filter mask and the frame to the GPU, and I repeat this for every frame in the video. Is it possible to keep the filter mask on the GPU so that I don’t have to transfer it each time, since it is the same for all frames of the video?

The filter mask is a float array of size 640x480.

Sarnath · January 13, 2009, 4:55am

Global memory contents persist across kernel launches, so you can leave the mask in global memory and reuse it for every frame.

Just make sure you don’t call cudaFree on that memory in your application while you still need it. That should do it.
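To make the point above concrete, here is a minimal sketch (not code from this thread) in which a mask is uploaded to global memory once and then reused by several kernel launches. The names upload_mask, scale_by_mask and process_frames are hypothetical, and the "frames" are constant-filled buffers standing in for real video data.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical kernel: multiply each frame sample by the matching mask value.
__global__ void scale_by_mask(float* frame, const float* mask, int n)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        frame[idx] *= mask[idx];
    }
}

// Upload the mask once. The returned device pointer stays valid across
// kernel launches until cudaFree is called on it. Returns NULL on failure.
float* upload_mask(const std::vector<float>& host_mask)
{
    float* d_mask = NULL;
    const size_t bytes = host_mask.size() * sizeof(float);
    if (cudaMalloc((void**)&d_mask, bytes) != cudaSuccess) return NULL;
    if (cudaMemcpy(d_mask, host_mask.data(), bytes,
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_mask);
        return NULL;
    }
    return d_mask;
}

// Run num_frames stand-in frames (each filled with frame_value) against the
// same device mask. Only the frame is transferred in each iteration.
// Returns the sum of the first sample of every result, or -1.0f on error.
float process_frames(const float* d_mask, int n, int num_frames, float frame_value)
{
    const size_t bytes = (size_t)n * sizeof(float);
    float* d_frame = NULL;
    if (cudaMalloc((void**)&d_frame, bytes) != cudaSuccess) return -1.0f;

    std::vector<float> frame(n);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float total = 0.0f;
    bool ok = true;
    for (int f = 0; f < num_frames && ok; ++f) {
        frame.assign(n, frame_value);
        ok = cudaMemcpy(d_frame, frame.data(), bytes,
                        cudaMemcpyHostToDevice) == cudaSuccess;
        if (ok) {
            scale_by_mask<<<blocks, threads>>>(d_frame, d_mask, n);
            ok = cudaGetLastError() == cudaSuccess
              && cudaMemcpy(frame.data(), d_frame, bytes,
                            cudaMemcpyDeviceToHost) == cudaSuccess;
        }
        if (ok) total += frame[0];
    }
    cudaFree(d_frame);
    return ok ? total : -1.0f;
}
```

A caller would invoke upload_mask once before the frame loop, pass the returned pointer to every launch, and call cudaFree on it only after the last frame.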

ash03e608 · January 14, 2009, 1:26am

Hi,

You replied recently to my question about retaining a variable on the GPU. I’m working from MATLAB and calling the CUDA program through a MEX function. To separate loading the filter from loading the video onto the GPU, I’ve written two MEX files, “filter.cu” and “video.cu”.

I tried declaring the mask as a global variable in the first file and accessing it from the second, but MATLAB crashes every time I run the program. I’ve pasted some code snippets below. I’m a master’s student from Chennai and new to CUDA, so any help would be really appreciated.

In “filter.cu” I have:

__device__ float *input_mask;

void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[])
{
    double *output_dm;

    /* Pointer to the mask image */
    output_dm = mxGetPr(prhs[0]);

    cudaMalloc( (void **) &input_mask, sizeof(float));
    cudaMemcpy( input_mask, output_dm, sizeof(float), cudaMemcpyHostToDevice);

In “video.cu”:

extern __device__ float *input_mask;

__global__ void apply_mask(cufftComplex* output_float,
                           float* input_mask,
                           int Ntot)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if ( idx < Ntot )
    {
        output_float[idx].x = output_float[idx].x * input_mask[idx];
        output_float[idx].y = output_float[idx].y * input_mask[idx];
    }
}

Mack_White · January 14, 2009, 1:36am

I’ve struggled quite a bit with CUDA/MATLAB interaction. You might want to check whether it is your MEX C code, rather than the CUDA part, that is causing MATLAB to crash. This usually comes down to how you allocate your resources and whether you overrun your memory limits. That’s how I crash MATLAB frequently.
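One concrete thing worth checking along these lines is whether the byte counts passed to cudaMalloc and cudaMemcpy match the data. mxGetPr returns a double*, and sizeof(float) covers a single element, so a full 640x480 float mask needs 640*480*sizeof(float) bytes and a double-to-float conversion before the copy. The sketch below uses a hypothetical helper, upload_mask_from_double, that sizes the buffer for all elements and checks every CUDA return code. It is an illustration under those assumptions, not a confirmed diagnosis of the crash.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert n host doubles to floats, allocate a device
// buffer large enough for all n elements, and copy the floats into it.
// On success *d_out holds the device pointer. On failure *d_out is NULL
// and the CUDA error code is returned to the caller.
cudaError_t upload_mask_from_double(const double* host, size_t n, float** d_out)
{
    *d_out = NULL;

    // MATLAB hands over doubles, while the kernel expects floats.
    std::vector<float> staging(n);
    for (size_t i = 0; i < n; ++i) {
        staging[i] = static_cast<float>(host[i]);
    }

    // Size the allocation for the whole array, not for one element.
    const size_t bytes = n * sizeof(float);
    float* d_buf = NULL;
    cudaError_t err = cudaMalloc((void**)&d_buf, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_buf, staging.data(), bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        cudaFree(d_buf);
        return err;
    }

    *d_out = d_buf;
    return cudaSuccess;
}
```

Inside a MEX file, a non-success return value could be reported with mexErrMsgTxt(cudaGetErrorString(err)) so that a failed allocation surfaces as a MATLAB error instead of a bad pointer reaching a kernel.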

My thoughts on your persistent mask issue are these: you know your mask ahead of time, and I assume it stays constant throughout the operation, correct? If you know it ahead of time, or at least compute it at the beginning, you could store the 640x480 matrix in a CUDA array and bind it to a texture. Then, if you can return the pointer from the MEX function to MATLAB as a variable, you can pass it back in on the next MEX call and cast it to (void*) to undo whatever numerical format MATLAB forced it into. That should let you keep your mask across multiple MEX calls. Finally, when you’re done, you should probably have another MEX function that cleans up your GPU memory, again using that pointer.
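The pointer round trip described above could look roughly like the following sketch. It uses plain global memory rather than a texture, and it folds upload and cleanup into one hypothetical MEX file, called mask_store here, that dispatches on the input type. It assumes that every MEX call runs in the same MATLAB process and CUDA context, so that the device pointer is still valid on the next call. I have not verified that assumption for any particular CUDA or MATLAB version.

```cuda
#include "mex.h"
#include <cuda_runtime.h>
#include <stdint.h>

// Pack a device pointer into a 64-bit integer that fits a MATLAB uint64 scalar.
static uint64_t pointer_to_handle(const void* d_ptr)
{
    return (uint64_t)(uintptr_t)d_ptr;
}

// Undo the packing on a later MEX call.
static void* handle_to_pointer(uint64_t handle)
{
    return (void*)(uintptr_t)handle;
}

// Hypothetical entry point:
//   h = mask_store(mask)   uploads a real double matrix, returns a uint64 handle
//   mask_store(h)          frees the device memory behind handle h
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    if (nrhs != 1) mexErrMsgTxt("Expected exactly one input.");

    if (mxIsUint64(prhs[0])) {
        // Cleanup form: recover the pointer and release the buffer.
        const uint64_t handle = *(const uint64_t*)mxGetData(prhs[0]);
        if (cudaFree(handle_to_pointer(handle)) != cudaSuccess)
            mexErrMsgTxt("cudaFree failed.");
        return;
    }

    if (!mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgTxt("Expected a real double matrix or a uint64 handle.");

    // Upload form: convert to float in MATLAB-managed scratch memory,
    // which MATLAB reclaims by itself if mexErrMsgTxt aborts the call.
    const size_t n = mxGetNumberOfElements(prhs[0]);
    const double* src = mxGetPr(prhs[0]);
    float* staging = (float*)mxMalloc(n * sizeof(float));
    for (size_t i = 0; i < n; ++i) staging[i] = (float)src[i];

    float* d_mask = NULL;
    if (cudaMalloc((void**)&d_mask, n * sizeof(float)) != cudaSuccess)
        mexErrMsgTxt("cudaMalloc failed.");
    if (cudaMemcpy(d_mask, staging, n * sizeof(float),
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_mask);
        mexErrMsgTxt("cudaMemcpy failed.");
    }
    mxFree(staging);

    // Hand the device pointer back to MATLAB as a 1x1 uint64.
    plhs[0] = mxCreateNumericMatrix(1, 1, mxUINT64_CLASS, mxREAL);
    *(uint64_t*)mxGetData(plhs[0]) = pointer_to_handle(d_mask);
}
```

From MATLAB this would be used as h = mask_store(mask); once before the frame loop, with h passed as an extra argument to the frame-processing MEX function, which casts it back to float* before launching the kernel. A final mask_store(h) releases the memory.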

I would be very interested to know how this works out, so please reply if you get it working.
