Hi!
I’m new to CUDA programming and have run into a problem whose cause I can’t pin down.
I am trying to extend an image processing application with some CUDA-accelerated algorithms. Each “filter” is
implemented as a derived class and compiled to a DLL.
When a filter is selected for execution, it is created once; afterwards its ExecuteFilter() method is
called for each new processing element (image).
My idea was to call cudaMalloc once, when the object is created, and cudaFree once, when the object is
destroyed. Unfortunately this doesn’t work for me: I get an “invalid device pointer” error when cudaMemcpy
is called.
It only works when I call cudaMalloc/cudaFree inside the ExecuteFilter() method, which is not very efficient,
since this method is called very often.
I’ve already looked for solutions in this forum, but I don’t know exactly what the problem is, so I am not
sure what to search for. Other threads even suggest doing the cudaMalloc/cudaFree inside the constructor
and destructor of a class, which is what I am doing.
Can someone tell me how I might solve this problem?
Thanks in advance,
Andreas
Here is some of the code to better illustrate the problem:
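SobelFilterCUDA::SobelFilterCUDA(STRING name) {
    // One input: the image to be filtered.
    setInputSize(1);
    getInputData(0).setDataType<TIplImage*>("sobel_input");
    // One output: the filtered image.
    setOutputSize(1);
    sobel_output = new TIplImage();
    getOutputData(0).setData(sobel_output, "sobel filtered image");
    m_ClassName = name;
    // Allocate the device buffer once: 8294400 bytes (= 1920 * 1080 * 4).
    CUDA_SAFE_CALL(cudaMalloc((void**) &d_data, 8294400));
}

SobelFilterCUDA::~SobelFilterCUDA() {
    // Release the device buffer allocated in the constructor.
    CUDA_SAFE_CALL(cudaFree(d_data));
    SAFE_DELETE(sobel_output);
}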
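void SobelFilterCUDA::ExecuteFilter() {
    TIplImage* input = dynamic_cast<TIplImage*>(getInputData(0).getData());
    input->formatCheck(3);
    char* data = input->image()->imageData;
    // Must not exceed the 8294400 bytes allocated for d_data in the constructor.
    int mem_size = input->image()->imageSize;
    // Copy the image to the device and straight back (no kernel yet).
    // cudaMemcpy is where the "invalid device pointer" error is reported.
    CUDA_SAFE_CALL(cudaMemcpy(d_data, data, mem_size, cudaMemcpyHostToDevice));
    CUDA_SAFE_CALL(cudaMemcpy(data, d_data, mem_size, cudaMemcpyDeviceToHost));
    getOutputData(0).setData(input);
}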
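
A possible middle ground between allocating in the constructor (which fails for me) and allocating on every call (which works but is slow) would be to allocate on the first ExecuteFilter() call and reuse the buffer afterwards. The following is only a sketch under assumptions: LazyBuffer is a hypothetical helper, not part of CUDA or of my application, and I have not confirmed that moving the allocation this way avoids the error. The allocate and free functions are passed in as callables so the sketch compiles without the CUDA toolkit; in the filter they would wrap cudaMalloc and cudaFree.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper: owns one buffer, allocates it on first use,
// and reuses it on later calls as long as it is large enough.
class LazyBuffer {
public:
    using AllocFn = std::function<void*(std::size_t)>;  // returns nullptr on failure
    using FreeFn  = std::function<void(void*)>;

    // In the filter, the two callables could wrap the CUDA runtime, e.g.:
    //   [](std::size_t n) { void* p = nullptr;
    //                       return cudaMalloc(&p, n) == cudaSuccess ? p : nullptr; }
    //   [](void* p) { cudaFree(p); }
    LazyBuffer(AllocFn alloc, FreeFn release)
        : alloc_(std::move(alloc)), release_(std::move(release)) {}

    // Non-copyable: exactly one owner may free the buffer.
    LazyBuffer(const LazyBuffer&) = delete;
    LazyBuffer& operator=(const LazyBuffer&) = delete;

    ~LazyBuffer() { reset(); }

    // Returns a buffer of at least `bytes` bytes, or nullptr if allocation failed.
    // Allocates only if there is no buffer yet or the current one is too small.
    void* get(std::size_t bytes) {
        assert(bytes > 0);
        if (ptr_ != nullptr && bytes <= capacity_) {
            return ptr_;  // common case: reuse, no allocation
        }
        reset();
        ptr_ = alloc_(bytes);
        capacity_ = (ptr_ != nullptr) ? bytes : 0;
        return ptr_;
    }

    // Frees the buffer, if any. Safe to call more than once.
    void reset() {
        if (ptr_ != nullptr) {
            release_(ptr_);
            ptr_ = nullptr;
            capacity_ = 0;
        }
    }

    std::size_t capacity() const { return capacity_; }

private:
    AllocFn alloc_;
    FreeFn release_;
    void* ptr_ = nullptr;
    std::size_t capacity_ = 0;
};
```

ExecuteFilter() would then ask for get(mem_size) before the two cudaMemcpy calls, so the allocation happens once in the common case and the buffer grows if a larger image arrives, instead of relying on the fixed 8294400 bytes. If the free also has to happen in the same place as the allocation, reset() could be called explicitly there rather than leaving it to the destructor.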