Hey Folks,
I am working on a proof of concept that I hope will show significant speedups from using GPUs for image analysis, but I'm seeing some strange behavior that doesn't make sense to me.
I want to allocate device memory once and reuse it for every image I analyze, so that I don't pay the allocation cost on each image. What I have working is equivalent to this:
MyClass.h
class MyClass
{
public:
    MyClass();
    ~MyClass();
    int AnalyzeImage(unsigned char * image);
private:
    unsigned char * m_ucDevImage;  // device buffer, allocated once in the constructor
};
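MyClass.cpp
#include "MyClass.h"
#include "GPUAnalyze.h"
MyClass::MyClass()
{
    // Allocate the device buffer once, up front
    GPUInit(&m_ucDevImage);
}
MyClass::~MyClass()
{
    // Release the device buffer when the object goes away
    GPUDestroy(m_ucDevImage);
}
int MyClass::AnalyzeImage(unsigned char * image)
{
    // Reuse the same device buffer for every image
    return GPUAnalyze(image, m_ucDevImage);
}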
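GPUAnalyze.h
// Assumption: IMG_SIZE is not defined anywhere in this listing; it is taken here to match the 1024*1024 bytes allocated in GPUInit.
#define IMG_SIZE (1024 * 1024)
void GPUInit(unsigned char ** img);
void GPUDestroy(unsigned char * img);
int GPUAnalyze(unsigned char * img, unsigned char * devImg);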
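GPUAnalyze.cu
#include <cstdio>
#include <cuda_runtime.h>
#include "GPUAnalyze.h"
void GPUInit(unsigned char ** img)
{
    // The allocation must be at least IMG_SIZE bytes for the copy below to fit
    cudaMalloc((void**)img, 1024 * 1024);
}
void GPUDestroy(unsigned char * img)
{
    cudaFree(img);
}
int GPUAnalyze(unsigned char * img, unsigned char * devImg)
{
    // Copy the host image into the preallocated device buffer
    if (cudaMemcpy(devImg, img, IMG_SIZE, cudaMemcpyHostToDevice) != cudaSuccess)
    {
        printf("Unable to copy image to device\n");
        // CUDAErrorDetails is my own reporting helper (not shown here)
        CUDAErrorDetails(cudaGetLastError());
    }
    // analyze image here
    return 0;
}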
The first time GPUAnalyze() is called, the cudaMemcpy works fine. Every subsequent call fails with cudaErrorInvalidDevicePointer. If I use a global unsigned char * in place of the MyClass::m_ucDevImage member, everything works. The value of MyClass::m_ucDevImage does not appear to change between calls to GPUAnalyze(). Any thoughts are appreciated. Thanks a bunch!
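For reference, here is a minimal host-only sketch of the allocate-once, reuse-per-image ownership described above. It is not the real code: FakeDeviceAlloc and FakeDeviceFree are hypothetical stand-ins for cudaMalloc and cudaFree so that it runs without a GPU, std::memcpy stands in for cudaMemcpy, and ImageAnalyzer, kImgSize and g_liveAllocations are names made up for the illustration. Deleting the copy operations is a design choice added here, not something in the original class, and it needs C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Counts buffers that are currently allocated (illustration only).
static int g_liveAllocations = 0;

// Hypothetical stand-in for cudaMalloc: returns a buffer or nullptr.
static unsigned char* FakeDeviceAlloc(std::size_t bytes) {
    unsigned char* p = static_cast<unsigned char*>(std::malloc(bytes));
    if (p != nullptr) ++g_liveAllocations;
    return p;
}

// Hypothetical stand-in for cudaFree.
static void FakeDeviceFree(unsigned char* p) {
    if (p != nullptr) {
        std::free(p);
        --g_liveAllocations;
    }
}

// Assumed image size; the original listing uses 1024*1024 and IMG_SIZE.
const std::size_t kImgSize = 1024 * 1024;

class ImageAnalyzer {
public:
    // Allocate the reusable buffer exactly once.
    ImageAnalyzer() : m_devImage(FakeDeviceAlloc(kImgSize)) {}

    // Free it exactly once.
    ~ImageAnalyzer() { FakeDeviceFree(m_devImage); }

    // Copying would give two objects the same raw pointer, and the first
    // destructor to run would free the buffer out from under the other.
    ImageAnalyzer(const ImageAnalyzer&) = delete;
    ImageAnalyzer& operator=(const ImageAnalyzer&) = delete;

    // Copies one image into the reusable buffer.
    // Returns 0 on success, -1 if either pointer is null.
    int AnalyzeImage(const unsigned char* image) {
        if (m_devImage == nullptr || image == nullptr) return -1;
        std::memcpy(m_devImage, image, kImgSize);  // stand-in for cudaMemcpy
        // analysis would go here
        return 0;
    }

    const unsigned char* buffer() const { return m_devImage; }

private:
    unsigned char* m_devImage;
};
```

With copying disabled, passing an ImageAnalyzer by value or storing it by value in a container fails to compile, instead of letting a temporary copy's destructor free the shared buffer. That kind of accidental copy would leave the original pointer value unchanged but no longer valid. Whether anything like that happens in the code above is not something the listing shows.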
-Bryan