Memory location persistence: cudaMalloc and classes

Hey Folks,

I am working on a proof of concept that will hopefully provide excellent gains by using GPUs for image analysis. I’m finding some strange behavior that doesn’t seem to make sense to me.

I want to allocate device memory once and reuse it for each image that I want to analyze to avoid memory allocation time each time an image needs analyzing. What I have working is equivalent to this:

MyClass.h

class MyClass
{
public:
    MyClass();
    ~MyClass();
    int AnalyzeImage(unsigned char * image);

private:
    unsigned char * m_ucDevImage;
};

MyClass.cpp

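#include "MyClass.h"
#include "GPUAnalyze.h"

MyClass::MyClass()
{
    // Allocate the device buffer once, when the object is created.
    GPUInit(&m_ucDevImage);
}

MyClass::~MyClass()
{
    // Release the device buffer when the object is destroyed.
    GPUDestroy(m_ucDevImage);
}

int MyClass::AnalyzeImage(unsigned char * image)
{
    // Reuse the buffer allocated in the constructor.
    return GPUAnalyze(image, m_ucDevImage);
}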

GPUAnalyze.h

void GPUInit(unsigned char ** img);
void GPUDestroy(unsigned char * img);
int GPUAnalyze(unsigned char * img, unsigned char * devImg);

GPUAnalyze.cu

#include <cstdio>
#include "GPUAnalyze.h"

void GPUInit(unsigned char ** img)
{
    // Allocate 1 MB of device memory for the image.
    cudaMalloc((void**)img, 1024*1024);
}

void GPUDestroy(unsigned char * img)
{
    cudaFree(img);
}

int GPUAnalyze(unsigned char * img, unsigned char * devImg)
{
    // IMG_SIZE is defined elsewhere; it must not exceed the 1024*1024 bytes
    // allocated in GPUInit.
    if (cudaMemcpy(devImg, img, IMG_SIZE, cudaMemcpyHostToDevice) != cudaSuccess)
    {
        printf("Unable to copy image to device\n");
        CUDAErrorDetails(cudaGetLastError());
    }

    // analyze image here

    return 0;
}

The first time GPUAnalyze is called, the cudaMemcpy works fine. All subsequent calls fail with cudaErrorInvalidDevicePointer. If I use a global unsigned char * in place of the MyClass::m_ucDevImage member, the system works fine. The value of the MyClass::m_ucDevImage member does not appear to change in between each of the calls to GPUAnalyze(). Any thoughts are appreciated. Thanks a bunch!

-Bryan

I used a similar approach in some other code of mine, so the concept should work. Are you checking the error status after the kernel has completed?

Yep. I call cudaThreadSynchronize() and then check the return value of cudaGetLastError(). I’m currently working around it (hooray globals!). Thanks for the assurance that I’m not barking up the wrong tree.

-Bryan

One thing you might want to be careful about is copying that class. A copy gives you two objects holding the same device pointer. When the first one is destroyed it frees the memory, and the other is left with a pointer that is no longer valid.
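To illustrate the copying hazard, here is a minimal sketch of an owning wrapper that cannot be copied, only moved. It is not Bryan's code. `DeviceImage`, `fakeDeviceAlloc`, `fakeDeviceFree`, and `liveAllocationsAfterScope` are hypothetical names, and the two `fake*` functions are plain-host stand-ins for `cudaMalloc`/`cudaFree` so the example builds without the CUDA toolkit. It assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical stand-ins for cudaMalloc/cudaFree (assumption: host malloc/free
// is used here only so the sketch compiles without CUDA). They count live buffers.
static int g_liveAllocations = 0;

static unsigned char* fakeDeviceAlloc(std::size_t bytes) {
    ++g_liveAllocations;
    return static_cast<unsigned char*>(std::malloc(bytes));
}

static void fakeDeviceFree(unsigned char* p) {
    if (p != nullptr) {
        --g_liveAllocations;
        std::free(p);
    }
}

// Owns exactly one buffer. Copying is deleted, so two objects can never free
// the same pointer; moving hands the pointer over and nulls the source.
class DeviceImage {
public:
    explicit DeviceImage(std::size_t bytes)
        : m_ptr(fakeDeviceAlloc(bytes)), m_bytes(bytes) {}

    ~DeviceImage() { fakeDeviceFree(m_ptr); }

    DeviceImage(const DeviceImage&) = delete;
    DeviceImage& operator=(const DeviceImage&) = delete;

    DeviceImage(DeviceImage&& other) noexcept
        : m_ptr(other.m_ptr), m_bytes(other.m_bytes) {
        other.m_ptr = nullptr;
        other.m_bytes = 0;
    }

    DeviceImage& operator=(DeviceImage&& other) noexcept {
        if (this != &other) {
            fakeDeviceFree(m_ptr);      // release what we currently own
            m_ptr = other.m_ptr;
            m_bytes = other.m_bytes;
            other.m_ptr = nullptr;
            other.m_bytes = 0;
        }
        return *this;
    }

    unsigned char* get() const { return m_ptr; }
    std::size_t size() const { return m_bytes; }

private:
    unsigned char* m_ptr;
    std::size_t m_bytes;
};

// Creates and moves a buffer inside a scope, then reports how many buffers
// are still allocated once both objects have been destroyed.
int liveAllocationsAfterScope() {
    {
        DeviceImage a(1024 * 1024);
        DeviceImage b(std::move(a));    // ownership moves from a to b
        // DeviceImage c(b);            // would not compile: copying is deleted
        assert(a.get() == nullptr);
        assert(b.get() != nullptr);
    }
    return g_liveAllocations;
}
```

With the copy operations deleted, passing such an object by value or storing it in a container by copy becomes a compile error rather than a double free at run time. In a real build the two stand-ins would presumably be replaced by `cudaMalloc` and `cudaFree`, with their return codes checked.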