Invalid Device Pointer

Hi,

I’m new to this group. I’m using CUDA for my video codec project and I’m stuck on a problem. Here is the description…

a.cpp

extern "C" void funcAlloc(…);
extern "C" void funcCalc(…);
extern "C" void funcDealloc(…);

void func_a1()
{
    funcAlloc(…);
}

void func_a2()
{
    funcCalc(…);
}

void func_a3()
{
    funcDealloc(…);
}

b.cu

unsigned char *d_var;

extern "C" void funcAlloc(…)
{
    cudaMalloc((void**)&d_var, …);
}

extern "C" void funcCalc(…)
{
    // allocate host memory

    // load host data to device
    cudaMemcpy(d_var, h_var, …);

    // launch kernel
}

extern "C" void funcDealloc(…)
{
    cudaFree(d_var);
}

I get an "invalid device pointer" error.
Is it correct to declare the variable ‘d_var’ as a global?

The reason I’m doing this is that I need to allocate only once and overwrite the frame data each time, instead of allocating and deallocating memory for every frame.

Should I keep a header (.h) file that declares these variables and include it in the .cu file? I tried this too, but it gives the same error.

I suppose that if I allocate once and reuse the memory for every video frame, without deallocating it each time, the overall processing time will be reduced.

Can anyone help me out with this problem? Can you suggest some ideas for reusing the memory? It would be very helpful.

Thanks
Saritha

It looks right, at first glance. Can you paste the compiler error? Thanks.

And yes, you want to malloc as little as possible. For example, if your frame sizes were variable, I’d allocate one buffer big enough to hold the largest frame and use it for the small frames too, to avoid repeated malloc/free calls.
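To make the "allocate once, big enough for the largest frame" idea concrete, here is a rough host-side sketch. The `FrameBuffer` struct and its helper functions are hypothetical names, not anything from the posted code, and `std::malloc`, `std::memcpy` and `std::free` are only stand-ins for `cudaMalloc`, `cudaMemcpy` and `cudaFree` so that it builds without the CUDA toolkit.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: one buffer sized for the largest expected frame.
struct FrameBuffer {
    unsigned char* data = nullptr;
    std::size_t capacity = 0;  // bytes allocated
    int allocations = 0;       // how many times memory was actually allocated
};

// Allocate once, up front (stand-in for cudaMalloc).
bool frameBufferInit(FrameBuffer& fb, std::size_t maxFrameBytes) {
    fb.data = static_cast<unsigned char*>(std::malloc(maxFrameBytes));
    if (fb.data == nullptr) return false;
    fb.capacity = maxFrameBytes;
    fb.allocations = 1;
    return true;
}

// Overwrite the existing buffer with a frame of any size up to capacity
// (stand-in for cudaMemcpy with cudaMemcpyHostToDevice).
bool frameBufferUpload(FrameBuffer& fb, const unsigned char* frame,
                       std::size_t frameBytes) {
    if (fb.data == nullptr || frameBytes > fb.capacity) return false;
    std::memcpy(fb.data, frame, frameBytes);
    return true;
}

// Release once, at shutdown (stand-in for cudaFree).
void frameBufferRelease(FrameBuffer& fb) {
    std::free(fb.data);
    fb.data = nullptr;
    fb.capacity = 0;
}
```

The per-frame path only copies; a frame larger than the capacity is rejected rather than triggering a reallocation, which is one possible policy among several.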

I do not understand why you have all the cuda* memory functions in a .cu file. All these functions can be called from regular C or C++ code, but this is beside the point.

I think you should check the return value from the cudaMalloc function to see if it succeeded and indeed you got a device pointer back from it.

There is no compile time error.

I got the “invalid device pointer” error after the cudaMemcpy statement. I printed the error using the CUDA error functions:

printf(" My CUDA Error :%s\n",cudaGetErrorString(cudaGetLastError()));

Thanks

Saritha

I printed the return value of cudaMalloc. It says "no error".

When I printed the return value of cudaMemcpy, it says “invalid device pointer”. Then I printed the address stored in the variable just before the cudaMemcpy statement. It is the same address as the one printed right after the cudaMalloc statement.

Thanks

Saritha

In order to determine which function is really failing, try surrounding all your cuda calls with CUDA_SAFE_CALL( ) and compile with dbg=1.
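For anyone without the SDK's CUDA_SAFE_CALL macro at hand, the idea is roughly the following: wrap each call so that a failure is reported with the call text, file and line. This is only a sketch of the pattern. `Status`, `statusString`, `CHECK`, `fakeAlloc` and `fakeCopy` are made-up stand-ins so it builds without the CUDA toolkit; in real code the status type would be `cudaError_t` and the message would come from `cudaGetErrorString`.

```cpp
#include <cstdio>

// Stand-in for cudaError_t (assumption, not the real enum).
enum Status { kSuccess = 0, kInvalidDevicePointer = 1 };

// Stand-in for cudaGetErrorString.
const char* statusString(Status s) {
    return s == kSuccess ? "no error" : "invalid device pointer";
}

// Reports the failing call with its location; returns true on success.
bool checkStatus(Status s, const char* expr, const char* file, int line) {
    if (s == kSuccess) return true;
    std::fprintf(stderr, "%s:%d: %s failed: %s\n",
                 file, line, expr, statusString(s));
    return false;
}

// Wrap every call: CHECK(someCall(args));
#define CHECK(call) checkStatus((call), #call, __FILE__, __LINE__)

// Hypothetical stand-ins for runtime calls, one succeeding and one failing.
Status fakeAlloc() { return kSuccess; }
Status fakeCopy()  { return kInvalidDevicePointer; }
```

With every call wrapped individually, the first message printed points at the call that actually failed, rather than at whichever call happened to be followed by an error query.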

I surrounded all the cuda calls with CUDA_SAFE_CALL() and executed my code in debug mode.

It reports the following error in the Output window in Visual Studio:

First-chance exception at 0x7c812a5b in Mp2Encoder.exe: Microsoft C++ exception: cudaError_enum at memory location 0x13b7f4ec…

I think there is some device memory issue. Is my way of declaring d_var right? I printed the pointer address after cudaMalloc and before the cudaMemcpy call; both print the same address.

Thank you for the reply

Saritha

I’ve got the same problem: I’d like to allocate memory only once in one function and update the data each frame in another function. Therefore I’d like to declare the pointer to the device memory as “global”, so I can access it in many functions. Something like:

*.cu file:

[codebox]

unsigned char* g_pReferenceDeviceMem = 0;
unsigned char* g_pCurrentFrameDeviceMem = 0;

__global__ void processOnDevice(unsigned char* current, unsigned char* reference, int count)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    if (idx < count)
    {
        int value = current[idx] - reference[idx];
        current[idx] = (value < 0) ? 0 : (unsigned char) value;
    }
}

//do this for each frame (will be called very often)
extern "C" void processFrame(const unsigned char* pSrc, unsigned char* pDest, int count)
{
    size_t size = count * sizeof(unsigned char); //count == size
    cudaMemcpy(g_pCurrentFrameDeviceMem, pSrc, size, cudaMemcpyHostToDevice);

    int blockSize = 128;
    int numBlocks = (int)ceil(count / (float)blockSize);
    processOnDevice <<< numBlocks, blockSize >>> (g_pCurrentFrameDeviceMem, pReferenceDeviceMem, count);

    cudaMemcpy(pDest, g_pCurrentFrameDeviceMem, size, cudaMemcpyDeviceToHost);
}

//set reference image (will be called rarely)
extern "C" void setReferenceImage(const unsigned char* pReference, size_t size)
{
    cudaMemcpy(g_pReferenceDeviceMem, pReference, size, cudaMemcpyHostToDevice);
}

//will be called only once for device memory allocation
extern "C" void init(const unsigned char* pReference, size_t size)
{
    cudaMalloc((void**) &g_pCurrentFrameDeviceMem, size);
    cudaMalloc((void**) &g_pReferenceDeviceMem, size);
    setReferenceImage(pReference, size);
}

//free device memory resources
extern "C" void releaseResources()
{
    cudaFree(g_pCurrentFrameDeviceMem);
    cudaFree(g_pReferenceDeviceMem);
}

[/codebox]

I always get an “invalid device pointer” error message in processFrame for the first cudaMemcpy call. Is the declaration of my global pointers correct? Can they actually be used in several functions? Can nvcc handle this?
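A side note on the `numBlocks` line in processFrame: the block count can also be computed in pure integer arithmetic, which avoids the round trip through `float` (a `float` cannot represent every `int` above 2^24 exactly, so very large counts could in principle be rounded). A minimal sketch, where `ceilDiv` is a hypothetical helper name and non-negative `count` with positive `blockSize` is assumed:

```cpp
// Integer ceiling division: smallest n such that n * blockSize >= count.
// Assumes count >= 0, blockSize > 0, and count + blockSize - 1 fits in an int.
int ceilDiv(int count, int blockSize) {
    return (count + blockSize - 1) / blockSize;
}
```

For the sizes used in this thread, `ceilDiv(1024, 128)` gives 8 blocks, the same as the `ceil` version.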

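There is a typo in the code: the kernel launch in processFrame passes pReferenceDeviceMem instead of g_pReferenceDeviceMem. Here is a small test program that calls your functions: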

[codebox]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda.h>   // driver API: cuInit, cuDeviceGet, cuCtxCreate

// init() and processFrame() are the functions from the post above.
int main(){
    int i;
    unsigned char ref[1024];
    unsigned char src[1024];
    unsigned char dest[1024];

    // Initialize the driver API and create a context on device 0
    if (cuInit(0) != CUDA_SUCCESS){
        printf("cuInit failed, aborting ...\n");
        exit(1);
    }

    CUdevice device;
    int dev = 0;
    CUcontext ctx;
    if (cuDeviceGet(&device, dev) != CUDA_SUCCESS){
        printf("Could not get device %d, aborting\n", dev);
        exit(1);
    }
    if (cuCtxCreate(&ctx, CU_CTX_SCHED_AUTO, device) != CUDA_SUCCESS){
        printf("Creating a context with devID %d failed, aborting\n", dev);
        exit(1);
    }

    // Reference frame is all zeros, source frame is all ones
    memset(ref, 0, sizeof(ref));
    memset(dest, 0, sizeof(dest));
    for (i = 0; i < 1024; i++){
        src[i] = 1;
    }

    init(ref, 1024);
    processFrame(src, dest, 1024);
    processFrame(src, dest, 1024);
    processFrame(src, dest, 1024);
    processFrame(src, dest, 1024);

    for (i = 0; i < 1024; i++){
        printf("dest[%d]=%d\n", i, dest[i]);
    }
    return 0;
}

[/codebox]
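To check the printed values, it may help to have a CPU version of what processOnDevice computes per element (a subtraction clamped at zero). This is a sketch only; `processOnHost` is a hypothetical name, and both inputs are assumed to have the same length.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for the kernel: out[i] = max(current[i] - reference[i], 0).
// Assumes current and reference have the same size.
std::vector<unsigned char> processOnHost(const std::vector<unsigned char>& current,
                                         const std::vector<unsigned char>& reference) {
    std::vector<unsigned char> out(current.size());
    for (std::size_t i = 0; i < current.size(); ++i) {
        int value = current[i] - reference[i];  // promoted to int, may be negative
        out[i] = (value < 0) ? 0 : static_cast<unsigned char>(value);
    }
    return out;
}
```

With the test data above (reference all zeros, source all ones) this reference returns 1 for every element, so that is what the GPU path should print for each dest[i] if it runs correctly.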

You’re right. In the meantime I’ve written a small OpenCV test application that uses exactly the same CUDA code as the bigger project in which I got the error. The small test app works fine for me, so at least I know that the “invalid device pointer” problem is not caused by my CUDA code. Maybe I’m messing up a pointer somewhere in my C++ code. Anyway, thanks for having a look at my code!