I encountered a strange problem. I am compiling my CUDA application as a static library so that other applications can use it, so I created several functions to allocate and access device memory. Basically, my code looks like this:
float *result_dev;
extern "C" {
void GPUinit()
{
…
cudaError_t s=cudaMalloc((void**)&result_dev, size);
cudaMemset((void*) result_dev, 16, size);
float *hostbuffer=(float *) malloc(size);
s=cudaMemcpy(hostbuffer, result_dev, size, cudaMemcpyDeviceToHost); // runs successfully
// save hostbuffer to file
…
}
void readResult()
{
float *hostbuffer=(float *) malloc(size);
cudaError_t s=cudaMemcpy(hostbuffer, result_dev, size, cudaMemcpyDeviceToHost); // fails: cudaMemcpy error invalid device pointer
if (s != cudaSuccess)
{
printf("cudaMemcpy error %s\n", cudaGetErrorString(s));
}
//save hostbuffer to file
}
}
When I save hostbuffer from GPUinit(), the result is correct. However, when I call readResult(), cudaMemcpy returns an error code that translates to "invalid device pointer".
I am wondering if anyone else has run into this issue. Is it caused by compiling the code into a static library and linking it against my application, or is it related to the CUDA release (I believe I am using 2.0)?
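To help narrow this down, here is a minimal sketch of how the two functions could be instrumented. It checks the return code of every runtime call (the code above does not check the results of cudaMalloc and cudaMemset) and prints the value of result_dev in both functions so the two calls can be compared. The helper cudaOk, the int return values, the hostbuffer parameter, and the 1024-float buffer size are assumptions made for illustration; they are not part of the code above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed runtime call by name.
static bool cudaOk(cudaError_t s, const char *what)
{
    if (s != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(s));
        return false;
    }
    return true;
}

static float *result_dev = NULL;
// Assumed buffer size; the original post does not show how size is set.
static const size_t size = 1024 * sizeof(float);

extern "C" int GPUinit(void)
{
    if (!cudaOk(cudaMalloc((void **)&result_dev, size), "cudaMalloc")) return -1;
    if (!cudaOk(cudaMemset(result_dev, 16, size), "cudaMemset")) return -1;
    // Print the pointer so it can be compared with the value seen later.
    printf("GPUinit: result_dev=%p\n", (void *)result_dev);
    return 0;
}

extern "C" int readResult(float *hostbuffer)
{
    // If this differs from the value printed in GPUinit, the global was changed in between.
    printf("readResult: result_dev=%p\n", (void *)result_dev);
    if (!cudaOk(cudaMemcpy(hostbuffer, result_dev, size, cudaMemcpyDeviceToHost),
                "cudaMemcpy")) return -1;
    return 0;
}
```

If both functions print the same pointer and cudaMemcpy still fails in readResult(), the pointer itself was not overwritten, which would point the investigation at how and from where the library functions are being called rather than at the allocation.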