I am loading and unloading a module repeatedly, and after a certain number of iterations cuModuleLoad() fails with CUDA_ERROR_UNKNOWN. The number of texture references in the device code seems to determine how many iterations succeed: with four textures (all of the form texture<float, 2, cudaReadModeElementType> tex) I get an error after 32 iterations; with three textures, after 42 iterations; with two, after 64; and with one, after 128. If I don't declare any textures at all in my .cu file, no error is ever produced. Changing the number of __global__ functions in the .cu file doesn't seem to have any effect.
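For completeness, the relevant part of my .cu file looks roughly like this (the kernel name and body are illustrative placeholders; only the number of texture declarations seems to matter for the behaviour described above):

```cuda
// simpleTexture_kernel.cu -- illustrative sketch.
// Four texture references: the error appears after 32 iterations.
// Removing declarations one by one raises the limit as described above.
texture<float, 2, cudaReadModeElementType> tex0;
texture<float, 2, cudaReadModeElementType> tex1;
texture<float, 2, cudaReadModeElementType> tex2;
texture<float, 2, cudaReadModeElementType> tex3;

__global__ void transformKernel(float* output, int width, int height)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        output[y * width + x] = tex2D(tex0, x, y);
}
```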
This is my code:
CUdevice cuDevice;
CUcontext cuContext;
CUmodule cuModule;
char* module_path;
CUT_DEVICE_INIT_DRV(cuDevice);
CUresult status = cuCtxCreate(&cuContext, 0, cuDevice);
module_path = cutFindFilePath("simpleTexture_kernel.cubin", argv[0]);
bool firstError = true;
for (int i = 0; i < 1000; i++)
{
    /* Log free/total device memory before each load attempt. */
    char* memfname = (char*) malloc(200 * sizeof(char));
    unsigned int freemem;
    unsigned int totmem;
    CUresult getmeminfo = cuMemGetInfo(&freemem, &totmem);
    sprintf(memfname, "%d free mem %.1f MB out of total mem %.1f MB.txt",
            i, freemem / 1000000.0f, totmem / 1000000.0f);
    int getmeminfoi = (int) getmeminfo;
    cutWriteFilei(memfname, &getmeminfoi, 1);
    free(memfname);

    /* Load the module; report only the first failure. */
    status = cuModuleLoad(&cuModule, module_path);
    if (!status)
    {
        printf("%d OK when loaded module: %d \n", i, status);
    }
    else if (firstError)
    {
        printf("%d first error while loading module: %d \n", i, status);
        firstError = false;
    }

    /* Unload it again; report only the first failure. */
    status = cuModuleUnload(cuModule);
    if (!status)
    {
        printf("%d ok when unloaded module: %d \n", i, status);
    }
    else if (firstError)
    {
        printf("%d first error while unloading module: %d \n", i, status);
        firstError = false;
    }
    cuModule = 0;
}
cutFree(module_path);
cuCtxDetach(cuContext);
CUT_EXIT(argc, argv);
According to the CUDA Programming Guide, "if the memory for functions and data (constant and global) needed by the module cannot be allocated, cuModuleLoad() fails", but I thought unloading the module freed that memory again?
As you can see, I check the amount of memory available to the CUDA context using cuMemGetInfo, and there seems to be plenty of free memory left even at the point where loading the module fails. Am I mixing up memory spaces?
I would be very grateful if somebody could explain why this error arises. I am new to CUDA, and there are probably plenty of things about the driver API that I haven't understood yet.