Errors when loading/unloading a module repeatedly: I get CUDA_ERROR_UNKNOWN

I am loading and unloading a module repeatedly, and after a certain number of iterations cuModuleLoad() returns CUDA_ERROR_UNKNOWN. The number of texture references in the device code seems to determine how many iterations succeed. For example, if I declare four textures (all of the form texture<float, 2, cudaReadModeElementType> tex), I get an error after 32 iterations. With three textures I get an error after 42 iterations, with two textures after 64, and with one texture after 128. If I don’t declare any textures at all in my .cu file, no error is produced. Changing the number of __global__ functions in the .cu file doesn’t seem to have any effect.

This is my code:

CUdevice cuDevice;   // assumed to be initialized (e.g. via cuInit and cuDeviceGet) before this point
CUcontext cuContext;
CUmodule cuModule;
char* module_path;

CUresult status = cuCtxCreate(&cuContext, 0, cuDevice);
module_path = cutFindFilePath("simpleTexture_kernel.cubin", argv[0]);

bool firstError = true;

for (int i = 0; i < 1000; i++)
{
    // Record the free/total device memory for this iteration in a file name.
    char* memfname = (char*) malloc(200 * sizeof(char));
    unsigned int freemem;
    unsigned int totmem;
    CUresult getmeminfo = cuMemGetInfo(&freemem, &totmem);
    sprintf(memfname, "%d free mem %.1f MB out of total mem %.1f MB.txt", i, freemem / 1000000.0f, totmem / 1000000.0f);
    int getmeminfoi = (int) getmeminfo;
    cutWriteFilei(memfname, &getmeminfoi, 1);
    free(memfname);   // release the host buffer allocated above

    // Load the module and report only the first failure.
    status = cuModuleLoad(&cuModule, module_path);
    if (status == CUDA_SUCCESS)
        printf("%d OK when loaded module: %d \n", i, status);
    else if (firstError)
    {
        printf("%d first error while loading module: %d \n", i, status);
        firstError = false;
    }

    // Unload the module again; this should release its resources.
    status = cuModuleUnload(cuModule);
    if (status == CUDA_SUCCESS)
        printf("%d ok when unloaded module: %d \n", i, status);
    else if (firstError)
    {
        printf("%d first error while unloading module: %d \n", i, status);
        firstError = false;
    }

    cuModule = 0;
}
CUT_EXIT(argc, argv);

According to the CUDA Programming Guide, “if the memory for functions and data (constant and global) needed by the module cannot be allocated, cuModuleLoad() fails”, but I thought I was freeing the memory used by the module when unloading it.
As you can see, I check the amount of memory available to the CUDA context (using cuMemGetInfo) on every iteration, and there seems to be plenty of memory left even when loading the module fails. Am I mixing up memory spaces?

I would be very grateful if somebody could explain why this error arises. I am new to CUDA and there are probably plenty of things about the driver API that I haven’t understood yet.

I have a similar problem. After a number of repeated load/unload operations I get a CUDA_ERROR_OUT_OF_MEMORY error. I don’t know whether it is related to the problem described above, but it seems likely. In my case the error appears after around 5000 iterations.

Which OS are you using?
Which driver version?
Which GPU?

Please provide a test app which reproduces the problem.

Windows XP 64 (will test on our Win XP 32)
CUDA 1.1
GeForce 9800 GX2

Test app will follow.

Does this reproduce with the 2.0-beta release?