CUcontext creation and destruction leaks handles: how should a context be created and destroyed in a worker thread?

I’ve recently noticed that my app, which creates a worker thread for each installed GPU (and creates/destroys a context inside each thread), eats more and more system handles (running under WinXP).

These are the steps the application performs hundreds of times (a rough sketch of this loop follows the list):

  1. Create source data to be crunched on GPU

  2. Run as many worker threads as there are GPUs installed

  3. Wait until all the data is handled

  4. Go to step (1)
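
For reference, a rough sketch of that outer loop, with std::thread standing in for whatever threading API the application actually uses, and PrepareSourceData/WorkerThread as placeholder names (WorkerThread being the per-GPU routine shown just below):

#include <thread>
#include <vector>

void PrepareSourceData();        // step 1, defined elsewhere in the application
void WorkerThread(int gpuIndex); // per-GPU routine, shown below

void RunOnePass(int nGpuCount)
{
    PrepareSourceData();                         // (1) create the source data

    std::vector<std::thread> workers;
    for (int i = 0; i < nGpuCount; ++i)          // (2) one worker per installed GPU
        workers.emplace_back(WorkerThread, i);

    for (auto &t : workers)                      // (3) wait until all the data is handled
        t.join();
}                                                // (4) the caller simply calls this again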

Each worker thread works with CUDA like this (cuInit(0) must already have been called once in the process before any of these calls):

CUdevice cuDevice;
cuDeviceGet(&cuDevice, nThreadNumber);                 // nThreadNumber = this worker's GPU ordinal
CUcontext cuContext;
cuCtxCreate(&cuContext, CU_CTX_SCHED_SPIN, cuDevice);  // create a context on that GPU
DoTheJob();
cuCtxDestroy(cuContext);                               // destroy it before the thread exits

What I observe is that each cuCtxCreate (regardless of the following cuCtxDestroy) increases the handle count shown in Task Manager.

How should CUDA context creation/destruction actually be done?

Why do you destroy the context? Do you exit the program after each run?

What driver are you using?

Wait, why are you using a thread pool and recreating contexts? What’s up with that? (Does not make sense!)

I thought it was also possible to create the context each time a thread is started and destroy it each time the thread finishes; looks like I was wrong.

Am I right that it is actually necessary to create the contexts (one per GPU) once, when the application starts, and then assign them to worker threads?

As far as I know you can do what you are saying: create a thread, create a context, do the work, then destroy the context and the thread. But why do all that? Why not create all the worker threads and contexts when the program starts up, and then keep them idle until you have work to give them? Anyway, you have to make sure that you destroy the context from the correct thread.
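
A minimal sketch of that suggestion, assuming C++11 primitives (std::thread would own WorkerLoop) as stand-ins for whatever the application really uses, and with Job/DoTheJob as made-up names. The point is that each context is created once by its owning thread, all work for that GPU runs under it, and the same thread destroys it at shutdown:

#include <cuda.h>
#include <condition_variable>
#include <mutex>
#include <queue>

struct Job { /* whatever describes one unit of work */ };
void DoTheJob(const Job &job);          // defined elsewhere in the application

struct Worker {
    int                     gpuIndex;
    std::mutex              mtx;
    std::condition_variable cv;
    std::queue<Job>         jobs;
    bool                    quit = false;
};

void WorkerLoop(Worker *w)              // runs for the lifetime of the application
{
    // cuInit(0) must already have been called once in the process.
    CUdevice dev;
    cuDeviceGet(&dev, w->gpuIndex);
    CUcontext ctx;
    cuCtxCreate(&ctx, CU_CTX_SCHED_SPIN, dev);   // created once, stays current to this thread

    for (;;) {
        std::unique_lock<std::mutex> lock(w->mtx);
        w->cv.wait(lock, [w] { return w->quit || !w->jobs.empty(); });
        if (w->jobs.empty())                     // only true when quitting
            break;
        Job job = w->jobs.front();
        w->jobs.pop();
        lock.unlock();
        DoTheJob(job);                           // all CUDA work happens under ctx
    }

    cuCtxDestroy(ctx);                           // destroyed by the thread that created it
}

The main loop then just pushes a Job into each worker's queue and notifies its condition variable every pass, instead of spawning and joining threads.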

It is easier for me (due to application architecture reasons) to create and destroy worker threads rather than managing them (keeping them idle, resuming them when needed, etc.). So I decided to create the context for each worker thread from within that thread itself and then destroy the context when the thread is about to finish.

This works but eats handles continuously.
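
If the thread-per-pass design has to stay, one possible workaround (a sketch only, not something confirmed in this thread) is to create the contexts just once in the main thread and let each short-lived worker merely attach and detach a pre-created context with cuCtxPushCurrent/cuCtxPopCurrent, so cuCtxCreate/cuCtxDestroy are no longer in the per-pass path. g_contexts and the function names are assumed here:

#include <cuda.h>
#include <vector>

void DoTheJob();                                 // defined elsewhere in the application

std::vector<CUcontext> g_contexts;               // one context per GPU, created once

void CreateContextsAtStartup()                   // main thread, after cuInit(0)
{
    int nGpuCount = 0;
    cuDeviceGetCount(&nGpuCount);
    g_contexts.resize(nGpuCount);
    for (int i = 0; i < nGpuCount; ++i) {
        CUdevice dev;
        cuDeviceGet(&dev, i);
        cuCtxCreate(&g_contexts[i], CU_CTX_SCHED_SPIN, dev);
        CUcontext detached;
        cuCtxPopCurrent(&detached);              // cuCtxCreate made it current; detach it
    }
}

void WorkerThread(int nThreadNumber)             // still created and destroyed per pass
{
    cuCtxPushCurrent(g_contexts[nThreadNumber]); // attach the pre-created context
    DoTheJob();
    CUcontext detached;
    cuCtxPopCurrent(&detached);                  // detach before the thread exits
}

void DestroyContextsAtShutdown()                 // once, when the application exits
{
    for (size_t i = 0; i < g_contexts.size(); ++i)
        cuCtxDestroy(g_contexts[i]);
}

This keeps the expensive (and, here, leaking) create/destroy calls out of the per-pass path while leaving the thread-per-pass structure alone.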

I’m pretty sure it’s a bug and you should be able to do it; I’ll try to find out what the status of that is today.

OK, thank you!

If this helps: Process Explorer indicates constant growth in the number of semaphore and mutex handles.

It’s a bug, but I don’t have an ETA for the fix.

Thank you for the info; hopefully the fix won’t take months :-)

No promises. (You should really just have a persistent thread pool anyway; it’s probably much faster.)