Multiple streams on 1 GPU and out of memory error

I’m running multiple streams across multiple GPUs. I have more streams (dozens) than devices (4), and I assign the streams to the devices round-robin. The streams run concurrently on the GPUs, as shown by the nvidia-smi output (I do not have the Visual Profiler). However, the devices run out of memory after about half of the streams have run successfully. Every stream runs the same code on different input data, and each one allocates and frees its own device memory. The streams do not communicate with each other, and all of them are created by a single ‘main’ function. My questions are:

  1. Can I have a memory leak between streams on a given device?
  2. If memory leaks are possible between streams, how can I use compute-sanitizer to find them?
  3. Can more than one stream run on a single device?
  4. If more than one stream can run on a single device, how can I prevent CUDA from overwhelming the GPU memory with streams that dynamically allocate and free memory?

Thanks, Roger


Welcome to the NVIDIA Developer forums! This is the community feedback category, and is not monitored by technical staff. I am going to move this topic to the Compute Sanitizer category for better visibility.

Tom K

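You can use the --leak-check full option of the compute-sanitizer tool to detect memory leaks, for example: compute-sanitizer --leak-check full ./my_app (where ./my_app stands for your own executable). For an accurate leak report, the CUDA context must be destroyed before the program exits, for example by calling cudaDeviceReset() on each device you used.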
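
To make the leak check concrete, here is a minimal sketch of a setup like the one described in the question: streams assigned to devices round-robin, each with its own device allocation. It is not the original poster's code. The kernel scale, the helper device_for_stream, the stream count (24), and the buffer size are all made up for illustration. The CUDA runtime calls are standard ones, and the build and run commands in the top comment assume the file is saved as streams.cu.

```cuda
// Build: nvcc -lineinfo -o streams streams.cu
// Run:   compute-sanitizer --leak-check full ./streams
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with file/line and the CUDA error string if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: round-robin mapping of a stream index to a device.
int device_for_stream(int stream_index, int device_count) {
    return stream_index % device_count;
}

// Hypothetical stand-in for the real per-stream work.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int num_streams = 24;     // assumption: "dozens" of streams
    const size_t n = 1 << 20;       // assumption: floats per stream buffer

    int device_count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&device_count));

    std::vector<cudaStream_t> streams(num_streams);
    std::vector<float *> buffers(num_streams, nullptr);

    // Create each stream on its device, allocate its buffer, queue its work.
    for (int s = 0; s < num_streams; ++s) {
        CUDA_CHECK(cudaSetDevice(device_for_stream(s, device_count)));
        CUDA_CHECK(cudaStreamCreate(&streams[s]));
        CUDA_CHECK(cudaMalloc(&buffers[s], n * sizeof(float)));
        CUDA_CHECK(cudaMemsetAsync(buffers[s], 0, n * sizeof(float), streams[s]));
        const unsigned int blocks = static_cast<unsigned int>((n + 255) / 256);
        scale<<<blocks, 256, 0, streams[s]>>>(buffers[s], n, 2.0f);
        CUDA_CHECK(cudaGetLastError());
    }

    // Wait for each stream, then release its resources on the owning device.
    for (int s = 0; s < num_streams; ++s) {
        CUDA_CHECK(cudaSetDevice(device_for_stream(s, device_count)));
        CUDA_CHECK(cudaStreamSynchronize(streams[s]));
        CUDA_CHECK(cudaFree(buffers[s]));  // comment out to provoke a leak report
        CUDA_CHECK(cudaStreamDestroy(streams[s]));
    }

    // Destroy every context so the leak checker can produce its summary.
    for (int d = 0; d < device_count; ++d) {
        CUDA_CHECK(cudaSetDevice(d));
        CUDA_CHECK(cudaDeviceReset());
    }
    return 0;
}
```

Note that in this sketch all buffers are live at the same time, because every cudaMalloc happens before any cudaFree. If the real application is structured similarly, peak memory per device grows with the number of streams assigned to it even when nothing is leaked, so a clean leak report does not by itself rule out running out of memory.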