Infinite loop with multi-stream shared-data synchronisation starting with CUDA 12.2 and NVIDIA driver version 535

OS: Ubuntu 22.04
Kernel version: 5.15.0-79-generic
GPU: A100 40 GB and 80 GB
CUDA version: 12.2

Hello there,

My goal is to communicate between multiple kernels, each running in its own stream. This kind of implementation breaks the forward-progress guarantee, but it still worked as long as the number of streams was limited. Since I upgraded to CUDA 12.2, however, I have a significant problem: synchronizing data between two streams no longer works.

Here is a basic example that illustrates what I'm trying to achieve and how the behavior changed in CUDA 12.2. The test creates a given number of streams, each running a kernel that waits indefinitely on a GPU buffer (int *stop in the sample); a last kernel, launched in another concurrent stream, then sends a signal that unblocks all the previous ones. If the GPU can run all of these streams concurrently, the program finishes; if it has to hold the last kernel back until a previous one completes, the program is stuck in an infinite loop.


#include <stdio.h>
#include <iostream>
#include <unistd.h>
#include <string>

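// Each instance spins until another kernel writes a non-zero value to *stop.
__global__ void kernel(int i, volatile int *stop) {
        printf("Kernel %d going...\n", i);
        while(*stop == 0) {}
}

// Releases every waiting kernel by writing a non-zero value to *stop.
__global__ void kernel2(int i, volatile int *stop) {
  *stop = i;
}

int main(int argc, char* argv[]) {
  int n = std::stoi(argv[1]);      // number of waiting kernels (must be < 1000)
  cudaStream_t stream[1000];
  int* stop;
  cudaMalloc(&stop, sizeof (int));
  cudaMemset(stop, 0, sizeof (int));
  // Create n + 1 streams: one per waiting kernel plus one for kernel2.
  for (int i = 0 ; i <= n; i++) {
    cudaStreamCreate(&stream[i]);
  }
  for (int i = 0 ; i < n; i++) {
    kernel<<<1,1,0,stream[i]>>>(i, stop);
  }
  kernel2<<<1, 1, 0, stream[n]>>>(1, stop);
  std::cout << "Synchronizing processes..." << std::endl;
  cudaDeviceSynchronize();         // returns only once every kernel has finished
  return 0;
}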
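To tell a hang caused by missing concurrency apart from other failures, a variant of the sample can bound the wait instead of spinning forever. The sketch below is an illustration, not part of the original test: `boundedKernel`, `kMaxSpinCycles`, the `timedOut` array and the `CHECK` macro are hypothetical names, and the spin budget is an arbitrary assumption. Each waiting kernel gives up after a fixed number of clock cycles and records whether it timed out, so the program always terminates and reports how many waiters never saw kernel2's write.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Print the failing call and return from main() if a CUDA runtime call fails.
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s failed: %s\n", #call,              \
                   cudaGetErrorString(err_));                     \
      return 1;                                                   \
    }                                                             \
  } while (0)

// Hypothetical spin budget in GPU clock cycles (on the order of a second
// at A100 clock rates); adjust as needed.
constexpr long long kMaxSpinCycles = 2000000000LL;

// Same wait as `kernel`, but gives up after kMaxSpinCycles cycles and
// records whether it timed out instead of spinning forever.
__global__ void boundedKernel(int i, volatile int *stop, int *timedOut) {
  long long start = clock64();
  int gaveUp = 0;
  while (*stop == 0) {
    if (clock64() - start > kMaxSpinCycles) {
      gaveUp = 1;
      break;
    }
  }
  timedOut[i] = gaveUp;
}

// Releases the waiting kernels, unchanged from the sample.
__global__ void kernel2(int i, volatile int *stop) { *stop = i; }

int main(int argc, char *argv[]) {
  // Number of waiting kernels; 4 is an arbitrary default.
  int n = argc > 1 ? std::stoi(argv[1]) : 4;
  if (n < 1) n = 1;

  cudaStream_t *streams = new cudaStream_t[n + 1];
  int *stop = nullptr;
  int *timedOut = nullptr;
  CHECK(cudaMalloc(&stop, sizeof(int)));
  CHECK(cudaMemset(stop, 0, sizeof(int)));
  CHECK(cudaMallocManaged(&timedOut, n * sizeof(int)));
  for (int i = 0; i <= n; i++) CHECK(cudaStreamCreate(&streams[i]));

  for (int i = 0; i < n; i++)
    boundedKernel<<<1, 1, 0, streams[i]>>>(i, stop, timedOut);
  kernel2<<<1, 1, 0, streams[n]>>>(1, stop);
  CHECK(cudaGetLastError());
  CHECK(cudaDeviceSynchronize());

  // A timeout means kernel2's write was not seen within the spin budget,
  // i.e. kernel2 most likely was not running concurrently with that waiter.
  int timeouts = 0;
  for (int i = 0; i < n; i++) timeouts += timedOut[i];
  std::printf("%d of %d waiting kernels timed out\n", timeouts, n);

  for (int i = 0; i <= n; i++) CHECK(cudaStreamDestroy(streams[i]));
  CHECK(cudaFree(timedOut));
  CHECK(cudaFree(stop));
  delete[] streams;
  return 0;
}
```

If every waiter reports a timeout, kernel2 only ran after the waiting kernels gave up, which points at scheduling or loading of kernel2 rather than at the flag itself.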

To build (assuming the sample above is saved as basic_test.cu)

 nvcc -o basic_test basic_test.cu

To test

./basic_test 4

Results:

With CUDA 12.1, all kernels terminate and the program exits normally.

With CUDA 12.2 (more specifically, with driver 535.86.10), the program is stuck in an infinite loop.

I don't think the CUDA 12.2 behavior is what we should expect, although I realize that relying on this kind of data dependency between separate streams is risky. Any help would be appreciated.
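One factor worth ruling out, offered here as an assumption rather than a confirmed diagnosis: CUDA 12.2 enables lazy module loading by default on Linux, and the CUDA Programming Guide warns that loading a kernel lazily may require context synchronization, which can deadlock programs that assume two kernels will run concurrently. The sketch below applies one of the documented workarounds by forcing both kernels to load before any spinning kernel is launched. It assumes that `cudaFuncGetAttributes` triggers that load; the default value of `n` is arbitrary.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Unchanged from the sample: spins until *stop becomes non-zero.
__global__ void kernel(int i, volatile int *stop) {
  printf("Kernel %d going...\n", i);
  while (*stop == 0) {}
}

// Unchanged from the sample: releases the waiting kernels.
__global__ void kernel2(int i, volatile int *stop) { *stop = i; }

int main(int argc, char *argv[]) {
  // Number of waiting kernels; 4 is an arbitrary default.
  int n = argc > 1 ? std::stoi(argv[1]) : 4;
  if (n < 1) n = 1;

  // Assumption: querying a kernel's attributes makes the runtime load it
  // now, while the GPU is idle, so that launching kernel2 later does not
  // need a module load (and the context synchronization that may come
  // with it) while the spinning kernels are resident.
  cudaFuncAttributes attr;
  if (cudaFuncGetAttributes(&attr, kernel) != cudaSuccess ||
      cudaFuncGetAttributes(&attr, kernel2) != cudaSuccess) {
    std::fprintf(stderr, "failed to preload kernels\n");
    return 1;
  }

  cudaStream_t *streams = new cudaStream_t[n + 1];
  int *stop = nullptr;
  cudaMalloc(&stop, sizeof(int));
  cudaMemset(stop, 0, sizeof(int));
  for (int i = 0; i <= n; i++) cudaStreamCreate(&streams[i]);

  for (int i = 0; i < n; i++)
    kernel<<<1, 1, 0, streams[i]>>>(i, stop);
  kernel2<<<1, 1, 0, streams[n]>>>(1, stop);
  std::printf("Synchronizing processes...\n");
  cudaError_t err = cudaDeviceSynchronize();
  std::printf("Done: %s\n", cudaGetErrorString(err));

  for (int i = 0; i <= n; i++) cudaStreamDestroy(streams[i]);
  cudaFree(stop);
  delete[] streams;
  return 0;
}
```

The same hypothesis can be checked without code changes by disabling lazy loading for one run, e.g. `CUDA_MODULE_LOADING=EAGER ./basic_test 4`; if the original sample then terminates, module loading is the likely culprit.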


This is indeed related to the persistent kernel issue. Closing the topic. Thanks!
