__nanosleep not working as expected

salmon.sh · April 14, 2022, 8:37am

Hello. I’m testing the __nanosleep function.
I expected the code below to report about 1000 ms, but I only get about 0.069632 ms.
My intention was to make the kernel sleep for about 1 second.

#include <stdio.h>
#include <stdlib.h> // exit()

#define RUNTIME_API_CALL(apiFuncCall)                                          \
  do {                                                                         \
    cudaError_t _status = apiFuncCall;                                         \
    if (_status != cudaSuccess) {                                              \
      fprintf(stderr, "%s:%d: error: function %s failed with error %s.\n",     \
              __FILE__, __LINE__, #apiFuncCall, cudaGetErrorString(_status));  \
      exit(-1);                                                                \
    }                                                                          \
  } while (0)

__global__ void kernel() {
#if __CUDA_ARCH__ == 860
  __nanosleep(1000000000); // 1 s = 10^9 ns
#else
  printf(">>> __CUDA_ARCH__ != 860\n");
#endif
}

int main() {

  cudaEvent_t start, stop;
  RUNTIME_API_CALL(cudaEventCreate(&start));
  RUNTIME_API_CALL(cudaEventCreate(&stop));

  RUNTIME_API_CALL(cudaEventRecord(start));
  kernel<<<1, 1>>>();
  RUNTIME_API_CALL(cudaEventRecord(stop));
  RUNTIME_API_CALL(cudaEventSynchronize(stop));

  float duration;
  RUNTIME_API_CALL(cudaEventElapsedTime(&duration, start, stop));
  printf("Elapsed time: %fms\n", duration);

  return 0;
}

I used this command.

nvcc simple.cu -arch=native

How can I make the kernel sleep for 1 second?

RTX 3090
Driver Version: 510.47.03
CUDA Version: 11.6

Robert_Crovella · April 15, 2022, 2:05pm

My guess is that __nanosleep may have an undocumented upper bound on its argument.

This seems to work for me:

$ cat t2004.cu
#include <stdio.h>
#include <stdlib.h> // exit()

#define RUNTIME_API_CALL(apiFuncCall)                                          \
  do {                                                                         \
    cudaError_t _status = apiFuncCall;                                         \
    if (_status != cudaSuccess) {                                              \
      fprintf(stderr, "%s:%d: error: function %s failed with error %s.\n",     \
              __FILE__, __LINE__, #apiFuncCall, cudaGetErrorString(_status));  \
      exit(-1);                                                                \
    }                                                                          \
  } while (0)

__global__ void kernel() {
#if __CUDA_ARCH__ >= 700
  for (int i = 0; i < 1000; i++)
    __nanosleep(1000000U); // 1 ms x 1000 iterations = ~1 s
#else
  printf(">>> __CUDA_ARCH__ < 700\n");
#endif
}

int main() {
  cudaEvent_t start, stop;
  RUNTIME_API_CALL(cudaEventCreate(&start));
  RUNTIME_API_CALL(cudaEventCreate(&stop));

  RUNTIME_API_CALL(cudaEventRecord(start));
  kernel<<<1, 1>>>();
  RUNTIME_API_CALL(cudaEventRecord(stop));
  RUNTIME_API_CALL(cudaEventSynchronize(stop));

  float duration;
  RUNTIME_API_CALL(cudaEventElapsedTime(&duration, start, stop));
  printf("Elapsed time: %fms\n", duration);

  return 0;
}
$ nvcc -arch=sm_70 -o t2004 t2004.cu
$ ./t2004
Elapsed time: 1048.487671ms
$

In the above example, if I pass an argument of 1000000 (1 ms) or less to __nanosleep, I get approximately the expected timing. If I pass 10000000 (10 ms) or more, I don’t. So I guess there is a threshold of some sort between 1 ms and 10 ms.
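Newer editions of the CUDA C++ Programming Guide describe the maximum `__nanosleep` duration as approximately 1 millisecond, which is consistent with this threshold. The loop above can be wrapped in a small device helper that splits an arbitrary request into calls of at most 1 ms each. This is only a sketch: `sleep_ns_chunked`, `sleep_kernel` and the 1 ms chunk size are hypothetical choices, not part of the CUDA API, and because each call only approximates its argument, the total is approximate too.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: sleep for roughly total_ns nanoseconds by issuing
// __nanosleep calls of at most 1 ms each. The 1 ms cap is an assumption
// based on the threshold observed in this thread.
__device__ void sleep_ns_chunked(unsigned long long total_ns) {
#if __CUDA_ARCH__ >= 700
  const unsigned kMaxChunkNs = 1000000U;  // 1 ms
  while (total_ns > 0) {
    // Clamp this iteration's request to the chunk size.
    const unsigned chunk = total_ns > kMaxChunkNs
                               ? kMaxChunkNs
                               : static_cast<unsigned>(total_ns);
    __nanosleep(chunk);  // actual duration is only approximately `chunk`
    total_ns -= chunk;
  }
#else
  (void)total_ns;  // __nanosleep requires compute capability 7.0 or higher
#endif
}

// Hypothetical kernel wrapper so the helper can be launched from the host.
__global__ void sleep_kernel(unsigned long long total_ns) {
  sleep_ns_chunked(total_ns);
}
```

For example, `sleep_kernel<<<1, 1>>>(1000000000ULL);` requests roughly 1 s and could replace the `kernel` launch in either program above. On GPUs below compute capability 7.0 the helper compiles to a no-op.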

You might wish to file a bug. It’s possible this is a documentation issue, or there may be some other issue I am not aware of.
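If you do file a bug, a small reproducer that times individual `__nanosleep` calls on the device may be more convincing than host-side event timing, because it excludes launch overhead. The sketch below reads the `%globaltimer` PTX special register through inline assembly; `probe_nanosleep` and `global_timer_ns` are hypothetical names, and the update granularity of `%globaltimer` varies between GPUs, so treat very small values as approximate.

```cuda
#include <cuda_runtime.h>

// Read the GPU's 64-bit nanosecond global timer via inline PTX.
__device__ __forceinline__ unsigned long long global_timer_ns() {
  unsigned long long t;
  asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
  return t;
}

// Hypothetical probe: time one __nanosleep(args[i]) call for each i and
// store the measured duration in nanoseconds. Intended for <<<1, 1>>>.
__global__ void probe_nanosleep(const unsigned* args,
                                unsigned long long* measured_ns, int n) {
  for (int i = 0; i < n; ++i) {
    const unsigned long long t0 = global_timer_ns();
#if __CUDA_ARCH__ >= 700
    __nanosleep(args[i]);
#endif
    measured_ns[i] = global_timer_ns() - t0;
  }
}
```

Launched as `probe_nanosleep<<<1, 1>>>(args, measured, n)` with arguments such as 100000, 1000000, 10000000 and 100000000 in managed memory, it would show where the measured duration stops tracking the argument on a given GPU.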

It seems probable that the only real guarantee is that the sleep duration will be in the range [0, 2*t], where t is the argument. Given that, I can’t categorically state that any guarantee is violated, but the function’s behavior around that threshold is curious and I can’t explain it.
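Given that per-call bound, one way to get a sleep of *at least* a given length is to stop relying on the argument and instead loop until a deadline read from `%globaltimer` has passed, using `__nanosleep` only to yield between checks. This is a sketch under assumptions: the 100 µs slice is an arbitrary choice, `sleep_at_least_ns` and `timed_sleep_kernel` are hypothetical names, and on GPUs below compute capability 7.0 the loop falls back to a plain busy-wait.

```cuda
#include <cuda_runtime.h>

// Read the GPU's 64-bit nanosecond global timer via inline PTX.
__device__ __forceinline__ unsigned long long global_timer_ns() {
  unsigned long long t;
  asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
  return t;
}

// Hypothetical helper: sleep until at least duration_ns nanoseconds have
// elapsed on %globaltimer, using short __nanosleep slices to yield.
__device__ void sleep_at_least_ns(unsigned long long duration_ns) {
  const unsigned long long deadline = global_timer_ns() + duration_ns;
  while (global_timer_ns() < deadline) {
#if __CUDA_ARCH__ >= 700
    __nanosleep(100000U);  // 100 us slice (arbitrary choice)
#endif
    // Below sm_70 this loop is a plain busy-wait on the timer.
  }
}

// Hypothetical kernel: sleep, then report the duration actually measured.
__global__ void timed_sleep_kernel(unsigned long long duration_ns,
                                   unsigned long long* measured_ns) {
  const unsigned long long t0 = global_timer_ns();
  sleep_at_least_ns(duration_ns);
  *measured_ns = global_timer_ns() - t0;
}
```

Because the loop exits only once the timer passes the deadline, the measured duration is at least the requested one; the overshoot should be on the order of one slice (up to twice the slice, per the bound above) plus the timer’s granularity.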

salmon.sh · April 15, 2022, 4:15pm

Oh yes, I tried the same for loop and reached the same conclusion.

Actually, I’m only going to sleep for about 1 ms, so in my case this won’t really matter.

Thank you for answering Robert!
