Reliability of __nanosleep function

ajay_nayak · November 26, 2022, 5:02am

Hi,

I am trying to understand the CUDA __nanosleep function, available on Volta and later architectures, which puts a thread to sleep.

It is lowered to the PTX nanosleep instruction. According to the PTX documentation, the actual sleep duration is only guaranteed to lie in the range [0, 2*t], where t is the requested duration in nanoseconds.

That is quite a large range. Does it also mean the hardware can ignore the instruction completely, i.e., sleep for 0 ns regardless of the argument passed to it?
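To see how far the actual duration departs from the request on a particular GPU, one option is to time __nanosleep against the globaltimer special register. This is a minimal sketch: the kernel and helper names are hypothetical, it assumes a Volta-or-later GPU (compile with e.g. -arch=sm_70 or newer), and the measured delta also includes the timer's update granularity and the cost of reading it, so it only approximates the real sleep time.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::exit(EXIT_FAILURE);                                        \
        }                                                                   \
    } while (0)

// Read the PTX %globaltimer special register (a 64-bit nanosecond timer).
__device__ __forceinline__ unsigned long long read_globaltimer() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Each thread requests requested_ns of sleep and records how much
// globaltimer time actually elapsed around the call.
__global__ void measure_nanosleep(unsigned int requested_ns,
                                  unsigned long long *elapsed) {
    const unsigned long long start = read_globaltimer();
    __nanosleep(requested_ns);
    const unsigned long long stop = read_globaltimer();
    elapsed[blockIdx.x * blockDim.x + threadIdx.x] = stop - start;
}

// Hypothetical host helper: prints the min/max observed delta over
// n_threads threads (a multiple of 128) and returns the maximum.
unsigned long long max_observed_sleep_ns(unsigned int requested_ns, int n_threads) {
    assert(n_threads > 0 && n_threads % 128 == 0);
    const size_t bytes = n_threads * sizeof(unsigned long long);
    unsigned long long *d_elapsed = nullptr;
    CUDA_CHECK(cudaMalloc(&d_elapsed, bytes));
    measure_nanosleep<<<n_threads / 128, 128>>>(requested_ns, d_elapsed);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    std::vector<unsigned long long> h(n_threads);
    CUDA_CHECK(cudaMemcpy(h.data(), d_elapsed, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_elapsed));
    const auto mm = std::minmax_element(h.begin(), h.end());
    std::printf("requested %u ns: observed min %llu ns, max %llu ns\n",
                requested_ns, *mm.first, *mm.second);
    return *mm.second;
}
```

Running this for a few request sizes gives an empirical picture for one GPU and driver; it says nothing about what other hardware is allowed to do within the documented bound.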

I would like to know whether this is a discrepancy in the documentation. If not, how can this instruction be used reliably, given that the hardware may not put the thread to sleep at all?
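One pattern that tolerates an early wake-up is to use __nanosleep only as a back-off inside a loop that re-checks its condition, similar in spirit to the mutex example in the CUDA C++ Programming Guide. The sketch below is illustrative: the back-off constants, kernel, and helper names are hypothetical, and only thread 0 of each block takes the lock to keep the example simple.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::exit(EXIT_FAILURE);                                        \
        }                                                                   \
    } while (0)

// Spin lock with exponential back-off. Correctness never depends on how long
// __nanosleep actually sleeps: if it returns after 0 ns, the loop simply
// retries atomicCAS sooner.
__device__ void mutex_lock(unsigned int *mutex) {
    unsigned int ns = 8;                    // initial back-off (illustrative)
    while (atomicCAS(mutex, 0u, 1u) == 1u) {
        __nanosleep(ns);                    // actual sleep is somewhere in [0, 2*ns]
        if (ns < 256) ns *= 2;              // cap the back-off (illustrative)
    }
    __threadfence();                        // order later accesses after acquiring
}

__device__ void mutex_unlock(unsigned int *mutex) {
    __threadfence();                        // publish writes made under the lock
    atomicExch(mutex, 0u);
}

// Thread 0 of every block increments a plain counter inside the critical section.
__global__ void locked_increment(unsigned int *mutex, volatile int *counter) {
    if (threadIdx.x == 0) {
        mutex_lock(mutex);
        *counter = *counter + 1;            // safe only because the lock serializes it
        mutex_unlock(mutex);
    }
}

// Hypothetical host helper: returns the counter after `blocks` blocks ran.
int run_locked_increment(int blocks) {
    assert(blocks > 0);
    unsigned int *d_mutex = nullptr;
    int *d_counter = nullptr;
    CUDA_CHECK(cudaMalloc(&d_mutex, sizeof(unsigned int)));
    CUDA_CHECK(cudaMalloc(&d_counter, sizeof(int)));
    CUDA_CHECK(cudaMemset(d_mutex, 0, sizeof(unsigned int)));
    CUDA_CHECK(cudaMemset(d_counter, 0, sizeof(int)));
    locked_increment<<<blocks, 32>>>(d_mutex, d_counter);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    int h_counter = 0;
    CUDA_CHECK(cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_mutex));
    CUDA_CHECK(cudaFree(d_counter));
    return h_counter;
}
```

The design choice here is that the sleep only affects how often the lock is polled, never whether the result is correct, so the [0, 2*t] uncertainty is harmless.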

Robert_Crovella · November 26, 2022, 3:19pm

In the worst case, yes, the range is large. AFAIK there is also no published information that further characterizes the variability. I agree that the function has questionable utility in situations that require any definition of exact timing.

In addition, __nanosleep has a maximum requestable sleep duration of approximately 1 ms. I expect this limit to be noted in the PTX documentation for the next major CUDA release.
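Given both the [0, 2*t] bound and the ~1 ms cap, a minimum delay can still be built on top of __nanosleep by letting a timer, not the sleep, decide when to stop. A minimal sketch, assuming the globaltimer register is an acceptable clock for the purpose: the function, kernel, and helper names are hypothetical, and the 1 ms per-request cap simply mirrors the approximate maximum stated above.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::exit(EXIT_FAILURE);                                        \
        }                                                                   \
    } while (0)

__device__ __forceinline__ unsigned long long read_globaltimer() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

// Sleep for at least `ns` nanoseconds of globaltimer time. The deadline check
// guarantees the minimum; __nanosleep only reduces how often we poll.
__device__ void sleep_at_least_ns(unsigned long long ns) {
    constexpr unsigned long long kMaxRequestNs = 1000000ULL;  // ~1 ms cap (assumption)
    const unsigned long long deadline = read_globaltimer() + ns;
    for (;;) {
        const unsigned long long now = read_globaltimer();
        if (now >= deadline) break;
        // Request half the remaining time: if the [0, 2*t] bound holds,
        // one call cannot overshoot the deadline by itself.
        unsigned long long request = (deadline - now) / 2;
        if (request > kMaxRequestNs) request = kMaxRequestNs;
        __nanosleep(static_cast<unsigned int>(request));
    }
}

__global__ void timed_sleep(unsigned long long ns, unsigned long long *elapsed) {
    const unsigned long long start = read_globaltimer();
    sleep_at_least_ns(ns);
    elapsed[blockIdx.x * blockDim.x + threadIdx.x] = read_globaltimer() - start;
}

// Hypothetical host helper: minimum elapsed time over n_threads (multiple of 128).
unsigned long long min_elapsed_ns(unsigned long long ns, int n_threads) {
    assert(n_threads > 0 && n_threads % 128 == 0);
    const size_t bytes = n_threads * sizeof(unsigned long long);
    unsigned long long *d_elapsed = nullptr;
    CUDA_CHECK(cudaMalloc(&d_elapsed, bytes));
    timed_sleep<<<n_threads / 128, 128>>>(ns, d_elapsed);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    std::vector<unsigned long long> h(n_threads);
    CUDA_CHECK(cudaMemcpy(h.data(), d_elapsed, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_elapsed));
    return *std::min_element(h.begin(), h.end());
}
```

This gives a lower bound only; the overshoot past the deadline still depends on scheduling and timer granularity.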

Depending on your needs, you might wish to explore the PTX special register globaltimer or the CUDA C++ clock64() function to build your own delay. Yes, I'm aware that the documentation for globaltimer also has wording that seems to discourage its use.
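A clock64()-based delay is the simplest form of such a home-grown delay. The sketch below busy-waits for a given number of SM clock cycles; the names are hypothetical, and converting cycles to wall-clock time would require knowing the SM clock rate, which can vary while the GPU runs.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::exit(EXIT_FAILURE);                                        \
        }                                                                   \
    } while (0)

// Busy-wait for at least `cycles` cycles of the per-SM clock64() counter.
// Unlike __nanosleep, the thread keeps issuing instructions while it waits.
__device__ void spin_cycles(long long cycles) {
    const long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

__global__ void timed_spin(long long cycles, long long *elapsed) {
    const long long start = clock64();
    spin_cycles(cycles);
    elapsed[blockIdx.x * blockDim.x + threadIdx.x] = clock64() - start;
}

// Hypothetical host helper: minimum elapsed cycles over n_threads (multiple of 128).
long long min_elapsed_cycles(long long cycles, int n_threads) {
    assert(n_threads > 0 && n_threads % 128 == 0);
    const size_t bytes = n_threads * sizeof(long long);
    long long *d_elapsed = nullptr;
    CUDA_CHECK(cudaMalloc(&d_elapsed, bytes));
    timed_spin<<<n_threads / 128, 128>>>(cycles, d_elapsed);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    std::vector<long long> h(n_threads);
    CUDA_CHECK(cudaMemcpy(h.data(), d_elapsed, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_elapsed));
    return *std::min_element(h.begin(), h.end());
}
```

Because clock64() is a per-multiprocessor counter, start and end should be read by the same thread, as done here.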

For modern, forward-looking usage, the best approach is probably to see if you can adapt the libcu++ chrono functionality to your needs.
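As a rough illustration of the libcu++ route, the sketch below waits on cuda::std::chrono::system_clock in device code. It assumes, as the follow-up post describes, that system_clock::now() is callable from device code and backed by globaltimer there; the kernel and helper names are hypothetical.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cuda/std/chrono>

#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::exit(EXIT_FAILURE);                                        \
        }                                                                   \
    } while (0)

// Busy-wait until at least delay_ns has passed according to
// cuda::std::chrono::system_clock, then record the elapsed time.
__global__ void chrono_delay(long long delay_ns, long long *elapsed_ns) {
    using namespace cuda::std::chrono;
    const auto start = system_clock::now();
    const auto deadline = start + nanoseconds(delay_ns);
    while (system_clock::now() < deadline) {
        // spin
    }
    const auto stop = system_clock::now();
    elapsed_ns[blockIdx.x * blockDim.x + threadIdx.x] =
        duration_cast<nanoseconds>(stop - start).count();
}

// Hypothetical host helper: minimum elapsed ns over n_threads (multiple of 128).
long long min_chrono_elapsed_ns(long long delay_ns, int n_threads) {
    assert(n_threads > 0 && n_threads % 128 == 0);
    const size_t bytes = n_threads * sizeof(long long);
    long long *d_elapsed = nullptr;
    CUDA_CHECK(cudaMalloc(&d_elapsed, bytes));
    chrono_delay<<<n_threads / 128, 128>>>(delay_ns, d_elapsed);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    std::vector<long long> h(n_threads);
    CUDA_CHECK(cudaMemcpy(h.data(), d_elapsed, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_elapsed));
    return *std::min_element(h.begin(), h.end());
}
```

The appeal of this form is that durations and time points carry their units in the type, so the same code reads like ordinary C++ chrono usage on the host.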

ajay_nayak · December 4, 2022, 6:54am

The libcu++ chrono implementation relies on the PTX globaltimer. It provides a convenient abstraction for CUDA code, and the 1 ms limit does not seem to be a problem here either.

From the programmer's perspective, I think the end result may be the same, i.e., a thread gets delayed by some approximate number of cycles. From the hardware side, however, I believe (inferring from the documentation) that __nanosleep provides much deeper functionality: it suspends the thread in hardware, which might have other hardware implications.
