What is the purpose of using asynchronous CUDA APIs?

Aeroman2333 · February 14, 2022, 8:24am

According to the CUDA documentation, CUDA provides a mechanism that makes the calling thread yield while it waits for results from the GPU. With that mechanism and the synchronous CUDA APIs, CPU time is saved and programming seems much easier.
So why should we use the asynchronous CUDA APIs?

striker159 · February 14, 2022, 8:45am

What asynchronous APIs and thread yielding mechanisms are you referring to?

Aeroman2333 · February 14, 2022, 8:49am

This is from the CUDA Driver API documentation:

CU_CTX_SCHED_YIELD: Instruct CUDA to yield its thread when waiting for results from the GPU. This can increase latency when waiting for the GPU, but can increase the performance of CPU threads performing work in parallel with the GPU.
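For readers who use the runtime API rather than the driver API, the same scheduling policy can be requested with `cudaSetDeviceFlags(cudaDeviceScheduleYield)`. The following is only a minimal sketch: the kernel `scale` and the helper `run_with_yield_policy` are hypothetical names made up for illustration, and device 0 is assumed to be usable. Note that the flag only changes how the host thread waits inside a blocking call; it does not make any call asynchronous.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to give the GPU some work.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns 0 on success, 1 on any CUDA error.
int run_with_yield_policy() {
    // Request the yield policy first, before other runtime calls touch the
    // device (older toolkits reject the call once the context is active).
    if (cudaSetDeviceFlags(cudaDeviceScheduleYield) != cudaSuccess) return 1;

    const int n = 1 << 20;
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMemset(d, 0, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d);
        return 1;
    }

    // The kernel launch itself returns to the host right away.
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);

    // This blocking wait is where the scheduling policy applies: the host
    // thread yields to other runnable threads instead of spinning.
    cudaError_t err = cudaDeviceSynchronize();

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    int rc = run_with_yield_policy();
    std::printf("run_with_yield_policy returned %d\n", rc);
    return rc;
}
```

`cudaDeviceScheduleBlockingSync` is a related flag that blocks the waiting thread on a synchronization primitive instead of yielding; which one suits a heavily multithreaded host depends on the workload.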

striker159 · February 14, 2022, 8:57am

There is no direct relation between synchronization policy and whether or not an API call is synchronous.

Aeroman2333 · February 14, 2022, 9:06am

I’m not asking about the synchronization policy between kernels or between host and GPU.
Since synchronous API calls may not cost much CPU time, why should we use asynchronous API calls rather than synchronous ones?

striker159 · February 14, 2022, 9:17am

Asynchronous calls allow memory transfers to overlap with kernel execution, which hides latency. See https://developer.nvidia.com/blog/how-overlap-data-transfers-cuda-cc/

What do you mean by “Since using synchronous API calls may not cost much CPU time”? I am not sure I can completely follow your arguments. CU_CTX_SCHED_YIELD does not improve the performance of API calls. It can allow better CPU utilization (by other threads).
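The overlap of transfers and kernels mentioned above can be sketched roughly as follows. This is an illustrative example in the spirit of the linked blog post, not code taken from it: the kernel `add_one`, the helper `run_overlap_demo`, the stream count and the chunk size are all assumptions. The key ingredients are pinned host memory, non-default streams, and `cudaMemcpyAsync`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element of its chunk.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: returns 0 if all CUDA calls succeed and the
// results are correct, 1 otherwise.
int run_overlap_demo() {
    const int kStreams = 4;       // assumed number of streams
    const int kChunk = 1 << 20;   // assumed elements per stream
    const int n = kStreams * kChunk;
    const size_t chunkBytes = kChunk * sizeof(float);

    float *h = nullptr, *d = nullptr;
    // Pinned (page-locked) host memory is required for truly async copies.
    if (cudaMallocHost(&h, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h);
        return 1;
    }
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i % 100);

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream handles one chunk: copy in, compute, copy out. Work in
    // different streams may overlap; all three calls return immediately.
    for (int s = 0; s < kStreams; ++s) {
        int off = s * kChunk;
        cudaMemcpyAsync(d + off, h + off, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        add_one<<<(kChunk + 255) / 256, 256, 0, streams[s]>>>(d + off, kChunk);
        cudaMemcpyAsync(h + off, d + off, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // The host thread is free to do unrelated work here before waiting.
    cudaError_t err = cudaDeviceSynchronize();

    // Verify the results on the host.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h[i] != static_cast<float>(i % 100) + 1.0f) ++mismatches;
    }

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return (err == cudaSuccess && mismatches == 0) ? 0 : 1;
}

int main() {
    int rc = run_overlap_demo();
    std::printf("run_overlap_demo returned %d\n", rc);
    return rc;
}
```

With synchronous `cudaMemcpy` calls the three steps of each chunk would run strictly one after another, regardless of how the host thread waits, so a yielding scheduling policy alone cannot provide this overlap.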

Aeroman2333 · February 14, 2022, 1:42pm

That’s what I’d like to do.

That means that a synchronous API call blocks the thread but can release the CPU. From the perspective of CPU utilization, I think synchronous and asynchronous APIs show little difference.

That’s what I overlooked, and it answers my question.

However, that raises another question (should I open another topic?):
What are the pros and cons of the two paradigms depicted in the figure below?

striker159 · February 14, 2022, 2:53pm

I assume that in Paradigm 2 the host function will be executed serially after Synchronize Stream. Pros and cons depend on the use-case. What does your image source say about the two options?

Aeroman2333 · February 15, 2022, 6:47am

There is no specific use case. All I want is a programming paradigm that gives better CPU utilization in a heavily multithreaded system.
