cudaLaunchCooperativeKernel and syncthreads

jomivaan · June 29, 2024, 11:01am

Hello,
I am trying to write a program that needs both grid-level and block-level synchronization. However, the intrinsic __syncthreads() does not seem to work properly inside a kernel launched with cudaLaunchCooperativeKernel(). Is this a known problem? I am using CUDA 12.4. I also tried cooperative groups for block sync, but that did not work either. My kernel launch uses 8 blocks on a 3080.
Thank you

striker159 · June 30, 2024, 5:17pm

Without a reproducer, I highly doubt that __syncthreads() is not working correctly. Your problem is most likely caused by something else.
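For reference, a minimal reproducer could look something like the sketch below. It is not the original poster's code: the kernel name syncCheck, the helper runSyncCheck, and the 256 threads per block are made up for illustration. It uses __syncthreads() for a per-block shared-memory reduction and cooperative_groups grid.sync() before block 0 combines the per-block results, and it checks the two things that most often go wrong with cooperative launches: missing device support and a grid too large to be co-resident.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Report any CUDA runtime error and bail out of the calling function with -1.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                         cudaGetErrorString(err_));                        \
            return -1;                                                     \
        }                                                                  \
    } while (0)

// Hypothetical test kernel: every thread contributes 1, each block reduces
// its contributions in shared memory, then block 0 adds up all block sums.
__global__ void syncCheck(int *partial, int *total)
{
    extern __shared__ int scratch[];
    cg::grid_group grid = cg::this_grid();

    scratch[threadIdx.x] = 1;
    __syncthreads();                       // block-level barrier

    // Tree reduction; assumes blockDim.x is a power of two.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) scratch[threadIdx.x] += scratch[threadIdx.x + s];
        __syncthreads();                   // reached by every thread in the block
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = scratch[0];

    grid.sync();                           // grid-level barrier

    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int sum = 0;
        for (unsigned b = 0; b < gridDim.x; ++b) sum += partial[b];
        *total = sum;
    }
}

// Launches syncCheck cooperatively; returns the computed total, or -1 on error.
int runSyncCheck(int numBlocks, int threadsPerBlock)
{
    int dev = 0, coop = 0, blocksPerSm = 0;
    CHECK(cudaGetDevice(&dev));
    CHECK(cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev));
    if (!coop) { std::fprintf(stderr, "no cooperative launch support\n"); return -1; }

    // A cooperative grid must fit on the device all at once.
    size_t smem = threadsPerBlock * sizeof(int);
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, dev));
    CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSm, syncCheck, threadsPerBlock, smem));
    if (numBlocks > blocksPerSm * prop.multiProcessorCount) {
        std::fprintf(stderr, "grid too large for a cooperative launch\n");
        return -1;
    }

    int *partial = nullptr, *total = nullptr, result = 0;
    CHECK(cudaMalloc(&partial, numBlocks * sizeof(int)));
    CHECK(cudaMalloc(&total, sizeof(int)));

    void *args[] = { &partial, &total };
    CHECK(cudaLaunchCooperativeKernel((void *)syncCheck, dim3(numBlocks),
                                      dim3(threadsPerBlock), args, smem, 0));
    CHECK(cudaDeviceSynchronize());        // surfaces errors raised in the kernel
    CHECK(cudaMemcpy(&result, total, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(partial));
    CHECK(cudaFree(total));
    return result;
}

int main()
{
    const int blocks = 8, threads = 256;   // 8 blocks as in the question; 256 is an assumption
    int got = runSyncCheck(blocks, threads);
    std::printf("got %d, expected %d\n", got, blocks * threads);
    return got == blocks * threads ? 0 : 1;
}
```

If a test like this passes but the real kernel still misbehaves, the usual suspects are a __syncthreads() inside divergent control flow, an unchecked error from the launch itself, or a data race that the barrier only exposes.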
