I’m trying to synchronize all threads in a block that share the same threadIdx.x, like this:
#include <stdio.h>

__global__ void ker() {
    // Barrier ID = threadIdx.x; thread count = blockDim.y * 32 (64 for this launch).
    asm("barrier.cta.sync %0,%1;"::"r"(threadIdx.x), "r"(blockDim.y*32));
}

int main() {
    ker<<<1, dim3(2, 2, 32)>>>();
    cudaDeviceSynchronize();
    // Print the last CUDA error, if any.
    printf("%s", cudaGetErrorString(cudaGetLastError()));
}
But it keeps reporting "an illegal instruction was encountered" unless all threads converge on the same barrier (i.e. blockDim.x = 1).
According to the PTX documentation, since I didn’t use the bar. shorthand, didn’t specify .aligned, and am not targeting sm_6x or below, divergence should be allowed, shouldn’t it? I don’t understand what’s wrong.