Instruction 'tcgen05.alloc' not supported on .target 'sm_110'

Hi,

When compiling the following code on Thor:

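#include <cstdint>

// Assumed value; the post does not show how TMEM_COLUMNS is defined.
#define TMEM_COLUMNS 32

__global__ void tcgen05_dummy(const int num_iters) {

    __shared__ uint32_t smem[1024];

    // tcgen05.alloc expects the shared-memory address that receives the TMEM
    // base address, so pass the address of smem rather than the value of smem[0].
    const uint32_t smem_addr = static_cast<uint32_t>(__cvta_generic_to_shared(smem));

    asm volatile(
      "tcgen05.alloc.cta_group::2.sync.aligned.shared::cta.b32 [%0], %1;"
      :
      : "r"(smem_addr), "n"(TMEM_COLUMNS));
}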

NVCC reports the following errors:

error   : Instruction 'tcgen05.alloc' not supported on .target 'sm_110'
error   : Feature '.cta_group::2' not supported on .target 'sm_110'

CUDA Version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_Dec_16_07:27:17_PM_PST_2025
Cuda compilation tools, release 13.1, V13.1.115
Build cuda_13.1.r13.1/compiler.37061995_0

Compilation command:

nvcc -arch=sm_110a -O2 tcgen05_dummy.cu

The tcgen05 instructions are supported on Thor, right?


Thanks.

There is a clue here: you are compiling for sm_110a, but the error message refers to sm_110.

The instruction requires sm_110a (plain sm_110 is not sufficient), and the -arch switch has some possibly unexpected behavior in terms of which targets it actually compiles for.
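
One way to check what the -arch shorthand expands to is nvcc's --dryrun option, which lists the compilation sub-commands without running them. A minimal sketch, reusing the file name and flags from the question:

```shell
# Print the sub-commands nvcc would run, without executing them.
# Compare the -arch values passed to cicc and ptxas with the target
# named in the error message.
nvcc -arch=sm_110a -O2 --dryrun tcgen05_dummy.cu
```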

Try compiling your code with -gencode arch=compute_110a,code=sm_110a instead of your -arch switch, and I think you’ll have a successful compile. If you also wish to embed PTX in the fatbinary, add an appropriate -gencode switch for that as well.
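
Spelled out in full, the suggested commands might look like the following. This is a sketch that reuses the file name and -O2 flag from the question and has not been verified on Thor; the second -gencode is only needed if you want PTX embedded.

```shell
# SASS only, built for the architecture-specific sm_110a target
nvcc -gencode arch=compute_110a,code=sm_110a -O2 tcgen05_dummy.cu

# Same, plus compute_110a PTX embedded in the fatbinary
nvcc -gencode arch=compute_110a,code=sm_110a \
     -gencode arch=compute_110a,code=compute_110a \
     -O2 tcgen05_dummy.cu
```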
