Is the bf16TensorCoreGemm sample in CUDA 11.2 known to fail to compile?

I tried to compile the bf16TensorCoreGemm sample from CUDA 11.2 (located in the 0_Simple folder) using make. These are the error messages:

>>> GCC Version is greater or equal to 5.0.0 <<<
//usr/lib/cuda/bin/nvcc -ccbin g++ -I…/…/common/inc -m64 --std=c++11 --threads 0 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o bf16TensorCoreGemm.o -c
g++: internal compiler error: Segmentation fault signal terminated program cc1plus
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
make: *** [Makefile:357: bf16TensorCoreGemm.o] Error 255

Is this sample known to fail to compile in CUDA 11.2? Is there any chance I can fix this on my side, and how much effort would that take?


My environment:
  • A remote Linux machine where I am not a root user
  • GPU driver 460.32.03
  • CUDA 11.2
  • gcc version: (Ubuntu 10.3.0-1ubuntu1~20.10) 10.3.0

It should compile. However, your error does not appear to come from nvcc: “g++: internal compiler error: Segmentation fault signal terminated program cc1plus” looks like the host compiler running out of memory. For reference, see the Stack Overflow question “linux - make -j 8 g++: internal compiler error: Killed (program cc1plus)”.

I did not use -j 8, just make. VERBOSE=1 did not give any additional information.
Can you compile it on your machine? If so, how much memory does your machine have? Mine has 32 GB.

I can see a similar issue when using gcc 10.3. See the Installation Guide Linux :: CUDA Toolkit Documentation for CUDA 11.2; it is possible that gcc 10 is not formally supported.
It passes for me with gcc 9.3. Please try gcc-9 to see if that works for you, or update to the latest CUDA version.
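
As a rough sketch of how to try an older host compiler without root access: the helper below (a hypothetical function, pick_host_compiler, not part of the CUDA samples) prints the first compiler from a candidate list that is actually on PATH. It assumes that g++-9 is already installed somewhere on PATH, and that the sample Makefile accepts a HOST_COMPILER variable that is forwarded to nvcc's -ccbin option, which is consistent with the "-ccbin g++" in the log above but worth confirming in your copy of the Makefile.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate compiler found on PATH.
# Returns non-zero if none of the candidates is available.
pick_host_compiler() {
    for cxx in "$@"; do
        if command -v "$cxx" >/dev/null 2>&1; then
            printf '%s\n' "$cxx"
            return 0
        fi
    done
    return 1
}

# Usage from the sample directory (assumes the Makefile honours HOST_COMPILER):
#   cxx=$(pick_host_compiler g++-9) && make clean && make HOST_COMPILER="$cxx"
```

If no candidate is found, the function returns a non-zero status and make is not run, so the build does not silently fall back to the default g++ (gcc 10.3 in this case).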
