tensor core sample

Andrey1984 · February 8, 2019, 10:04am

I am wondering whether there is a sample that uses the Tensor Cores on Xavier, as the one below doesn't seem to support aarch64.
https://github.com/NVIDIA/cuda-samples/tree/master/Samples/cudaTensorCoreGemm
Thanks

dusty_nv · February 8, 2019, 7:15pm

Hi Andrey, with a couple minor modifications to the Makefile, this WMMA sample builds and runs on Jetson AGX Xavier.

Comment out lines 250-253 of the Makefile:

#ifeq ($(TARGET_ARCH),aarch64)
#  $(info >>> WARNING - cudaTensorCoreGemm is not supported on aarch64 - waiving sample <<<)
#  SAMPLE_ENABLED := 0
#endif
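If you would rather script this step than edit by hand, here is a minimal sketch. It assumes GNU sed and waiver text identical to the block quoted here. It matches on the warning message instead of on line numbers, which may shift between cuda-samples releases. The helper name `comment_out_aarch64_waiver` is hypothetical. It comments out only the `$(info ...)` and `SAMPLE_ENABLED := 0` lines and leaves an empty `ifeq`/`endif` pair, which GNU make accepts.

```shell
# Hypothetical helper (not part of cuda-samples): disable the aarch64 waiver.
# Assumes GNU sed and the waiver text shown in the Makefile block above.
comment_out_aarch64_waiver() {
  local makefile="${1:-Makefile}"
  # Prefix '#' to every line from the warning message through the first
  # 'SAMPLE_ENABLED := 0' after it. The enclosing ifeq/endif stay in place;
  # an empty conditional is still valid GNU make.
  sed -i '/cudaTensorCoreGemm is not supported on aarch64/,/SAMPLE_ENABLED := 0/ s/^/#/' "$makefile"
}

# Usage, from Samples/cudaTensorCoreGemm:
#   comment_out_aarch64_waiver Makefile
```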

Change line 267 of the Makefile to include support for compute_72 / sm_72:

# Gencode arguments
SMS ?= 70 72 75
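The SMS edit can be scripted in the same way. This sketch assumes the Makefile has a single line beginning with `SMS ?=`, and `set_sms` is a hypothetical helper. Alternatively, variables given on the make command line override assignments in the Makefile, so `make SMS="70 72 75"` should have the same effect without editing the file.

```shell
# Hypothetical helper: rewrite the SMS default so sm_72 (Xavier) is built.
# Assumes GNU sed and exactly one line starting with "SMS ?=".
set_sms() {
  local makefile="${1:-Makefile}"
  # In a sed basic regex '?' is a literal character, so this matches "SMS ?= ..."
  sed -i 's/^SMS ?= .*/SMS ?= 70 72 75/' "$makefile"
}

# Usage:
#   set_sms Makefile
# Or leave the file alone and override on the command line:
#   make SMS="70 72 75"
```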

Then build and run it:

$ make
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common  -m64    -maxrregcount=255 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cudaTensorCoreGemm.o -c cudaTensorCoreGemm.cu
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cudaTensorCoreGemm cudaTensorCoreGemm.o 
mkdir -p ../../bin/aarch64/linux/release
cp cudaTensorCoreGemm ../../bin/aarch64/linux/release

$ ./cudaTensorCoreGemm 
Initializing...
GPU Device 0: "Xavier" with compute capability 7.2

M: 4096 (16 x 256)
N: 4096 (16 x 256)
K: 4096 (16 x 256)
Preparing data for GPU...
Required shared memory size: 64 Kb
Computing... using high performance kernel compute_gemm 
Time: 54.112225 ms
TFLOPS: 2.54
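The reported TFLOPS is consistent with the usual GEMM operation count of 2*M*N*K (one multiply and one add per inner-product term) divided by the kernel time. Here is a small sketch that reproduces the figure from the logged values. `gemm_tflops` is a hypothetical helper, and the operation count is an assumption about how the sample computes it.

```shell
# Hypothetical helper: TFLOPS for an M x N x K GEMM, assuming 2*M*N*K operations.
# Arguments: M N K time_in_ms
gemm_tflops() {
  awk -v m="$1" -v n="$2" -v k="$3" -v ms="$4" \
    'BEGIN { printf "%.2f\n", 2 * m * n * k / (ms / 1000) / 1e12 }'
}

gemm_tflops 4096 4096 4096 54.112225   # prints 2.54
```

With M = N = K = 4096 that is about 1.37e11 operations, so 54.1 ms works out to roughly 2.54 TFLOPS.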

Andrey1984 · February 8, 2019, 7:32pm

Dustin, thank you.
After running the following two commands, in exactly this order and not the other way around,

sudo nvpmodel -m 0
sudo ./jetson_clocks.sh

performance increases noticeably:

./cudaTensorCoreGemm 
Initializing...
GPU Device 0: "Xavier" with compute capability 7.2

M: 4096 (16 x 256)
N: 4096 (16 x 256)
K: 4096 (16 x 256)
Preparing data for GPU...
Required shared memory size: 64 Kb
Computing... using high performance kernel compute_gemm 
Time: 37.658943 ms
TFLOPS: 3.65
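Comparing the two runs, the same 4096 x 4096 x 4096 GEMM took 37.66 ms instead of 54.11 ms after setting the power mode and then running jetson_clocks.sh. Here is a sketch of the ratio, where `speedup` is a hypothetical helper and the times are copied from the two logs in this thread:

```shell
# Hypothetical helper: ratio of two kernel times (before / after).
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

speedup 54.112225 37.658943   # prints 1.44
```

That is roughly a 1.44x speedup. It matches the TFLOPS ratio (3.65 / 2.54) because the workload is identical in both runs.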
