Xavier performance benchmarks


We are running performance benchmarks on Jetson Xavier hardware to decide whether this solution is good enough for us.

During our work we used the CUFFT and CUBLAS libraries and also implemented a convolution kernel.

We have some questions:

1) Can we get the sources of the CUFFT library so that we can change it to suit our needs?

2) Is there a convolution function in the CUBLAS library, or do you have an implementation of a highly optimized convolution kernel?

3) Is there a scalar-by-matrix-by-matrix multiplication in the CUBLAS library, or some other highly optimized kernel implementation?

Thanks in advance


1) Sorry, but the CUDA libraries are not open source.
You should be able to implement your application with our flexible API.
Please check our documentation here: https://docs.nvidia.com/cuda/cufft/index.html

2) No. But you can check this one:

3) Yes. It is GEMM, which computes C = alpha * A * B + beta * C, so the scalar factor is passed as alpha.
Xavier supports half-precision (FP16) and integer (INT8) matrix multiplication operations, which can give you higher performance.
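
To make the GEMM semantics concrete, here is a minimal CPU reference sketch of the operation in single precision. It is not the cuBLAS implementation; the function name gemm_ref is hypothetical, and it assumes column-major storage with no transposition and tightly packed leading dimensions. It can serve as a baseline for checking the output of a GPU GEMM call such as cublasSgemm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the GEMM operation: C = alpha * (A * B) + beta * C.
// Hypothetical helper for illustration; not part of cuBLAS.
// Assumptions: column-major storage (the layout cuBLAS uses), no
// transposition, tightly packed leading dimensions.
// A is m x k, B is k x n, C is m x n.
void gemm_ref(std::size_t m, std::size_t n, std::size_t k,
              float alpha, const std::vector<float>& A,
              const std::vector<float>& B,
              float beta, std::vector<float>& C) {
    assert(A.size() == m * k);
    assert(B.size() == k * n);
    assert(C.size() == m * n);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < m; ++row) {
            // Dot product of row `row` of A with column `col` of B.
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[row + p * m] * B[p + col * k];
            }
            // Scale the product by alpha and blend in the old C by beta.
            C[row + col * m] = alpha * acc + beta * C[row + col * m];
        }
    }
}
```

With beta set to 0 the call reduces to the scalar-times-matrix-times-matrix product asked about in question 3; a nonzero beta additionally accumulates into the existing contents of C.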