Hi,
We are running performance benchmarks on Jetson Xavier hardware to decide whether this platform is good enough for our needs.
During our work we used the CUFFT and CUBLAS libraries and also implemented our own convolution kernel.
We have some questions:
1) Can we get the source code of the CUFFT library so that we can modify it to suit our needs?
2) Is there a convolution function in the CUBLAS library, or do you have an implementation of a highly optimized convolution kernel?
3) Is there a scalar matrix by matrix multiplication in the CUBLAS library, or some other highly optimized kernel implementation?
Thanks in advance