How to use optimized general matrix multiplication (GEMM) and memory re-ordering operations such as im2col

How can I use optimized general matrix multiplication (GEMM) together with memory re-ordering operations such as im2col?
Do you have any ideas?

Hi,

Please check our CUDA samples for more information:


/usr/local/cuda-9.0/samples/0_Simple/matrixMul/
/usr/local/cuda-9.0/samples/0_Simple/matrixMulCUBLAS/
/usr/local/cuda-9.0/samples/0_Simple/matrixMulDrv/
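
The samples above cover the matrix multiplication side. As a rough illustration of how im2col fits in, here is a minimal CPU-only sketch in C++. It assumes a single-channel image, stride 1, no padding, and no kernel flip (cross-correlation). The names `im2col`, `gemm`, and `conv2d_im2col` are hypothetical helpers written for this post; they are not taken from the CUDA samples. On the GPU, the naive `gemm` here is the step you would swap for an optimized routine such as `cublasSgemm`, which the matrixMulCUBLAS sample demonstrates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: unfold every k x k patch of an h x w image into one column.
// Result is row-major with (k*k) rows and (out_h*out_w) columns.
// Assumptions: single channel, stride 1, no padding.
std::vector<float> im2col(const std::vector<float>& img, int h, int w, int k) {
    assert(k >= 1 && k <= h && k <= w);
    const int out_h = h - k + 1;
    const int out_w = w - k + 1;
    const int cols = out_h * out_w;
    std::vector<float> col(static_cast<std::size_t>(k) * k * cols);
    for (int ky = 0; ky < k; ++ky)
        for (int kx = 0; kx < k; ++kx)
            for (int oy = 0; oy < out_h; ++oy)
                for (int ox = 0; ox < out_w; ++ox) {
                    const int row = ky * k + kx;   // position inside the patch
                    const int c = oy * out_w + ox; // which output pixel
                    col[static_cast<std::size_t>(row) * cols + c] =
                        img[static_cast<std::size_t>(oy + ky) * w + (ox + kx)];
                }
    return col;
}

// Naive row-major GEMM: C (m x n) = A (m x p) * B (p x n).
// This is the part an optimized library call would replace.
std::vector<float> gemm(const std::vector<float>& A, const std::vector<float>& B,
                        int m, int p, int n) {
    std::vector<float> C(static_cast<std::size_t>(m) * n, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int l = 0; l < p; ++l) {
            const float a = A[static_cast<std::size_t>(i) * p + l];
            for (int j = 0; j < n; ++j)
                C[static_cast<std::size_t>(i) * n + j] +=
                    a * B[static_cast<std::size_t>(l) * n + j];
        }
    return C;
}

// Convolution (cross-correlation) expressed as im2col followed by one GEMM.
// The k x k kernel is treated as a 1 x (k*k) row vector.
std::vector<float> conv2d_im2col(const std::vector<float>& img, int h, int w,
                                 const std::vector<float>& kernel, int k) {
    const int out_h = h - k + 1;
    const int out_w = w - k + 1;
    const std::vector<float> col = im2col(img, h, w, k);
    return gemm(kernel, col, 1, k * k, out_h * out_w);
}
```

For example, calling `conv2d_im2col` on a 3x3 image holding 1 through 9 with a 2x2 kernel of ones returns the four window sums 12, 16, 24, 28. With several filters, the kernel matrix simply gets one row per filter, so the whole layer is still a single GEMM; that is why the re-ordering is worth its extra memory.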

Thanks.