Save index of maximum value with cuBLAS

There’s not enough information here to debug this, but you really should be using blocks of at least 64 threads. I highly suggest taking NVIDIA's CUDA C++ DLI courses.