Hello,
I am running the matrixMul sample from CUDA 12.1 on Windows with a GTX 970M. If I pass -wA=4096 -hA=4096 -wB=4096 -hB=4096 (to specify 4096x4096 matrices), cudaStreamSynchronize returns an error instead of waiting for the kernels to finish:
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Maxwell" with compute capability 5.2
MatrixA(4096,4096), MatrixB(4096,4096)
Computing result using CUDA Kernel...
done
err 0
CUDA error at C:\work\Repos\cuda-samples\Samples\0_Introduction\matrixMul\matrixMul.cu:209 code=702(cudaErrorLaunchTimeout) "cudaStreamSynchronize(stream)"
(I added a print of cudaGetLastError() right after launching the kernel, and it reports success; see the "err 0" line in the output.)
With 2048x2048 matrices it runs correctly.
What could be the issue? Thank you!