Error 13 from cublas Sgemv() call using 680m, but no error when using Tesla K20

CudaaduC · September 24, 2013, 10:38pm

I have a CUDA mex file that I call from MATLAB. It works fine on a K20, but when the same code runs on a different laptop (compiled for compute 3.0 instead of 3.5), I get the infamous INTERNAL_ERROR from a cublas Sgemv() call.

I have already searched the forums and understand in general terms when this error shows up, but in this case the code works fine on the 3.5 machine and fails on the 3.0 one.

The code gets through all the memory allocations, and at least 10-15 Sgemm() and Strsm() calls, before it exits from a Sgemv() call.

Every GPU command in the code checks for errors, and it works fine on the K20, but not the 680 in the laptop.

This error seems very general, so what are some possible causes in this case, which would show up only with the 680 but not the K20?

The 680 is the only GPU in the laptop, which runs Windows 7, Visual Studio 2010 and MATLAB 2011b.

njuffa · September 25, 2013, 2:23am

Unless I am looking at the wrong header file, error 13 is EXECUTION_FAILED. This means the GPU kernel inside the CUBLAS call failed. Likely causes: unspecified launch failure, or timeout (kernel killed by watchdog timer).
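To make the numeric code readable in a mex log, one option is a small lookup like the sketch below. The helper name `cublasStatusName` is hypothetical, and the numeric values are written out by hand to mirror the `cublasStatus_t` enum as I understand it; please verify them against the `cublas_api.h` shipped with your toolkit, and in real code compare against the enum constants rather than raw integers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a raw cuBLAS status value to its enum name.
// The values are assumed to match cublasStatus_t in cublas_api.h;
// check them against the header of the toolkit you actually build with.
inline std::string cublasStatusName(int status) {
    switch (status) {
        case 0:  return "CUBLAS_STATUS_SUCCESS";
        case 1:  return "CUBLAS_STATUS_NOT_INITIALIZED";
        case 3:  return "CUBLAS_STATUS_ALLOC_FAILED";
        case 7:  return "CUBLAS_STATUS_INVALID_VALUE";
        case 8:  return "CUBLAS_STATUS_ARCH_MISMATCH";
        case 11: return "CUBLAS_STATUS_MAPPING_ERROR";
        case 13: return "CUBLAS_STATUS_EXECUTION_FAILED";
        case 14: return "CUBLAS_STATUS_INTERNAL_ERROR";
        default: return "unknown cuBLAS status";
    }
}
```

With this mapping, status 13 decodes to EXECUTION_FAILED and 14 to INTERNAL_ERROR, which is why the two are easy to confuse when only the number is printed.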

Based on your description, I would think you are hitting the latter. You could try smaller matrix sizes to confirm, or use a GPU that is not running the display (the operating system watchdog timer is there to ensure the GUI doesn’t freeze for more than a specified time limit, usually 2-5 seconds).

The K20 never drives a display, so there is no watchdog timer associated with it and you can therefore run compute kernels of any duration you desire.

CudaaduC · October 7, 2013, 6:39pm

Still having this issue, even though I did adjust the watchdog timer in the registry.

The data size is not very large (the matrix A is 768 x 128), but I still get the same error on the Sgemv() call.

This is a link to the exact laptop that is causing the issue:

MSI Laptop GT Series GT70 0NE-416US Intel Core i7 3rd Gen 3610QM (2.30GHz) 12GB Memory 500GB HDD 128 GB SSD NVIDIA GeForce GTX 680M 17.3" Windows 7 Home Premium 64-Bit - Newegg.com

When I adjust the mex interface to just do a single Sgemv() it works fine, and this particular code runs without error from MATLAB on two other different desktops with a Tesla K20 using the TCC driver.

So I am assuming there may be some system-wide interrupt or interference with the call, since that GPU is the only one in the machine and also drives the active video output.

What else can I do to narrow down the problem?

njuffa · October 9, 2013, 11:37pm

Generally speaking, I would triple check the pointers and other arguments passed into cublasSgemv(). If the kernel does not die due to a watchdog timeout, it is probably being killed by an unspecified launch failure which indicates operating on memory that is out of bounds. This could be due to a bad pointer, an incorrect transpose mode, or an inadvertent switch of the dimensions.
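One way to do that triple check mechanically is to validate the sizes on the host before each call. The sketch below is plain C++ and does not call cuBLAS; the struct `SgemvArgs` and the functions `stridedExtent` and `checkSgemvArgs` are hypothetical names introduced for illustration. It assumes the usual column-major BLAS conventions for an m x n matrix A: x has n elements and y has m elements without transpose, and the roles swap with transpose. The buffer sizes are whatever your own code recorded at allocation time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical description of one Sgemv call plus the sizes (in floats)
// of the buffers that were actually allocated for A, x and y.
struct SgemvArgs {
    bool transposed;   // true if the call uses a transpose mode
    int m, n;          // rows and columns of A as stored (column-major)
    int lda;           // leading dimension of A
    int incx, incy;    // strides of x and y
    std::size_t aElems, xElems, yElems;
};

// Number of floats spanned by a strided vector with `len` logical elements.
inline std::size_t stridedExtent(int len, int inc) {
    if (len <= 0) return 0;
    return 1 + static_cast<std::size_t>(len - 1) *
                   static_cast<std::size_t>(std::abs(inc));
}

// Returns an empty string if the arguments look consistent,
// otherwise a short description of the first problem found.
inline std::string checkSgemvArgs(const SgemvArgs& a) {
    if (a.m < 0 || a.n < 0) return "m and n must be non-negative";
    if (a.lda < (a.m > 1 ? a.m : 1)) return "lda must be at least max(1, m)";
    if (a.incx == 0 || a.incy == 0) return "incx and incy must be non-zero";

    // Without transpose x has n elements and y has m; transpose swaps them.
    const int xLen = a.transposed ? a.m : a.n;
    const int yLen = a.transposed ? a.n : a.m;

    // The last column of A only needs m entries, not a full lda.
    const std::size_t aNeeded =
        (a.m == 0 || a.n == 0)
            ? 0
            : static_cast<std::size_t>(a.lda) * (a.n - 1) + a.m;

    if (a.aElems < aNeeded) return "A buffer too small";
    if (a.xElems < stridedExtent(xLen, a.incx)) return "x buffer too small";
    if (a.yElems < stridedExtent(yLen, a.incy)) return "y buffer too small";
    return "";
}
```

For the 768 x 128 case from this thread, a non-transposed call with a 128-element x and a 768-element y passes the check, while flipping the transpose mode with the same buffers is reported as an undersized x, which is the kind of out-of-bounds access that can surface as a launch failure.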

The other angle of attack is the fact that it works fine with a single call to cublasSgemv(). So does it work with 2, 3, …, n calls? If not, what is the smallest n for which it does not work? What are the salient differences between running with n-1 and n calls? For example, are the matrices passed in different calls all the same size? I would cut the failing case down to a minimum of code and it will likely become apparent what the issue is.
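The search for the smallest failing n can be automated with a small harness. This is only a sketch: `smallestFailingCallCount` is a hypothetical name, and `runWithCalls` stands in for whatever reduced test you write that performs the first n cuBLAS calls of the sequence and returns true when all of them succeed. It scans linearly rather than bisecting, since there is no guarantee that failures are monotone in n.

```cpp
#include <cassert>
#include <functional>

// Returns the smallest n in [1, maxCalls] for which runWithCalls(n)
// reports failure (returns false), or 0 if every count succeeds.
// runWithCalls is a placeholder for a user-supplied reduced test case.
inline int smallestFailingCallCount(
        int maxCalls, const std::function<bool(int)>& runWithCalls) {
    assert(maxCalls >= 0);
    for (int n = 1; n <= maxCalls; ++n) {
        if (!runWithCalls(n)) {
            return n;  // first call count that fails
        }
    }
    return 0;  // no failure observed up to maxCalls
}
```

Once the smallest failing n is known, comparing the arguments of call n against those of call n-1 (sizes, pointers, transpose mode) is usually the quickest way to spot the difference.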

Correct me if I am wrong, but GTX 680 is an sm_30 device, while Tesla K20 is an sm_35 device. This means that the kernels invoked by cublasSgemv() are physically different for the two platforms. They should be functionally equivalent though, unless there is a bug. I would consider a bug in reasonably common CUBLAS functions unlikely at this stage, but of course the possibility can never be excluded. Therefore I would focus initial investigation on the validity of the inputs passed into cublasSgemv().
