CUBLAS_STATUS_MAPPING_ERROR when retrieving result after cublasSgemm

In NVIDIA’s SDK project simpleCUBLAS, two NxN matrices are being multiplied.

I changed the size of N to be 4700. I have a GeForce 8600M GT w/ 512 MB of RAM…

When I run the program (simpleCUBLAS), memory for the device matrices A, B, and C is allocated correctly. cublasSgemm(…) is invoked and reports a status of CUBLAS_STATUS_SUCCESS.

However, when I attempt to read the result back:

NVIDIA SDK Project - (C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA CUDA SDK\projects\simpleCUBLAS)

/* Read the result back */
status = cublasGetVector(n2, sizeof(h_C[0]), d_C, 1, h_C, 1);
if (status != CUBLAS_STATUS_SUCCESS) {
    fprintf(stderr, "!!! device access error (read C)\n");
    return EXIT_FAILURE;
}

I get the error "!!! device access error (read C)"… When I check the status, the actual error is CUBLAS_STATUS_MAPPING_ERROR…

I have run into this same problem in my own code when attempting to retrieve the result of a matrix multiply via cublasGetVector(…) and cublasGetMatrix(…).
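For what it’s worth, one way to see where the failure actually happens is to force a synchronization right after the sgemm call, before copying anything back. This is only a sketch against the legacy CUBLAS v1 API used by simpleCUBLAS (cublasGetError, cudaThreadSynchronize); it needs a CUDA-capable machine, and the names d_A, d_B, d_C, and n are assumed to be set up as in the SDK sample:

```c
/* Sketch only: requires a CUDA-capable machine.
 * Legacy CUBLAS v1 API (CUDA 2.x era, as in simpleCUBLAS). */
#include <stdio.h>
#include <cublas.h>        /* cublasSgemm, cublasGetError */
#include <cuda_runtime.h>  /* cudaThreadSynchronize */

/* d_A, d_B, d_C are device pointers prepared as in simpleCUBLAS. */
static int checked_sgemm(int n, const float *d_A, const float *d_B,
                         float *d_C)
{
    cublasSgemm('n', 'n', n, n, n, 1.0f, d_A, n, d_B, n, 0.0f, d_C, n);

    /* The kernel launch is asynchronous, so querying cublasGetError()
     * immediately after the call can still report
     * CUBLAS_STATUS_SUCCESS even if the kernel later dies (e.g. killed
     * by the display watchdog).  Forcing the kernel to finish here
     * makes the failure show up at the sgemm, not at the later
     * cublasGetVector(). */
    if (cudaThreadSynchronize() != cudaSuccess ||
        cublasGetError() != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "sgemm itself failed; the read-back error "
                        "is just fallout\n");
        return -1;
    }
    return 0;
}
```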

I have also run into this same problem on a GeForce 9650M w/ 1 GB of RAM - however, the matrix size NxN needs to be around 7000x7000 before it yields a CUBLAS_STATUS_MAPPING_ERROR…

How can cublas successfully allocate GPU memory, operate on that memory, but fail to retrieve the results back to the host?

I am going to guess that you are hitting the driver watchdog timer limit. The CUBLAS sgemm kernel is being killed (and CUBLAS is probably losing its context) because it is taking too long to complete.

When I run this on Windows Vista, execution always quits after 5 seconds. However, on Windows XP the execution time for some matrices is about 20 seconds…

What you are saying makes sense - on WinXP, when execution fails, it typically takes less time to fail and exit than it does to complete successfully on a smaller data set.

Is there a way to work around the watchdog timer? Right now, my video card is attached to a display (laptop).

Is the watchdog timer only active when the video card is attached to a display?

How else can I retrieve several hundred MB of contiguous floating-point data using only one device handle?

The watchdog behaviour is OS specific and I only use Linux (where you can get around it), so I can’t help you with that.

But to be clear, it isn’t the memory copy that is having problems, it is the sgemm call. You can fill or read back the entire device memory of a 1 GB card in a few hundred milliseconds, which is never going to be problematic. But a single big monolithic CUBLAS kernel can take a while and cause problems.

From my understanding, yes. You should be able to get a second card (cheap, CUDA capability NOT required) to use for display purposes, freeing up your CUDA card for processing so you don’t have to worry about the watchdog.

I’m not sure what would happen if you physically unplugged your display before running CUDA, but I doubt it would fix anything, because the card would presumably still be the primary display device.

Thanks for the input… To resolve this issue, I installed Ubuntu 9.10 with the CUDA 2.2 drivers… Before running my kernel, I invoked /etc/init.d/gdm stop (to shut down X)… This bypasses the whole watchdog issue…

The instructions for getting CUDA working under Linux can be found here (this also works on Ubuntu 8.04, 8.10, and 9.10 with the standard desktop install plus some dev packages):

http://ubuntuforums.org/archive/index.php/t-1112317.html

Hi,

I am having the same problem - could you tell me how to get around it in Linux? I would prefer not to stop gdm if possible (but I am fine waiting several seconds with an unresponsive X while the calculation is being processed)

thanks,

w