cuBLAS causing races on Maxwell (code to reproduce it included)

#include <cstdlib>
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(int argc, char** argv)
{
    // Device index can be passed on the command line (defaults to 0).
    int dev = 0;
    if (argc > 1)
        dev = std::atoi(argv[1]);
    std::printf("dev = %d\n", dev);

    cudaStream_t s;
    cublasHandle_t h;

    float *a, *b, *c;
    int n = 4000;

    cudaSetDevice(dev);

    // The buffers are deliberately left uninitialized; only the racecheck
    // report matters here, not the numerical result.
    cudaMalloc(&a, (std::size_t)n * n * sizeof(float));
    cudaMalloc(&b, (std::size_t)n * n * sizeof(float));
    cudaMalloc(&c, (std::size_t)n * n * sizeof(float));

    cudaStreamCreate(&s);
    cublasCreate(&h);

    // Run the SGEMM on a non-default stream.
    cublasSetStream(h, s);
    float alpha = 1, beta = 1;

    // C = alpha * A * B + beta * C, all matrices n x n, column-major.
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, a, n, b, n, &beta, c, n);

    cudaStreamSynchronize(s);

    cublasDestroy(h);
    cudaStreamDestroy(s);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
}

To reproduce:

g++ -I/usr/local/cuda/include cublas_race.cpp -L/usr/local/cuda/lib64 -lcudart -lcublas
cuda-memcheck --tool racecheck ./a.out 0 # pass the device index of a GTX 980 here

I tried this a few times on several GTX 980 cards and one K40. The GTX 980s usually showed screenfuls of

========= Race reported between Read access at 0x00002570 in maxwell_sgemm_128x128_nn
=========     and Write access at 0x00002548 in maxwell_sgemm_128x128_nn [406 hazards]

but the number of reported hazards varies from run to run, and sometimes it is zero. The K40 never showed any races.

This is on Ubuntu 14.04 with CUDA 6.5.

Questions:

  1. Can you reproduce this? (So that I'm not the only one reporting the issue)
  2. Is this a problem with cuBLAS or racecheck?
  3. Has this been fixed in CUDA 7.0? I couldn't find any cuBLAS release notes online.

Well, there is always someone who is the first to report an issue :-) NVIDIA has staff dedicated to reproducing customer-reported issues. Before you file a bug, though, you would want to try CUDA 7.0 to see whether the issue has already been addressed.

I have a hazy notion that this issue (or one very similar to it) was brought up in this forum before. I am not very confident in my recollection, but it may have been a case where the race condition involves data that does not actually contribute to the final result (which does not mean the issue should not be addressed). Maybe txbob remembers whether this came up before and, if so, what the details were.

The CUDA FAQ says posting in the forum is enough – no need to file bug reports.

FWIW, with the same parameters, the SGEMM results always agree with OpenBLAS for me, so I don't know whether this is a false positive in racecheck, the kind of data race that does not affect the result, or something else.
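
For reference, here is a minimal sketch of how such a comparison might be done. This is illustrative only: it assumes the buffers were initialized with known host data before the GEMM (the repro above leaves them uninitialized), and sgemm_matches_host is just a made-up helper name; link with -lopenblas in addition to the flags above.

#include <cblas.h>        // OpenBLAS CBLAS interface
#include <cmath>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Compare the device result d_c against a host reference computed with
// OpenBLAS. h_a, h_b, h_c hold the same column-major data the device
// buffers were initialized with.
bool sgemm_matches_host(const float* d_c,
                        const std::vector<float>& h_a,
                        const std::vector<float>& h_b,
                        std::vector<float> h_c,   // taken by value: reference result is built in place
                        int n, float alpha, float beta,
                        float tol = 1e-3f)
{
    // Host reference: C = alpha*A*B + beta*C, column-major, no transposes,
    // matching the cublasSgemm call above.
    cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, n, n, n,
                alpha, h_a.data(), n, h_b.data(), n, beta, h_c.data(), n);

    std::vector<float> gpu((std::size_t)n * n);
    cudaMemcpy(gpu.data(), d_c, gpu.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Element-wise comparison with a mixed relative/absolute tolerance.
    for (std::size_t i = 0; i < gpu.size(); ++i)
        if (std::fabs(gpu[i] - h_c[i]) > tol * std::fabs(h_c[i]) + tol)
            return false;
    return true;
}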

A more complex application I’m working on crashes under racecheck, but not without it. It runs fine with memcheck too.

Those race conditions are harmless and will not affect the output. At the end of the GEMM, some warp-synchronous shared-memory shuffling is done in order to get coalesced writes back to global memory. This means that the region of memory being written to and subsequently read from is the same within a warp. Because the shared-memory instructions are guaranteed to execute in SASS order, there is no need to synchronize them. The racecheck tool is not coded to check that the read and write addresses meet this condition, so the hazard is reported.
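
To illustrate the pattern being described (a hypothetical standalone kernel, not the actual cuBLAS code): each lane writes to shared memory and another lane in the same warp reads that location without an intervening barrier. On these pre-Volta parts the warp executes the instructions in lockstep, so the read sees the completed write, but racecheck still flags a hazard because it does not model the warp-synchronous guarantee.

// Hypothetical illustration only -- NOT the cuBLAS kernel.
// Launch with a single warp, e.g. warp_shuffle_demo<<<1, 32>>>(d_out);
__global__ void warp_shuffle_demo(float* out)
{
    __shared__ float buf[32];
    int lane = threadIdx.x & 31;

    buf[lane] = static_cast<float>(lane);   // each lane writes its own slot
    // no __syncthreads() here: correctness relies on the warp executing
    // the shared-memory write and the following read in lockstep
    out[lane] = buf[(lane + 1) & 31];       // read a slot written by a neighbouring lane
}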

So if anything, this is a bug in racecheck.

I think the FAQ is overly optimistic, to be generous. No idea who wrote the FAQ, but I do not think there is any guarantee that bug reports in the forums will be spotted by NVIDIA.

The CUDA FAQ page has now been edited. I believe it should now consistently present the idea that bugs are to be filed using the bug reporting system available to registered developers.

Reporting bugs using these forums (only) is a less robust method.

Hi Alex - it would be great if you could file the bug here: https://developer.nvidia.com/nvbugs/cuda/add

Thanks!

Thanks for the explanation!