cublas SEGFAULT in cublasInit() but locally compiled examples run.

When I try to make any call in cublas, including cublasInit(), I get a segfault.
I'm using gcc/g++ 4.4.0 on RedHat 5.4.
I can compile and run all the SDK examples with g++44. I can even run this code (the code that segfaults) when I call it from a different executable. The nefarious code is just a simple matrix multiply, but it doesn't even matter what it is; I never make it past cublasInit(). It's located in a C++ class which is compiled into a shared library (.so) along with hundreds of others. The executable links in this library.

When I run it linked with the emulator version (cublasemu.so) it works fine. When I run it linked into a different executable (our unit tests) it works fine. Runtime API calls work. It runs in one executable but not the other. Has anybody else seen this happen? Are there any compile/link flags that interfere with cublas at runtime?

Here is the core back trace:
(gdb) bt
#0 0x00002aaaab957980 in ?? () from /usr/lib64/libcuda.so
#1 0x00002aaaab95d3c4 in ?? () from /usr/lib64/libcuda.so
#2 0x00002aaaab92d557 in ?? () from /usr/lib64/libcuda.so
#3 0x00002aaaab8d8cf7 in ?? () from /usr/lib64/libcuda.so
#4 0x00002aaaab8ea52b in ?? () from /usr/lib64/libcuda.so
#5 0x00002aaaab8cf940 in ?? () from /usr/lib64/libcuda.so
#6 0x00002aaaab8c8a8a in ?? () from /usr/lib64/libcuda.so
#7 0x00002aaaab923187 in ?? () from /usr/lib64/libcuda.so
#8 0x00002b63ea71beb2 in ?? () from /opt/brs/lib/libcudart.so.2
#9 0x00002b63ea71c69c in ?? () from /opt/brs/lib/libcudart.so.2
#10 0x00002b63ea70081d in cudaFree () from /opt/brs/lib/libcudart.so.2
#11 0x00002b63ea93d110 in cublasInitCtx () from /opt/brs/lib/libcublas.so.2
#12 0x00002b63ea9871f7 in ?? () from /opt/brs/lib/libcublas.so.2
#13 0x00002b63ea93d2b0 in cublasInit () from /opt/brs/lib/libcublas.so.2
#14 0x00002b63e452bd63 in brs::util::CudaTestClass::simpleSinglePrecisionMatrixMutlitpy (this=) at …/…/brsVAE/src/vaelib/util/CudaTestClass.cpp:92
#15 0x00002b63e553314b in thread_proxy () from /opt/brs/lib/libboost_thread-gcc41-mt-1_39.so.1.39.0
#16 0x00000032bd6064a7 in start_thread () from /lib64/libpthread.so.0
#17 0x00000032bcad3c2d in clone () from /lib64/libc.so.6

gcc 4.4 isn't supported. That looks suspiciously like a runtime code incompatibility. Try using gcc-4.3 or earlier instead (I think the gcc-4.4 install is officially a "preview" version in RedHat – there is still a gcc-4.1 version available as the mainline compiler).

Yeah 4.4.0 is a preview. I rolled back to gcc/g++ 4.1.2 and I still get the segfault.

Can you think of anything else?

Is there source available for cublas 2.0 or at least a shared lib with debug symbols?

There is no source available for the modern cublas, I am afraid. Am I right in thinking you have wrapped up cublas in some sort of C++ class or wrapper function? This is a wild guess, but it might well be that you need to use plain C malloc rather than the C++ new operator for allocating host-side storage. I personally have never seen anything like this, and I have used cublas pretty extensively in my own codes.
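To make the guess concrete, the sort of thing I mean is below. This is only a sketch; the size and names are made up, and it assumes cublasInit() has already succeeded:

#include <stdlib.h>
#include "cublas.h"

// Illustrative only: host buffer from plain C malloc rather than C++ new,
// device buffer through the cublas allocator, then a host-to-device copy.
void uploadExample()
{
    const int n = 64;                              // made-up size
    float* hostA = (float*) malloc(n * n * sizeof(float));
    float* devA  = 0;
    // ... fill hostA here ...

    cublasAlloc(n * n, sizeof(float), (void**) &devA);
    cublasSetMatrix(n, n, sizeof(float), hostA, n, devA, n);
    // ... cublasSgemm() etc. ...
    cublasFree(devA);
    free(hostA);
}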

I don’t make it far enough to pass a variable. I can remove everything but cublasInit() and it still fails.

So this would segfault when I call test() inside our exe:

class CudaTest
{
public:
    void test() { cublasInit(); }
};

What driver and toolkit versions are you using?

Toolkit 2.3

Driver 190.53

Are there any known conflicts with other libraries like Intel Performance Primitives (IPP), the Intel Math Kernel Library (MKL), or OpenMP?

I gave up on cublas for a while and I am writing my own matrix multiply. Now I am running into the same kind of problem with cudaMalloc(). The first time I ever call it in our main exe I get a segfault. If I call the same routine from our unit test exe it works. If I run the main exe under cuda-gdb I get cudaErrorNoDevice as a return from cudaMalloc() but no segfault.
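For reference, a stripped-down version of the failing call with the return code checked (the buffer size is arbitrary):

#include <cuda_runtime.h>
#include <stdio.h>

// First runtime API call in the process. Under cuda-gdb this returns
// cudaErrorNoDevice instead of crashing.
void firstCudaCall()
{
    float* devBuf = 0;
    cudaError_t err = cudaMalloc((void**) &devBuf, 1024 * sizeof(float));
    if (err != cudaSuccess)
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
    else
        cudaFree(devBuf);
}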

For a while I was getting a back-trace that showed the segfault was in libiomp, which has something to do with OpenMP in MKL. Now I am back to the segfault in cudaMalloc().

Still frustrated.

The toolkit and SDK versions you have are OK. Sometimes the sort of symptoms you are seeing can be caused by newer toolkit versions on very old drivers. CUDA coexists with MKL to the best of my knowledge.

Are you sure that the actual driver installation is ok? Can you build and run the deviceQuery example from the SDK, for example? It might be that the driver you have either doesn’t have libcuda.so with it or it is hosed somehow. Is the driver from the NVIDIA installer or a third party rpm repackage?
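If you want a quick check without the SDK build system, the heart of deviceQuery is just this (a minimal sketch, not the actual SDK source):

#include <cuda_runtime.h>
#include <stdio.h>

// Counts CUDA devices and prints each one's name, which is enough
// to exercise the driver/libcuda.so installation.
int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("found %d CUDA device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s\n", i, prop.name);
    }
    return 0;
}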

Friday I bought a newer card (GeForce GTS 250) and reinstalled the latest NVIDIA driver 190.53, with no effect. I can build and run all the SDK examples (much faster now) even with g++ 4.4.0. The SDK and tools are not repackaged; they are from the NVIDIA CUDA download area (2.3).

I did find one interesting thing. When I change the env var OMP_NUM_THREADS (used with MKL) I can move the segfault location around. If OMP_NUM_THREADS=1 we segfault in the cuda lib trying to do a cudaMalloc (libcuda.so). If it's > 1 we segfault in libiomp5 (part of MKL). I am working on removing the MKL calls from our exe to see if that makes a difference.

Removing MKL didn't help. But I did manage to get past this problem. If I call cublasInit() very early in my main() function, then everything works fine. Our stuff is heavily multithreaded, so I had been making all my calls to cublas or cuda from inside one of these threads, which don't launch until a few seconds into a run. It always segfaulted there. And I only have to call the init early; I can call other cublas routines from anywhere.
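Roughly, the working arrangement now looks like this (a simplified sketch; the worker function name is made up, and the real code uses boost threads):

#include <boost/thread.hpp>
#include "cublas.h"

// Hypothetical worker: this is where cublasInit() used to be called,
// and where it always segfaulted.
void matrixWorker()
{
    // ... cublas calls now work from here ...
}

int main()
{
    cublasInit();                        // very early, on the main thread

    boost::thread worker(matrixWorker);  // launched seconds later in reality
    worker.join();

    cublasShutdown();
    return 0;
}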

I am not sure why this happened. Maybe it's got something to do with how the device gets mapped into memory by the exe. Maybe it was in the docs and I missed it. I have no idea. But it's working now.

The device context that the runtime API/CUBLAS establishes on the GPU is thread-specific. If you want many threads to be able to use the same GPU, you will have to use the context thread migration API (I don't know how it works, only that it exists). The alternative is to have a specific, persistent worker thread hold the GPU context and send it CUBLAS work. This is how I have implemented it in one of my apps.
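A bare-bones sketch of that worker-thread pattern (the class and queue here are mine, not from any real app; error handling omitted):

#include <queue>
#include <boost/bind.hpp>
#include <boost/function.hpp>
#include <boost/thread.hpp>
#include "cublas.h"

// Hypothetical single-owner GPU worker: the CUDA context lives on this
// thread for the life of the object, and other threads just queue work.
class GpuWorker
{
public:
    GpuWorker() : done_(false), thread_(boost::bind(&GpuWorker::run, this)) {}

    ~GpuWorker()
    {
        { boost::mutex::scoped_lock lock(mutex_); done_ = true; }
        cond_.notify_one();
        thread_.join();
    }

    // Called from any thread; the job runs on the context-owning thread.
    void post(const boost::function<void()>& job)
    {
        boost::mutex::scoped_lock lock(mutex_);
        jobs_.push(job);
        cond_.notify_one();
    }

private:
    void run()
    {
        cublasInit();                    // context belongs to this thread
        for (;;) {
            boost::function<void()> job;
            {
                boost::mutex::scoped_lock lock(mutex_);
                while (jobs_.empty() && !done_)
                    cond_.wait(lock);
                if (jobs_.empty())
                    break;               // drained and shutting down
                job = jobs_.front();
                jobs_.pop();
            }
            job();                       // e.g. a wrapped cublasSgemm() call
        }
        cublasShutdown();
    }

    bool done_;
    std::queue<boost::function<void()> > jobs_;
    boost::mutex mutex_;
    boost::condition_variable cond_;
    boost::thread thread_;
};

Everything that touches the GPU then goes through post(), so only one thread ever owns the context.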

It might have been worth mentioning your app was multithreaded at the beginning of all this; it would have made pinning down your problem a lot faster…