cublasAlloc() issue using Cygwin GCC Toolchain

Dear CUDA-team,

first of all I have to thank you for this great piece of software!

Unfortunately I have a problem concerning the CUBLAS library from the 1.0 CUDA Toolkit on Microsoft Windows XP within a Cygwin environment (specifically cublasemu.dll and cublasAlloc()) which might be an indication of stack corruption within cublasAlloc().

As I think this to be a rather interesting issue, I want to share it with you and maybe some of you might even be able to help me out.

I’m working on a large multi-platform project (Win32, Linux, MacOS) that is using the Cygwin GCC Toolchain on Windows.

Cutting it short: I want to use some CUBLAS routines in an executable compiled with Cygwin’s GCC.

There is this standard technique to link from Cygwin’s GCC against MS Windows DLLs by generating special linker archives (e.g., cublasemu.dll.a) using the MS Visual Studio tool dumpbin in conjunction with Cygwin’s dlltool. These linker archives are required at compile-time by the Cygwin linker ld.

We are using this technique for almost two years now with a broad range of libraries, never observing any problems.

For your convenience I also post the script used to extract symbol information from Win32 DLLs and generate these obscure .dll.a-files:

#!/bin/bash

if [ "$1" == "" ]; then

    echo "First argument must be a dll file"

    exit 1;

fi

DLLNAME=$1

DEFNAME=$1.def

ARCHIVENAME=$1.a

DUMPNAME=$1.dump

SEDSCRIPT="/[     ]*ordinal hint/,/^[     ]*Summary/{\

 /^[     ][0123456789]/{\

   s/^[     ][0123456789][ \t][0123456789ABCDEFabcdef][     ][0123456789ABCDEFabcdef][     ]\(.*\)//p\

 }\

}"

echo

echo "Extracting symbol names and generating ${DEFNAME}..."

echo EXPORTS > "${DEFNAME}"

dumpbin /exports "${DLLNAME}" /OUT:"${DUMPNAME}"

sed -ne "${SEDSCRIPT}" < "${DUMPNAME}" >> "${DEFNAME}"

echo

echo "Building linker archive ${ARCHIVENAME} from symbol names..."

dlltool -v --dllname "${DLLNAME}" --input-def "${DEFNAME}" --output-lib "${ARCHIVENAME}"

The above script is used to generate GCC compliant linker archives for each of the CUDA/CUBLAS DLLs (this might also be helpful for those of you that already have asked whether it is possible to link against CUDA/CUBLAS DLLs from an MinGW environment).

I have narrowed the problem down to the following:

  1. Build a class hierarchy of two classes where Child is an ancestor of Base

  2. In the constructor of Child call cublasAlloc()

  3. Build all this with Cygwin’s GCC and link against cublasemu.dll.a

  4. During runtime, make sure that cublasemu.dll is in the path

So finally here is my code:

CublasTest.h:

#ifndef CUBLASTEST_H_

#define CUBLASTEST_H_

class Base {

public:

    Base();

    ~Base();

protected:

    float* devPtr_;

}; 

class Child : Base {

public:

    Child(int size);

    ~Child();    

}; 

#endif /*CUBLASTEST_H_*/

CublasTest.cpp:

#include "CublasTest.h"

#include <stdio.h>

#include <cublas.h>

Base::Base() {

    printf("Base c-tor\n");

    printf("cublasInit()=%d\n", cublasInit());    

}

Base::~Base() {

    printf("Base d-tor\n");

}

Child::Child(int size) : Base() {

    printf("Child c-tor\n");    

    cublasAlloc(size, sizeof(float), (void**)&this->devPtr_);

    // -- CORRUPT STACK AROUND HERE--

}

Child::~Child() {

    printf("Child d-tor\n");

}

Main.cpp:

#include <stdio.h>

#include "CublasTest.h"

int main(int argc, char* argv[]) {    

    printf("Enter Main\n");            

    Child child(3);

    return 0;

}

This compiles and links fine using Cygwin’s GCC or Microsoft Visual Studio (you must make the CUDA_INC_PATH and CUDA_LIB_PATH available and link against cublasemu respectively).

The output I would expect is something like (and this is exactly what I get if I compile with MSVC):

Enter Main

Base c-tor

cublasInit()=0

Child c-tor

Child d-tor

Base d-tor

The output that is generated by code compiled with GCC is somehow strange:

Enter Main

Base c-tor

cublasInit()=0

Child c-tor

Enter Main

Base c-tor

cublasInit()=0

Child c-tor

      6 [main] TemplateTest 10316 _cygtls::handle_exceptions: Exception: STATUS_ACCESS_VIOLATION

    650 [main] TemplateTest 10316 open_stackdumpfile: Dumping stack trace to TemplateTest.exe.stackdump

  53432 [main] TemplateTest 10316 _cygtls::handle_exceptions: Exception: STATUS_ACCESS_VIOLATION

  57846 [main] TemplateTest 10316 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)

[list=1]

[*] If compiled/linked with Cygwin GCC the main routine is entered twice (corrupt stack)!?

[*] Well, this in return causes a segmentation fault ;)

[*] The problem definitely vanishes if the call to cublasAlloc() is removed

[*] If one moves the cublasAlloc() into Base::Base() (the Base class constructor), or the main() routine everything works fine, but this is not what I want

I’m not sure if this is really a CUBLAS issue or rather a general problem concerning Cygwin’s dlltool, producing a corrupt cublasemu.dll.a. It might be as well a linker error in which case this would be the wrong place to ask.

On the other hand this possibly might indicate as well a stack corruption problem within cublasemu.dll and cublasAlloc() that does not occur if compiled with MSVC because the MS compiler generates slightly different code.

Currently I didn’t validate the behavior on G8x hardware in native mode, but the emulation code works just fine if compiled on SuSe Linux 10.1, so I guess this is only a Windows issue.

Anyways, I just wanted to ask and am greatful for any feedback.

Regards

Eric

Tools used:

Eclipse 3.2.0 / CDT 3.1.2

Cygwin gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

Microsoft Visual Studio .NET 2005

CUDA Tookit 1.0