OpenCL on Linux woes

Hi *,

Developing OpenCL on Linux (on NVIDIA) certainly feels like the bad part of Cinderella’s life.
No IDE, no Debugger, no nothing.(*) Dancing blindfolded in a mine field.

NVIDIA certainly makes it abundantly clear that OpenCL is an unwanted child.
Well - I’m stuck with an NVIDIA GPU in my notebook and I certainly cannot tear it out and replace it with something else. I also most certainly will NOT program in CUDA, as I need a cross-vendor, GPU-accelerated executable.

So what are my options? The forum. ;-)

Let’s start with my normal mode of R&D operation. I use emacs for typing in the program, then I use make and gcc to translate the program. Because I make no programming errors :-D, all works fine and happily ever after - I do not even need a gdb.

Enter Nvidia-OpenCL:

*** Error in `oclprog': double free or corruption (out): 0x000000000333a440 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x722ab)[0x7f164bc492ab]
/usr/lib/libc.so.6(+0x7890e)[0x7f164bc4f90e]
/usr/lib/libc.so.6(+0x7911e)[0x7f164bc5011e]
/usr/lib/libnvidia-opencl.so.1(+0xd11b0)[0x7f162b0b81b0]
/usr/lib/libnvidia-opencl.so.1(+0xb59f5)[0x7f162b09c9f5]
/usr/lib/libnvidia-opencl.so.1(+0xb5dd7)[0x7f162b09cdd7]
/usr/lib/libnvidia-opencl.so.1(+0xd78ba)[0x7f162b0be8ba]
/usr/lib/libnvidia-opencl.so.1(+0xce250)[0x7f162b0b5250]
/usr/lib/libnvidia-opencl.so.1(+0xce6d0)[0x7f162b0b56d0]
/usr/lib/libOpenCL.so.1(clEnqueueNDRangeKernel+0x62)[0x7f164bf836d2]
oclprog[0x401b8e]
/usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f164bbf7511]
oclprog[0x4021aa]
======= Memory map: ========
00400000-0043b000 r-xp 00000000 103:05 5386509

Sometimes it’s libnvidia-opencl, sometimes it’s the compiler, sometimes it’s a “double free or corruption”, sometimes it’s a “corrupted double-linked list”. As I said: minefield.

Yes, the hardware (Quadro M2000M in a Lenovo P50) is ok. Yes, the program is ok. Actually everything was ok until the kernel grew a little bit more complex a week ago. Then both compiler and executables started to exhibit very weird behavior. I guess I’m hitting some hidden limit, but because there is no sane error message, I cannot be sure.
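
For what it’s worth, one cheap way to rule out a silently ignored failure (for example CL_OUT_OF_RESOURCES once a kernel grows) is to check the return code of every OpenCL call. This would not catch heap corruption inside the driver itself, but it can show whether an earlier call already failed. A minimal sketch follows; cl_err_name and cl_ok are hypothetical helper names, and the fallback #defines are only there so the snippet compiles without the OpenCL headers:

```c
#include <stdio.h>

/* Fallback definitions so this sketch compiles without <CL/cl.h>.
 * The numeric values are the ones assigned by the OpenCL specification. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                         0
#define CL_MEM_OBJECT_ALLOCATION_FAILURE  -4
#define CL_OUT_OF_RESOURCES               -5
#define CL_OUT_OF_HOST_MEMORY             -6
#define CL_BUILD_PROGRAM_FAILURE         -11
#define CL_INVALID_KERNEL_ARGS           -52
#define CL_INVALID_WORK_GROUP_SIZE       -54
#endif

/* Hypothetical helper: map a few common error codes to readable names. */
const char *cl_err_name(int err)
{
    switch (err) {
    case CL_SUCCESS:                       return "CL_SUCCESS";
    case CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case CL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
    case CL_OUT_OF_HOST_MEMORY:            return "CL_OUT_OF_HOST_MEMORY";
    case CL_BUILD_PROGRAM_FAILURE:         return "CL_BUILD_PROGRAM_FAILURE";
    case CL_INVALID_KERNEL_ARGS:           return "CL_INVALID_KERNEL_ARGS";
    case CL_INVALID_WORK_GROUP_SIZE:       return "CL_INVALID_WORK_GROUP_SIZE";
    default:                               return "unlisted OpenCL error";
    }
}

/* Hypothetical helper: returns 1 on success; otherwise logs the failing
 * call to stderr and returns 0. */
int cl_ok(int err, const char *what)
{
    if (err == CL_SUCCESS)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", what, cl_err_name(err), err);
    return 0;
}
```

In the host program this would wrap each call, e.g. cl_ok(clEnqueueNDRangeKernel(...), "clEnqueueNDRangeKernel"), followed by a checked clFinish() so that asynchronous failures surface close to where they happen.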

My software may be a little bit recent:

CUDA 8.0.61
gcc 6.3.1
glibc 2.25
nvidia 378.13
@Arch Linux

Now I wonder: What can I do to exit this nightmare called OpenCL@NVIDIA development I’ve been experiencing for the past week? And so you know what I mean by nightmare:

  if (memcmp((uchar*)sha256_in32,(uchar*)sha256_in64)) {
    //printf("Not same!\n");
  }

  sha256_u(sha256_in64, sha256_out);
  ripemd160_transform(sha256_out, ripemd160_out);

  if (bloom_chk_hash160(bloom, ripemd160_out)) {
     printfind(ripemd160_out, 'u', privkey, idx);
  }

works. If I uncomment the printf -> segmentation fault. (Other printfs work.)

If I do

  if (memcmp((uchar*)sha256_in32,(uchar*)sha256_in64)) {
    //printf("Not same!\n");

    sha256_u(sha256_in64, sha256_out);
    ripemd160_transform(sha256_out, ripemd160_out);

    if (bloom_chk_hash160(bloom, ripemd160_out)) {
       printfind(ripemd160_out, 'u', privkey, idx);
    }
  }

segmentation fault. If I do

  if (memcmp((uchar*)sha256_in32,(uchar*)sha256_in64)) {
    //printf("Not same!\n");
  }
  else {
    sha256_u(sha256_in64, sha256_out);
    ripemd160_transform(sha256_out, ripemd160_out);

    if (bloom_chk_hash160(bloom, ripemd160_out)) {
       printfind(ripemd160_out, 'u', privkey, idx);
    }
  }

segmentation fault. It’s a friggin’ nightmare!

Uncommenting the printf “Not same!” gives:

*** Error in `oclprog': corrupted double-linked list: 0x00000000031ef690 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x722ab)[0x7f5ae48c22ab]
/usr/lib/libc.so.6(+0x7890e)[0x7f5ae48c890e]
/usr/lib/libc.so.6(+0x7aec0)[0x7f5ae48caec0]
/usr/lib/libc.so.6(__libc_malloc+0x54)[0x7f5ae48cc674]
/usr/lib/libc.so.6(_IO_file_doallocate+0x8c)[0x7f5ae48b7b6c]
/usr/lib/libc.so.6(_IO_doallocbuf+0x46)[0x7f5ae48c6286]
/usr/lib/libc.so.6(_IO_file_overflow+0x1d8)[0x7f5ae48c5578]
/usr/lib/libc.so.6(_IO_file_xsputn+0xb6)[0x7f5ae48c4636]
/usr/lib/libc.so.6(_IO_vfprintf+0x10f)[0x7f5ae489891f]
/usr/lib/libc.so.6(_IO_printf+0xa6)[0x7f5ae48a0ea6]
oclprog[0x401cf4]
/usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f5ae4870511]
oclprog[0x4021aa]

Where can I get OpenCL 2.0?

The libnvidia-compiler also wants its own share:

*** Error in `oclprog': corrupted double-linked list: 0x0000000002b79990 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x722ab)[0x7f921d0622ab]
/usr/lib/libc.so.6(+0x7890e)[0x7f921d06890e]
/usr/lib/libc.so.6(+0x78c9c)[0x7f921d068c9c]
/usr/lib/libc.so.6(+0x797a0)[0x7f921d0697a0]
/usr/lib/libnvidia-compiler.so.378.13(+0xabe2c7)[0x7f91e5a6d2c7]
/usr/lib/libnvidia-compiler.so.378.13(+0x202095)[0x7f91e51b1095]
/usr/lib/libnvidia-compiler.so.378.13(+0x2021a9)[0x7f91e51b11a9]
/usr/lib/libnvidia-compiler.so.378.13(+0x23c9bd)[0x7f91e51eb9bd]
/usr/lib/libnvidia-compiler.so.378.13(+0x269c60)[0x7f91e5218c60]
/usr/lib/libnvidia-compiler.so.378.13(+0x269f68)[0x7f91e5218f68]
/usr/lib/libnvidia-compiler.so.378.13(+0x25ffd1)[0x7f91e520efd1]
/usr/lib/libnvidia-compiler.so.378.13(+0x25ed91)[0x7f91e520dd91]
/usr/lib/libnvidia-compiler.so.378.13(+0x25f48e)[0x7f91e520e48e]
/usr/lib/libnvidia-compiler.so.378.13(+0x2c4eed)[0x7f91e5273eed]
/usr/lib/libnvidia-compiler.so.378.13(+0x2c4f15)[0x7f91e5273f15]
/usr/lib/libc.so.6(+0x366c0)[0x7f921d0266c0]
/usr/lib/libc.so.6(+0x3671a)[0x7f921d02671a]
/usr/lib/libc.so.6(__libc_start_main+0xf8)[0x7f921d010518]

I think memcmp() requires three arguments, but your code specifies only two.
It might cause undefined behavior.
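
To illustrate the point: standard C memcmp() takes two pointers plus a byte count. As far as I know, OpenCL C does not ship the C library, so a memcmp called inside a kernel would have to be a user-defined helper; either way, a comparison needs an explicit length. A minimal sketch in plain C, where bytes_differ and bytes_differ_libc are hypothetical names and not taken from the posted code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes differ, 0 otherwise.
 * It uses only a plain loop, so it does not depend on the C library. */
int bytes_differ(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return 1;
    }
    return 0;
}

/* Host-side equivalent built on the standard three-argument memcmp(). */
int bytes_differ_libc(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n) != 0;
}
```

The loop variant is the kind of thing that could be ported into kernel code (with the appropriate address-space qualifiers added), while the memcmp variant is only meant for the host side.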

Did you check your GCC’s warnings or errors?

Doesn’t memcmp() require three arguments, which renders your whole rant slightly invalid?

You might want to build your future applications with: -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes

and also -Wpedantic with -std=XXX

and also -Werror -pedantic-errors

The problem - guys - is the printf.

If I throw out the memcmp completely and leave more than one printf in the OpenCL code -> boom

This is nowhere near the rant it should be.

@birdie:
Thanks for the GCC option suggestions, but you are certainly aware I was posting OpenCL code (which isn’t compiled by gcc)?

On Tesla K80, even a single printf in the OpenCL code crashes the app.

Upon 1st invocation:

corrupted double-linked list: 0x00000000026d8680 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f222f3097e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x81f88)[0x7f222f313f88]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f222f3155d4]
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1(+0x269483)[0x7f220e8ab483]
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1(+0x2fff33)[0x7f220e941f33]
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1(+0xcf385)[0x7f220e711385]
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1(+0xc5e0d)[0x7f220e707e0d]
./oclprog[0x402583]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f222f2b2830]
./oclprog[0x402b59]
...

(On Maxwell and Pascal chips this actually works.)

Any subsequent invocation:

oclprog: ../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted (core dumped)

Now the magic: Remove the single “printf” in the OpenCL code:

tadaaa. The OpenCL prog runs - albeit with no output.
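
If printf inside the kernel is what triggers the crash, one possible way to still get output is to let each work-item write a flag into a debug buffer and print it from the host after reading the buffer back. The sketch below only emulates that pattern in plain C so that it runs without a GPU; work_item, report_flags and the dbg buffer are hypothetical names. In real OpenCL code work_item would be a __kernel function with dbg as a __global argument and gid taken from get_global_id(0).

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel body. Instead of printf("Not same!\n"),
 * the result of the comparison is stored in dbg[gid]. */
void work_item(size_t gid,
               const unsigned char *in32, const unsigned char *in64,
               size_t len, unsigned int *dbg)
{
    unsigned int differs = 0;
    for (size_t i = 0; i < len; i++) {
        if (in32[i] != in64[i]) {
            differs = 1;
            break;
        }
    }
    dbg[gid] = differs;
}

/* Host side: once the debug buffer has been read back (in real code via
 * clEnqueueReadBuffer), print the flagged work-items.
 * Returns the number of work-items that raised the flag. */
size_t report_flags(const unsigned int *dbg, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (dbg[i]) {
            printf("work-item %zu: Not same!\n", i);
            count++;
        }
    }
    return count;
}
```

This keeps all printing on the host, at the cost of one extra buffer and a read-back after the kernel finishes.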

Sidenote: these gcc compiler options were there from the dawn of time

-Wall -Wextra -Wno-pointer-sign -Wno-sign-compare -pedantic -std=gnu99

but that’s irrelevant. The crash happens when the OpenCL app tries to printf something.

Funny thing is, the app is not even threaded (does not use nptl), so that pthread_mutex_lock message comes from one of the linked libs:

linux-vdso.so.1 =>  (0x00007ffe355df000)
        libOpenCL.so.1 => /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007f0201073000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0200caa000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0200aa5000)
        /lib64/ld-linux-x86-64.so.2 (0x000055e61a026000)

The latest supported compiler is gcc 5.3. See also here: https://devtalk.nvidia.com/default/topic/949770/cuda-8-0rc-supporting-gcc6-/