__nv_bool mixup with bool

sls.cu(200): error: no instance of overloaded function "std::condition_variable::wait" matches the argument list
            argument types are: (std::lock_guard<std::mutex>, lambda []()->__nv_bool)
            object type is: std::condition_variable

I have a ‘.cu’ file and part of the code in it is host code; why shouldn’t I be able to do this? (It’s in a host function, HV_rdy is in host memory, everything here is host-side.)

bool HV_rdy = false;
...
cv.wait(lk, []{return HV_rdy;});

So you cannot even do std::sort with a custom comparator :(
Minimal reproducible example: Pastebin.com [Thanks to @blelbach for pointing out this example is wrong because I forgot an iterator argument, ARGHHH! Sorry!] But here is another example with the original cv.wait case: Compiler Explorer
It works with g++/clang, but spits out the error at the top of this post. My guess is that during SFINAE it mixes up the resolution, and the return type ends up being __nv_bool instead of bool.

I still can’t reproduce this. Can you please provide a minimal example showing the issue on Godbolt, with NVCC?

This works just fine: Compiler Explorer

I’m not sure what the relation between this CV code and std::sort is.


They aren’t related directly; it’s a big program that throws multiple errors, and since both had the same “__nv_bool” / “bool” in their diagnostics, I thought they were the same issue.
I’m confused: I see that godbolt throws no error, but godbolt’s “NVCC 11.3.0 sm_52” seems to only emit device code (and in this case there isn’t any).
This is the verbose output from my machine (nvcc -v):

#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda-11.3/bin
#$ _THERE_=/usr/local/cuda-11.3/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/usr/local/cuda-11.3/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda-11.3/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda-11.3/bin/../lib:/usr/local/cuda-11.3/lib64:/usr/local/cuda-11.3/lib64
#$ PATH=/usr/local/cuda-11.3/bin/../nvvm/bin:/usr/local/cuda-11.3/bin:/usr/local/cuda-11.3/bin:/home/iman/.vscode-server/bin/8dfae7a5cd50421d10cd99cb873990460525a898/bin/remote-cli:/home/iman/.cargo/bin:/home/iman/.local/bin:/usr/local/cuda-11.3/bin:/home/iman/.nvm/versions/node/v16.6.2/bin:/home/iman/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin:/opt/gradle/gradle-6.0.1/bin:/home/iman/.cargo/bin/:/opt/gradle/gradle-6.0.1/bin:/home/iman/.cargo/bin/
#$ INCLUDES="-I/usr/local/cuda-11.3/bin/../targets/x86_64-linux/include"  
#$ LIBRARIES=  "-L/usr/local/cuda-11.3/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda-11.3/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -std=c++17 -D__CUDA_ARCH__=520 -E -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__  "-I/usr/local/cuda-11.3/bin/../targets/x86_64-linux/include"    -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=3 -D__CUDACC_VER_BUILD__=58 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=3 -include "cuda_runtime.h" -m64 "cv.cu" -o "/tmp/tmpxft_00213fc9_00000000-9_cv.cpp1.ii" 
#$ cicc --c++17 --gnu_version=90400 --orig_src_file_name "cv.cu" --allow_managed   -arch compute_52 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_00213fc9_00000000-3_cv.fatbin.c" -tused --gen_module_id_file --module_id_file_name "/tmp/tmpxft_00213fc9_00000000-4_cv.module_id" --gen_c_file_name "/tmp/tmpxft_00213fc9_00000000-6_cv.cudafe1.c" --stub_file_name "/tmp/tmpxft_00213fc9_00000000-6_cv.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_00213fc9_00000000-6_cv.cudafe1.gpu"  "/tmp/tmpxft_00213fc9_00000000-9_cv.cpp1.ii" -o "/tmp/tmpxft_00213fc9_00000000-6_cv.ptx"
cv.cu(22): error: no instance of overloaded function "std::condition_variable::wait" matches the argument list
            argument types are: (std::lock_guard<std::mutex>, lambda []()->__nv_bool)
            object type is: std::condition_variable

1 error detected in the compilation of "cv.cu".
# --error 0x1 --

here is the cv.cpp1.ii: https://gist.github.com/ImanHosseini/1c45a8d07288701a9d3cabf5d52dc282
Update: it’s not due to the g++ version either; I tried “-ccbin /usr/bin/g++-10” to no avail.
Update II: I tried multiple versions of nvcc, only this one fails:

Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

Every other version I tried has no issue, and I don’t know why it can’t be reproduced in godbolt’s 11.3.0.
Here is where I got this version from hell: https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run
A couple of GitHub issues with a similar error: Compile error in core/context_gpu.cu · Issue #278 · facebookarchive/caffe2 · GitHub. But I don’t know if they are related or not.

So my problem is actually fixed by simply using another nvcc/CUDA version, but I really had an itch to see what the problem with that bad build/version was. I took up the proverbial drill (strace), and:

strace -f -s 1000000 nvcc -std=c++17 cv.cu 2>trace.txt

Once for the working nvcc, and once for the one that did not work. The dumps are huge, but here is the main difference: in the correct version, we have this:

openat(AT_FDCWD, "/tmp/tmpxft_001b51eb_00000000-6_cv.cudafe1.gpu", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
...
write(4, "typedef char __nv_bool;\nstruct __EDG_type_info;struct __class_type_info;\n ...")

Whereas in the broken version, this typedef is nowhere to be found. Since we know the name of the file, we should be able to look into it with ‘-keep’, right? (More fun than extracting it from the strace output!) Nope: even with ‘-keep’, nvcc calls unlink on those tmp files. But fear not: I made a dummy unlink implementation that doesn’t actually unlink and just prints out which file unlink was called on: dummy unlink · GitHub. Now I can LD_PRELOAD it:

LD_PRELOAD=./myunlink.so nvcc -std=c++17 cv.cu

Now those files aren’t going anywhere! Here is what should have been generated (from the version that worked): correct & troublesome cudafe1.gpu · GitHub
So the error stems from the fact that cicc fails, and the cudafe1.gpu file, which typedefs __nv_bool, never gets made. We could have deduced as much just with ‘-v’, but now we can also swap in our own tmpxft files (which I don’t recommend).
So for some odd reason, the cicc that I have is busted. → Nope, it wasn’t that. It’s not related to the nvcc version at all: I have 2 machines, and on one of them this occurs for every version. So whatever the reason is, it’s something weird unrelated to cicc itself. But what else, besides the cicc binary, matters here?

So that cicc command that took in a ‘.ii’ file was leading to the __nv_bool error, and it was happening only on one of my machines (for any CUDA version). Something else on my system must be broken. I traced the “openat” syscalls that cicc makes (filtering out the ENOENTs):

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/dev/urandom", O_RDONLY) = 3
openat(AT_FDCWD, "/tmp/tmpxft_00249675_00000000-9_cv-9245cd..lgenfe.bc", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600) = 3
openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/tmp/tmpxft_00249675_00000000-9_cv.cpp1.ii", O_RDONLY) = 3
cv.cu(22): error: no instance of overloaded function "std::condition_variable::wait" matches the argument list
            argument types are: (std::lock_guard<std::mutex>, lambda []()->__nv_bool)
            object type is: std::condition_variable

1 error detected in the compilation of "cv.cu".
+++ exited with 1 +++

hmm…

What if I copy the “cpp1.ii” file from the machine that works to the one that doesn’t, and overwrite it? I did this and it worked → so if cicc is fed the correct “.ii” file, it does not throw that error; the issue is in the “cpp1.ii” file itself. (The bad and correct .ii files, for reference: https://gist.github.com/ImanHosseini/d1b3c85690ab505de642e23266cfdbfb)
Where does it come from? Here:

gcc -std=c++17 -D__CUDA_ARCH__=520 -D__CUDA_ARCH_LIST__=520 -E -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__  "-I/usr/local/cuda-11.6/bin/../targets/x86_64-linux/include"    -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=6 -D__CUDACC_VER_BUILD__=124 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=6 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "cv.cu" -o "/tmp/tmpxft_00279afd_00000000-9_cv.cpp1.ii"

So: on “gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0”, this leads to the problematic ‘.ii’ file. (By the way, this is the default gcc version for CUDA 11.6.)
godbolt repro on cuda 11.5: Compiler Explorer
OK, this is weird. This is slightly different from the other test case I had, so is this a super-sly bug? Nope: you can’t pass a lock_guard wrapper to cv.wait(…). It was a simple C++ bug after all! So I guess the moral is: if you rely too much on g++ diagnostics to catch these, with nvcc you don’t get those nice diagnostics.
I had 2 different .cu files on the 2 machines; they were identical except one used “lock_guard”, and I didn’t realize it! I thought something was wrong with one of the systems. It really pays off to separate the host code into separate files as much as possible: you get better errors if you are doing something wrong like this.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.