CUDA 9.0 on MacOSX High Sierra - samples compile, cmake projects don't

Hey Guys

I'm currently trying to get CUDA to work on macOS High Sierra. I had some trouble initially because the error:

0 CUDA driver version is insufficient for CUDA runtime version

always occurred when trying to run an nvcc-compiled binary.

I then updated to macOS 10.13.1.

I then upgraded to the latest CUDA driver (9.0.222) and the web driver (378.10.10.10.20.107), and now I can not only compile (which also worked before) but also run the samples (those from /Developer/NVIDIA/CUDA-9.0/samples).

This is the output from ./deviceQuery:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 750M"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2048 MBytes (2147024896 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            926 MHz (0.93 GHz)
  Memory Clock rate:                             2508 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

But whenever I try to build a CMake project it fails with the same error as before.

For example, I tried this one:

https://devblogs.nvidia.com/parallelforall/building-cuda-applications-cmake/
https://github.com/robertmaynard/code-samples/tree/master/posts/cmake

Has anyone else had the same problem?

Thanks for your help!

EDIT: I also tried explicitly passing in my architecture with:

cmake -DCMAKE_CUDA_FLAGS="-arch=sm_30" ..

And to have everything in one place, the CMakeLists.txt:

cmake_minimum_required(VERSION 3.8 FATAL_ERROR)
project(cmake_and_cuda LANGUAGES CXX CUDA)

include(CTest)

add_library(particles STATIC
  randomize.cpp
  randomize.h
  particle.cu
  particle.h
  v3.cu
  v3.h
  )

# Request that particles be built with -std=c++11
# As this is a public compile feature anything that links to
# particles will also build with -std=c++11
target_compile_features(particles PUBLIC cxx_std_11)

# We need to explicitly state that we need all CUDA files in the
# particle library to be built with -dc as the member functions
# could be called by other libraries and executables
set_target_properties( particles
                       PROPERTIES CUDA_SEPARABLE_COMPILATION ON)

add_executable(particle_test test.cu)

set_property(TARGET particle_test
             PROPERTY CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(particle_test PRIVATE particles)

if(APPLE)
  # We need to add the path to the driver (libcuda.dylib) as an rpath,
  # so that the static cuda runtime can find it at runtime.
  set_property(TARGET particle_test
               PROPERTY
               BUILD_RPATH ${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES})
endif()

EDIT 2:
I also tried downgrading to CUDA 8.0 and Xcode 7.3.1, with the same result: compiling works, but running fails with the same error.

I can confirm that I have the same problem, also on macOS 10.13.1.
OpenCV also compiles, but its samples show the same error messages.
I tried downgrading to 9.0.197, but nothing changed.

I have the same problem.
I tried uninstalling everything completely, rebooting and installing again, but nothing helped.
I need CUDA but can't use it, because the tests return:

-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
CUDA Driver Version: 9.0.222
GPU Driver Version: 10.26.6 355.11.10.10.15.102
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_13:16:23_CDT_2017
Cuda compilation tools, release 9.0, V9.0.175
cat /usr/local/cuda/version.txt 
CUDA Version 9.0.176

Solved by:

Uninstall all (again) & restart
Install CUDA driver 9.0.222
Install CUDA Toolkit 9.0.176 & restart
Install WebDriver-378.10.10.10.20.107.pkg 
Restart

Sorry to resurrect this zombie thread, but I am running into the same issue.
macOS 10.13.6
MacBook Pro (Retina, 15-inch, Mid 2014)/NVIDIA GeForce GT 750M
CUDA Driver Version: 418.163
Web Driver 387.10.10.10.40.128
Cuda compilation tools, release 10.1, V10.1.168

The puzzling thing to me is that it seems to be a CMake problem. I say this because if I invoke nvcc directly (nvcc my_source.cu -o a.out), it produces an executable that runs fine. It is only when I build via CMake that I have problems.

Before I go through re-install heck I’d like to understand why CMake is failing to build a good binary but nvcc is fine.
I know, I can just use nvcc alone and solve my problem but that does not scale.

Thanks in advance

Verbose CMake output below.

[ 33%] Building CUDA object CMakeFiles/0book.dir/0book.cu.o
/Developer/NVIDIA/CUDA-10.1/bin/nvcc     -x cu -c /Users/fgiraffe/code/1cuda_by_example_book/0book.cu -o CMakeFiles/0book.dir/0book.cu.o
[ 66%] Linking CUDA device code CMakeFiles/0book.dir/cmake_device_link.o
/usr/local/Cellar/cmake/3.14.5/bin/cmake -E cmake_link_script CMakeFiles/0book.dir/dlink.txt --verbose=1
/Developer/NVIDIA/CUDA-10.1/bin/nvcc   -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/0book.dir/0book.cu.o -o CMakeFiles/0book.dir/cmake_device_link.o 
[100%] Linking CUDA executable 0book
/usr/local/Cellar/cmake/3.14.5/bin/cmake -E cmake_link_script CMakeFiles/0book.dir/link.txt --verbose=1
/usr/bin/clang++  CMakeFiles/0book.dir/0book.cu.o CMakeFiles/0book.dir/cmake_device_link.o -o 0book  -L"/Developer/NVIDIA/CUDA-10.1/lib" "/Developer/NVIDIA/CUDA-10.1/lib/libcudart_static.a" -lcudadevrt
[100%] Built target 0book
/usr/local/Cellar/cmake/3.14.5/bin/cmake -E cmake_progress_start /Users/fgiraffe/code/1cuda_by_example_book/build/CMakeFiles 0

@fgiraffe

Hi, same problem here. Invoking nvcc directly works fine, but when building with CMake the same error comes up.

This might explain everything: https://gitlab.kitware.com/cmake/cmake/issues/17296
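
For anyone landing here later, one thing worth checking against the verbose output above: the final clang++ link line carries no rpath option, while the CMakeLists.txt in the first post adds one on Apple so that the static CUDA runtime can find libcuda.dylib. Below is a minimal sketch of that same workaround applied to the 0book target. It assumes a single source file 0book.cu, and the project name book_example is made up. Whether CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES really contains the directory holding libcuda.dylib on a given install is also an assumption, so treat this as something to try, not a confirmed fix.

```cmake
cmake_minimum_required(VERSION 3.8 FATAL_ERROR)

# "book_example" is a placeholder project name, not taken from the thread.
project(book_example LANGUAGES CXX CUDA)

# Target name matches the one visible in the verbose build output.
add_executable(0book 0book.cu)

if(APPLE)
  # Same workaround as in the first post's CMakeLists.txt:
  # add the CUDA link directories as an rpath so the statically linked
  # CUDA runtime can locate the driver library (libcuda.dylib) at run time.
  # Assumption: these directories include the one that holds libcuda.dylib.
  set_property(TARGET 0book
               PROPERTY
               BUILD_RPATH ${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES})
endif()
```

After rebuilding, running otool -l on the 0book binary should list an LC_RPATH entry if the property took effect. If no such entry appears, the property was not applied to the target that was actually built.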