CUDA Toolkit and Driver 2.3a for OS X released

This should address compiling on Snow Leopard, installer permissions, and various other minor issues.

Known issues:

    Install the toolkit before the driver. Installing the driver before the toolkit usually works, but occasionally libcuda.dylib will get deleted.

    The compiler fix on Snow Leopard (for now) is along the lines of “disable blocks within CUDA files.” This means you can’t use Grand Central Dispatch within .cu files.

    64-bit kernels are unsupported.

The SDK adds OpenCL samples in this release as well. Obviously, these will only work on Snow Leopard.

Downloads:

CUDA Toolkit 2.3a

http://developer.download.nvidia.com/compu…3a_macos_32.pkg

CUDA Driver 2.3.1a

http://developer.download.nvidia.com/compu…3.1a_macos.pkg

For use with Quadro FX 4800 or GeForce GTX 285 on Leopard, and any NVIDIA GPU on Snow Leopard. If you are running Snow Leopard, you want this package.

CUDA Driver 2.3.0a

http://developer.download.nvidia.com/compu…3.0a_macos.pkg

For all other NVIDIA GPUs on Leopard. If you are running Leopard without a Mac Pro and a GT200-based card, you want this package.

CUDA SDK 2.3a and release notes

http://developer.download.nvidia.com/compu…3a_macos_32.pkg

Release Notes

http://developer.download.nvidia.com/compu…_c-for-cuda.txt

(only a week late…)

Thanks!!

It doesn’t work for me though… I still can’t compile the SDK. I installed toolkit 2.3a and driver 2.3.1a. I have a MacBook Pro 5,3 running Snow Leopard 10.6.1. When I try to compile, I get the following errors:

make -C src/3DFD/ 

make -C src/alignedTypes/ 

make -C src/asyncAPI/ 

ld:ld: warning: in obj/release/asyncAPI.cu.o, file is not of required architecture

ld: warning: in /usr/local/cuda/lib/libcudart.dylib, file is not of required architecture

ld: warning: in ../../lib/libcutil.a, file is not of required architecture

Undefined symbols:

  "_main", referenced from:

	  start in crt1.10.6.o

ld: symbol(s) not found

collect2: ld returned 1 exit status

 warning: in obj/release/3dfd.cu.o, file is not of required architecture

ld: warning: in /usr/local/cuda/lib/libcudart.dylib, file is not of required architecture

ld: warning: in ../../lib/libcutil.a, file is not of required architecture

Undefined symbols:

  "_main", referenced from:

	  start in crt1.10.6.o

ld: symbol(s) not found

collect2: ld returned 1 exit status

ld: warning: in obj/releasemake[1]: *** [../../bin/darwin/release/asyncAPI] Error 1

/alignedTypes.cu.o, file is not of required architecturemake: *** [src/asyncAPI/Makefile.ph_build] Error 2

make: *** Waiting for unfinished jobs....

ld: warning: in /usr/local/cuda/lib/libcudart.dylib, file is not of required architecture

ld: warning: in ../../lib/libcutil.a, file is not of required architecture

Undefined symbols:

  "_main", referenced from:

	  start in crt1.10.6.o

ld: symbol(s) not found

collect2: ld returned 1 exit status

make[1]: *** [../../bin/darwin/release/3dfd] Error 1

make[1]: *** [../../bin/darwin/release/alignedTypes] Error 1

make: *** [src/3DFD/Makefile.ph_build] Error 2

make: *** [src/alignedTypes/Makefile.ph_build] Error 2

So it seems the compiler is trying to build for 64-bit. I tried adding a -m32 flag in /common/common.mk…

# Compiler-specific flags  

NVCCFLAGS := -m32

CXXFLAGS  := -m32 $(CXXWARN_FLAGS)

CFLAGS	:= -m32 $(CWARN_FLAGS)

But this did not change a thing. What am I missing?

Are you using the new SDK?

The CUDA SDK didn’t compile for me unless I told it to build as 32-bit (make i386=1). I installed everything in the order tmurray gave and didn’t have any previous CUDA/OpenCL installations.
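
For anyone hitting the same linker warnings, here is a minimal sketch of how the 32-bit build could be checked. The helper name has_arch_mismatch and the build.log file are made up for illustration; only make i386=1, the warning text, and the libcudart path come from this thread.

```shell
# Hypothetical helper (not part of the SDK): succeeds if the build log
# named by $1 contains the linker's architecture-mismatch warning.
has_arch_mismatch() {
    grep -q "file is not of required architecture" "$1"
}

# Usage sketch, run from the SDK directory that holds the top-level Makefile
# (left as comments so nothing is built when this snippet is sourced):
#   make i386=1 2>&1 | tee build.log
#   if has_arch_mismatch build.log; then
#       # 'file' reports which architecture(s) a library was built for
#       file /usr/local/cuda/lib/libcudart.dylib
#   fi
```

If the warning still shows up with i386=1, stale objects or libraries from an earlier build are one possible cause, which would fit the later reports that a clean reinstall helped.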

Oops, sorry. Since you only mentioned installing the toolkit and the driver, I assumed the SDK was unchanged. But it is indeed listed as 2.3a. Let me try this out…

CUDA:
For me the new SDK compiled all CUDA samples fine, without any changes to the makefiles. Good!
I did not revert gcc and g++ back to the defaults though.

Before installing, I completely removed CUDA 2.3:
/System/Library/Extensions/CUDA.kext
/usr/local/cuda/*
/Developer/GPU Computing

Then restarted, then installed as tmurray recommended: first toolkit, then driver 2.3.1 and last SDK.

OpenCL:
Some SDK samples (I believe the ones doing OpenGL interop) did not build, reporting a wrong architecture in libglew or something similar.
But even the ones that built do not run:
Error -32 in clGetPlatformInfo Call !!!

Regards
Mark

So this time I have the latest SDK installed (after completely removing GPU Computing first, of course). The “make i386=1” suggested by Adam helped a bit, since I got further this time. But the build still didn’t finish:

make -C src/3DFD/ 

make -C src/alignedTypes/ 

make -C src/asyncAPI/ 

make -C src/bandwidthTest/ 

make -C src/bicubicTexture/ 

ld: warning: in ../../lib/librendercheckgl_i386.a, file is not of required architecture

Undefined symbols:

  "CheckBackBuffer::CheckBackBuffer(unsigned int, unsigned int, unsigned int, bool)", referenced from:

	  runAutoTest(int, char**)in bicubicTexture.cpp.o

	  runCUDASample(int, char**)in bicubicTexture.cpp.o

ld: symbol(s) not found

collect2: ld returned 1 exit status

make[1]: *** [../../bin/darwin/release/bicubicTexture] Error 1

make: *** [src/bicubicTexture/Makefile.ph_build] Error 2

The only thing I didn’t remove before installing the new ‘a’ versions is /System/Library/Extensions/CUDA.kext. Do you think I need to reinstall everything from scratch?

The OpenCL examples do compile, but some of them fail at run time. For instance:

bin/darwin/release/oclHistogram 

bin/darwin/release/oclHistogram Starting...

Initializing data...

Initializing OpenCL...

Allocating OpenCL memory...

Initializing 64-bin OpenCL histogram...

...loading Histogram64.cl from file

!!! Error # 0 at line 43 , in file src/oclHistogram64_launcher.cpp !!!

Exiting...

I tried Nbody and SobelFilter, and they failed in a similar way. BandwidthTest passed. I didn’t try the other ones.

Are you using a 64-bit kernel? 64-bit kernels are unsupported (that’s the known issue I was forgetting).
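
A quick way to answer that, as a sketch: on Snow Leopard, uname -m should print x86_64 when a 64-bit kernel is booted and i386 when a 32-bit kernel is booted (that is my understanding of 10.6 behaviour, so please double-check on your own machine). kernel_bits is a hypothetical helper name that just maps that output to a number.

```shell
# Hypothetical helper: map a `uname -m` style machine string to a kernel
# word size. Assumes the Snow Leopard convention described above.
kernel_bits() {
    case "$1" in
        x86_64) echo 64 ;;
        i386)   echo 32 ;;
        *)      echo unknown ;;
    esac
}

# Usage sketch (left as a comment so sourcing this file has no side effects):
#   kernel_bits "$(uname -m)"
```

A result of 64 would mean you are in the unsupported configuration from the known-issues list.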

Mine was a 32-bit kernel and gave the same error messages as Morph208’s. When I compiled the CUDA SDK explicitly as 32-bit it worked, and the OpenCL examples all work for me, although painfully slowly.

No, no, I’m using a 32-bit kernel. With the option i386=1 it should compile in 32-bit mode and not complain about architecture…

On another note, I tried to run one of the examples that did compile, but it failed:

cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 647 : no CUDA-capable device is available.

I’m using the 9400M at the moment and I could switch to the 9600M GT, but the 9400M is CUDA-capable, isn’t it?

I restarted from scratch and this time it worked. It seems that deleting /System/Library/Extensions/CUDA.kext is important after all. The whole CUDA SDK compiled, and all the OpenCL examples as well.

But CUDA still can’t find any CUDA-capable device on my computer (even the 9600M GT). It seems that OpenCL can, but execution failed on many examples. Each time, the error is on the shrCheckError following the loading of the .cl file. It behaves like it can’t find the .cl file.

char *cHistogram64 = oclLoadProgSource(shrFindFilePath("Histogram64.cl", argv[0]), "// My comment\n", &kernelLength);

		shrCheckError(cHistogram64 != NULL, shrTRUE);
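
To rule out a file that is simply missing, a rough check like the following could help. find_cl_source is a made-up helper, and it only tells you whether the file exists somewhere under a directory; I don't know which directories shrFindFilePath actually searches, so treat a hit as necessary rather than sufficient.

```shell
# Hypothetical helper: list every regular file named $1 under directory $2
# (defaults to the current directory). Prints nothing if there is no match.
find_cl_source() {
    find "${2:-.}" -type f -name "$1" -print
}

# Usage sketch, from the directory the sample is launched in:
#   find_cl_source Histogram64.cl .
```

If this prints nothing, the loader has no chance of finding the kernel source from that location.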

Hope this helps.

If someone can help me with the fact that CUDA can’t see any CUDA-capable devices on my computer, that would be great! Thanks.

@Morph208
Did you restart your Mac after deleting the old KEXT and before installing the new one?

Deleting the KEXT manually will not unload it, so the old one remains “active” even after you install the new driver. That might be why CUDA finds no device.

A restart is actually not necessary if you call kextunload, then delete the old kext, then install the new driver (which seems to load the new kext automatically) - but I often just delete the old one and restart.
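
Spelled out, the no-restart sequence could look like the sketch below. To be safe it only prints the commands instead of running them. cuda_kext_swap_plan is a hypothetical name, and the default kext path is the one mentioned earlier in this thread.

```shell
# Hypothetical helper: print (do not execute) the unload-then-delete steps
# for the old CUDA kext. $1 optionally overrides the kext path.
cuda_kext_swap_plan() {
    kext=${1:-/System/Library/Extensions/CUDA.kext}
    printf 'sudo kextunload %s\n' "$kext"   # unload the running kext first
    printf 'sudo rm -rf %s\n' "$kext"       # then remove it from disk
}

# Usage sketch: review the output, run the two commands by hand,
# then install the new driver package.
#   cuda_kext_swap_plan
```

Printing first is a deliberate choice here, since rm -rf under /System is not something to run from a copy-pasted snippet without looking.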

I would describe this release as brilliant, if a tiny bit flaky still. The CUDA nbody would not run at first, and I had to reload the CUDA kext after a hard restart. Then it all seemed to settle down and I have been able to run several examples in both CUDA and OpenCL zones. The migration of the standard examples to ocl is a massive help - thanks Nvidia!

FYI, I include the output of oclDeviceQuery on my system. It recognizes both 285s in an ’08 Mac Pro (one is the 1 GB Mac version, the other a 2 GB injected PC card) as well as the CPU. Note first that it is fine with the PC card. I think what gives me a buzz is the fact that the CPU and GPU are there on a more or less equal basis… I have never seen anything like this before!

I have two questions for Nvidia:
Q1. How do I control the target device under OpenCL, the way I would under CUDA with e.g. ./nbody --device=0? For example, I want to run oclNbody on each of my 3 devices and control which goes where.
Q2. What is the story with double precision on the GPU?

oclDeviceQuery.exe Starting…

OpenCL SW Info:

CL_PLATFORM_NAME: Apple
CL_PLATFORM_VERSION: OpenCL 1.0 (Jul 15 2009 23:07:32)
OpenCL SDK Version: 1.2.0.16

OpenCL Device Info:

# of devices supporting OpenCL = 3:

CL_DEVICE_VENDOR: NVIDIA
CL_DEVICE_NAME: GeForce GTX 285
CL_DRIVER_VERSION: CLH 1.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 240
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1476 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_IMAGE_MAX_WIDTH: 2d width 8192, 2d height 8192, 3d width 2048, 3d height 2048, 3d depth 2048
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
CL_DEVICE_PREFERRED_VECTOR_WIDTH: char 1, short 1, int 1, long 1, float 1, double 0

CL_DEVICE_VENDOR: NVIDIA
CL_DEVICE_NAME: GeForce GTX 285
CL_DRIVER_VERSION: CLH 1.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 240
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1476 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_IMAGE_MAX_WIDTH: 2d width 8192, 2d height 8192, 3d width 2048, 3d height 2048, 3d depth 2048
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 512 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 2048 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
CL_DEVICE_PREFERRED_VECTOR_WIDTH: char 1, short 1, int 1, long 1, float 1, double 0

CL_DEVICE_VENDOR: Intel
CL_DEVICE_NAME: Intel® Xeon® CPU E5462 @ 2.80GHz
CL_DRIVER_VERSION: 1.0
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_MAX_COMPUTE_UNITS: 8
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1 / 1 / 1
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2800 MHz
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_IMAGE_MAX_WIDTH: 2d width 8192, 2d height 8192, 3d width 2048, 3d height 2048, 3d depth 2048
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 3072 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 12288 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: global
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_EXTENSIONS:
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_APPLE_gl_sharing
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
CL_DEVICE_PREFERRED_VECTOR_WIDTH: char 16, short 8, int 4, long 2, float 4, double 2

System Info:

TEST PASSED…

Press <Enter> to Quit…

@maolimu

Before installing everything again, I removed /System/Library/Extensions/CUDA.kext, /usr/local/cuda/ and /Developer/GPU Computing, then restarted. Then I installed the toolkit, the driver, and the SDK (in this order) and restarted again. Is this the correct way? Or did I miss a restart?


To me this sounds like the old permissions problem preventing the CUDA.kext from being loaded at startup.

Try this on the Terminal:

kextstat | grep "CUDA"

If a line is printed the driver is loaded. Then I have no more ideas what could be wrong.

If no line is printed, the driver is not loaded - probably due to wrong permissions. You can try to fix these using the instructions for release 2.3 posted here in the forum.
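
The same check as a small sketch that prints a verdict instead of relying on an empty grep result. cuda_kext_status is a hypothetical helper; it matches case-insensitively on the assumption that the kext name contains "CUDA" in some capitalization.

```shell
# Hypothetical helper: read kextstat-style output on stdin and report
# whether any line mentions CUDA (case-insensitive match).
cuda_kext_status() {
    if grep -qi "cuda"; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

# Usage sketch on the Mac itself:
#   kextstat | cuda_kext_status
```

A "not loaded" result after a reboot points toward the permissions problem described above rather than toward the SDK or toolkit.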

Good luck :)

Please post back what you find out.

As you suspected, kextstat | grep "CUDA" returned nothing. I fixed the permissions using one of your previous posts, restarted, and this time the driver was loaded. I can now launch the examples in the CUDA SDK. But I still have issues with the OpenCL examples (the same errors I mentioned earlier). Anyway, thanks a lot maolimu!

I tried:

make i386=1

but that didn’t fix the problem.

How do you build in 32 bit mode?

/Chris

Sorry, just saw the “it won’t work on a 64 bit kernel”. Bummer. :-(

/Chris

Yep, no 64-bit CUDA for now. Probably in CUDA 2.4… (or 2.5). I went back to a 32-bit kernel for exactly this reason…