Quadro 4000Mac and OpenCL

Hi Everyone,

I apologize if the answer is already somewhere else on the forum, but I cannot seem to find anything.

I just installed a Quadro 4000 for Mac in a MacPro4,1 (Mac OS X Server, Snow Leopard 10.6.8), along with the latest drivers (256.02.25f01 and CUDA 4.0.50) and CUDA Toolkit 4.0.17. My goal is to use the Quadro as a GPGPU device for double-precision computation, and I am particularly interested in developing with OpenCL.

However, it does not seem possible to use double precision, or even local atomics, on the Quadro. According to oclDeviceQuery, cl_khr_fp64, cl_khr_local_int32_base_atomics and cl_khr_local_int32_extended_atomics are available, but not on the graphics card.
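For reference, here is a minimal sketch of how host code might test a device for an extension such as cl_khr_fp64. It assumes the extension list has already been read with clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...); that string is space-separated, so the check matches whole tokens instead of relying on a bare strstr(). The helper name has_extension is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if `name` appears as a whole,
 * space-delimited token in `ext_list` (the CL_DEVICE_EXTENSIONS string).
 * A plain strstr() could also hit a match inside a longer extension name. */
static bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return false;              /* an empty name never counts as present */

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += len;                  /* partial match: keep searching after it */
    }
    return false;
}
```

If has_extension(ext, "cl_khr_fp64") returns false for the GPU, an OpenCL 1.1 kernel that does `#pragma OPENCL EXTENSION cl_khr_fp64 : enable` and uses double will not build for that device, which matches what oclDeviceQuery reports here.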

On the other hand, the CUDA nbody sample runs in double precision without any problem. Could somebody tell me what I did wrong, or whether this is a driver/OpenCL implementation problem?

Thank you

It looks as if you are only seeing the OpenCL devices offered by Apple’s OpenCL platform. My guess (I don’t have a Mac) is that you should also see an NVIDIA platform with its own OpenCL devices for the NVIDIA GPUs, and those should offer cl_khr_fp64 etc. So you should probably check the code that causes the warning to be printed.
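The fallback that the warning describes can be sketched roughly as follows. This is an assumption about what the sample does, not its actual source: pick_platform is a hypothetical helper that takes the platform names as they would be returned by clGetPlatformInfo(..., CL_PLATFORM_NAME, ...) for each ID from clGetPlatformIDs, prefers one whose name contains "NVIDIA", and otherwise falls back to the first platform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: choose an OpenCL platform by name.
 * Returns the index of the first name containing "NVIDIA",
 * 0 (the first platform) if none matches, or -1 if the list is empty. */
static int pick_platform(const char *const names[], size_t count)
{
    if (count == 0)
        return -1;                 /* no OpenCL platforms at all */

    for (size_t i = 0; i < count; ++i) {
        if (strstr(names[i], "NVIDIA") != NULL)
            return (int)i;         /* preferred vendor platform found */
    }
    return 0;                      /* fallback: default to the first platform */
}
```

With only "Apple" in the list, this returns 0, which is the situation shown by the device dumps in this thread. Enumerating every platform and every device, rather than stopping at the first, is what makes it possible to tell whether the Quadro is exposed anywhere at all.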

Thanks, Eyebex, you may be onto something here. I will see if I can find something in that direction.

I have a similar problem. In my case the CUDA examples run well on my Quadro 4000 for Mac, but oclDeviceQuery doesn’t even seem to see the card.
OpenCL SW Info:

WARNING: NVIDIA OpenCL platform not found - defaulting to first platform!

CL_PLATFORM_NAME: Apple
CL_PLATFORM_VERSION: OpenCL 1.1 (Jul 25 2011 15:56:07)
OpenCL SDK Revision: 7027912

OpenCL Device Info:

2 devices found supporting OpenCL:


Device Intel® Xeon® CPU X5650 @ 2.67GHz

CL_DEVICE_NAME: Intel® Xeon® CPU X5650 @ 2.67GHz
CL_DEVICE_VENDOR: Intel
CL_DRIVER_VERSION: 1.1
CL_DEVICE_VERSION: OpenCL 1.1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_MAX_COMPUTE_UNITS: 24
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1 / 1
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2660 MHz
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 12288 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 49152 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: global
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

CL_DEVICE_IMAGE 2D_MAX_WIDTH 8192
2D_MAX_HEIGHT 8192
3D_MAX_WIDTH 2048
3D_MAX_HEIGHT 2048
3D_MAX_DEPTH 2048

CL_DEVICE_EXTENSIONS: cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_APPLE_clut
cl_APPLE_query_kernel_names
cl_APPLE_gl_sharing
cl_khr_gl_event
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_3d_image_writes
cl_APPLE_fp64_basic_ops
cl_APPLE_fixed_alpha_channel_orders

CL_DEVICE_PREFERRED_VECTOR_WIDTH_ CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, DOUBLE 2


Device ATI Radeon HD 5870

CL_DEVICE_NAME: ATI Radeon HD 5870
CL_DEVICE_VENDOR: AMD
CL_DRIVER_VERSION: 1.0
CL_DEVICE_VERSION: OpenCL 1.1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 20
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 1024
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 850 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 128 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 512 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest round-to-zero round-to-inf

CL_DEVICE_IMAGE 2D_MAX_WIDTH 8192
2D_MAX_HEIGHT 8192
3D_MAX_WIDTH 2048
3D_MAX_HEIGHT 2048
3D_MAX_DEPTH 2048

CL_DEVICE_EXTENSIONS: cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_APPLE_clut
cl_APPLE_query_kernel_names
cl_APPLE_gl_sharing
cl_khr_gl_event
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_3d_image_writes

CL_DEVICE_PREFERRED_VECTOR_WIDTH_ CHAR 16, SHORT 8, INT 4, LONG 2, FLOAT 4, DOUBLE 0


2D Image Formats Supported (27)

Channel Order Channel Type

1 CL_RGBA CL_FLOAT
2 Unknown CL_FLOAT
3 Unknown CL_FLOAT
4 Unknown CL_FLOAT
5 CL_INTENSITY CL_HALF_FLOAT
6 CL_RGBA CL_HALF_FLOAT
7 Unknown CL_HALF_FLOAT
8 CL_RGBA Unknown
9 CL_RGBA CL_SIGNED_INT16
10 CL_RGBA CL_SIGNED_INT32
11 CL_RGBA CL_SIGNED_INT8
12 CL_RGBA CL_SNORM_INT16
13 CL_RGBA CL_SNORM_INT8
14 CL_RGBA CL_UNSIGNED_INT16
15 CL_RGBA CL_UNSIGNED_INT32
16 CL_RGBA CL_UNSIGNED_INT8
17 CL_RGBA CL_UNORM_INT16
18 Unknown CL_UNORM_INT16
19 Unknown CL_UNORM_INT8
20 CL_A CL_UNORM_INT8
21 CL_ARGB CL_UNORM_INT8
22 Unknown CL_UNORM_INT8
23 CL_BGRA CL_UNORM_INT8
24 CL_INTENSITY CL_UNORM_INT8
25 CL_RGBA CL_UNORM_INT8
26 Unknown CL_UNORM_INT8
27 Unknown CL_UNORM_INT8


3D Image Formats Supported (27)

Channel Order Channel Type

1 CL_RGBA CL_FLOAT
2 Unknown CL_FLOAT
3 Unknown CL_FLOAT
4 Unknown CL_FLOAT
5 CL_INTENSITY CL_HALF_FLOAT
6 CL_RGBA CL_HALF_FLOAT
7 Unknown CL_HALF_FLOAT
8 CL_RGBA Unknown
9 CL_RGBA CL_SIGNED_INT16
10 CL_RGBA CL_SIGNED_INT32
11 CL_RGBA CL_SIGNED_INT8
12 CL_RGBA CL_SNORM_INT16
13 CL_RGBA CL_SNORM_INT8
14 CL_RGBA CL_UNSIGNED_INT16
15 CL_RGBA CL_UNSIGNED_INT32
16 CL_RGBA CL_UNSIGNED_INT8
17 CL_RGBA CL_UNORM_INT16
18 Unknown CL_UNORM_INT16
19 Unknown CL_UNORM_INT8
20 CL_A CL_UNORM_INT8
21 CL_ARGB CL_UNORM_INT8
22 Unknown CL_UNORM_INT8
23 CL_BGRA CL_UNORM_INT8
24 CL_INTENSITY CL_UNORM_INT8
25 CL_RGBA CL_UNORM_INT8
26 Unknown CL_UNORM_INT8
27 Unknown CL_UNORM_INT8

oclDeviceQuery, Platform Name = Apple, Platform Version = OpenCL 1.1 (Jul 25 2011 15:56:07), SDK Revision = 7027912, NumDevs = 2, Device = Intel® Xeon® CPU X5650 @ 2.67GHz, Device = ATI Radeon HD 5870

System Info:

[oclDeviceQuery] test results…
PASSED

The CUDA device query finds the Quadro just fine.

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: “Quadro 4000”
CUDA Driver Version / Runtime Version 4.0 / 4.0
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 2048 MBytes (2147024896 bytes)
( 8) Multiprocessors x (32) CUDA Cores/MP: 256 CUDA Cores
GPU Clock Speed: 0.95 GHz
Memory Clock rate: 1404.00 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 6 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = Quadro 4000
[deviceQuery] test results…
PASSED

I have the latest drivers and environment from NVIDIA, but I suspect the Quadro’s Mac OS X Lion driver may not be entirely OpenCL compliant…

-Mike V.

Mac Pro (Mid 2010)
Processor 2 x 2.66 GHz 6-Core Intel Xeon
Graphics ATI Radeon HD 5870 1024 MB
GPGPU Quadro 4000 for Mac 2048 MB
OS 10.7

Yeah. I am totally disappointed that I upgraded to 10.7 Lion this weekend and lost OpenCL support for my NVIDIA Quadro 4000 GPU. I’m thinking about reverting to 10.6.8 since I’m pretty invested in OpenCL at this point. I haven’t seen any sign that Apple is working on addressing this.