Only Fermi cards compatible with OpenCL 1.1? New minimum requirements for devices

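In the 1.1 spec (rev 33), Appendix E lists the changes from 1.0. Among others, there’s this little bit: the minimum for CL_DEVICE_MAX_PARAMETER_SIZE goes from 256 to 1024 bytes, and the minimum for CL_DEVICE_LOCAL_MEM_SIZE goes from 16 KB to 32 KB.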

Those are the new minimums for the total size of the arguments passed to a kernel and for the amount of local memory a device must have. At the very least, a conforming device should never report less in a device query.

1.0’s numbers were exactly what you get on all NVIDIA GPUs since G80: 256 bytes of kernel arguments and 16 KB of shared/local memory. The new numbers make non-Fermi cards technically non-compliant with 1.1.
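For anyone who wants to check their own card, here is a minimal sketch of the comparison in plain C. It assumes the two values have already been read with clGetDeviceInfo (CL_DEVICE_MAX_PARAMETER_SIZE and CL_DEVICE_LOCAL_MEM_SIZE), and it takes the 1.1 minimums to be 1024 bytes and 32 KB, which is how I read Appendix E. The helper name and return codes are made up for illustration, and `unsigned long long` stands in for cl_ulong so the snippet builds without the OpenCL headers.

```c
#include <stddef.h>

/* Full-profile minimums as discussed in this thread (1.1 values are my
   reading of Appendix E, not copied from a header). */
enum { CL10_MIN_PARAM_BYTES = 256,       CL11_MIN_PARAM_BYTES = 1024 };
enum { CL10_MIN_LOCAL_BYTES = 16 * 1024, CL11_MIN_LOCAL_BYTES = 32 * 1024 };

/* Hypothetical helper: returns 11 if both values reach the 1.1 minimums,
   10 if they only reach the 1.0 minimums, and 0 if they reach neither. */
int highest_minimums_met(size_t max_param_size,
                         unsigned long long local_mem_size)
{
    if (max_param_size >= CL11_MIN_PARAM_BYTES &&
        local_mem_size >= CL11_MIN_LOCAL_BYTES)
        return 11;
    if (max_param_size >= CL10_MIN_PARAM_BYTES &&
        local_mem_size >= CL10_MIN_LOCAL_BYTES)
        return 10;
    return 0;
}
```

With the G80-era numbers quoted above (256 bytes, 16 KB) this returns 10, which is the whole complaint in one line.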

This increase seems arbitrary, to say the least, and I can’t understand why it was introduced.

Additionally, atomics are now core functionality, on both global and local memory, meaning cards below compute capability 1.2 are out.

I can’t help thinking that’s NVIDIA trying to force us into newer hardware…

I look at it a different way. First, I think making the 4 atomics extensions core is good for organizations that need them in their apps / products. Listing a version requirement is much preferable to listing a version plus extensions. I just wish Image Access Method had become core, not just 3D writes if images are supported.
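On a 1.0 device you would still have to look for those four extensions by name. A rough sketch of that check, assuming the space-separated string from clGetDeviceInfo with CL_DEVICE_EXTENSIONS has already been fetched; both function names are my own, only the extension names come from the spec.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-delimited token in
   `extensions`, 0 otherwise. A plain strstr() alone would also match
   prefixes of longer extension names, hence the boundary checks. */
int has_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    const char *p = extensions;

    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p++; /* keep searching after this partial match */
    }
    return 0;
}

/* Hypothetical helper: 1 only if all four 32-bit atomics extensions
   that 1.1 promotes to core are listed. */
int has_cl11_core_atomics(const char *extensions)
{
    static const char *const names[] = {
        "cl_khr_global_int32_base_atomics",
        "cl_khr_global_int32_extended_atomics",
        "cl_khr_local_int32_base_atomics",
        "cl_khr_local_int32_extended_atomics"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (!has_extension(extensions, names[i]))
            return 0;
    return 1;
}
```

That is exactly the kind of boilerplate a plain "requires OpenCL 1.1" would replace.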

Same argument for the arbitrary increases. You cannot take advantage of these resources as easily when their guaranteed minimum sizes are set so low. Better to raise them, so developers just need to put “1.1 device required” on the box.

That said, OpenCL 1.1 platforms should work with 1.0 devices, so older hardware can still play. There is a version query for both the platform and the device (CL_PLATFORM_VERSION and CL_DEVICE_VERSION). If there were not, NVIDIA (and ATI someday) would have to ship either separate drivers or both platforms in one. Isn’t the size of these driver downloads pushing the limit already?
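Both queries return a string of the form "OpenCL <major>.<minor> <vendor-specific information>", so telling a 1.0 device from a 1.1 one comes down to parsing two integers. A small sketch, assuming the string has already been read with clGetDeviceInfo or clGetPlatformInfo; the function names are hypothetical.

```c
#include <stdio.h>

/* Parses "OpenCL <major>.<minor> ..." into *major and *minor.
   Returns 1 on success, 0 if the string does not have that shape. */
int parse_cl_version(const char *version, int *major, int *minor)
{
    return version != NULL &&
           sscanf(version, "OpenCL %d.%d", major, minor) == 2;
}

/* Hypothetical helper: 1 if the reported version is at least
   want_major.want_minor, 0 otherwise (including on parse failure). */
int cl_version_at_least(const char *version, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    if (!parse_cl_version(version, &major, &minor))
        return 0;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```

An app could then run this once on the platform string and once per device, and only take the 1.1 device paths where the device itself reports 1.1.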

I would not expect a 1.1 core feature to be supported by hardware that did not support it when it was optional. However, 1.0 devices might still gain from the parts of 1.1 that are not tied to the hardware, like:

  • event callbacks
  • CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
  • CL_KERNEL_PRIVATE_MEM_SIZE
  • float3, etc.
  • mem copy by rect
  • official support for multiple host threads

Can anyone confirm that a 1.0 device can do any of this stuff with the 1.1 beta?

This is one of the things I’m curious about - whether 1.0 devices can still be used in 1.1 with event callbacks, thread-safety and all. I’m writing a project that will use this functionality and I don’t want it to be limited to Fermi cards, since those specific features have nothing to do with how much local memory the device has.

Has anybody got CL_KERNEL_PRIVATE_MEM_SIZE actually working?
CL_KERNEL_LOCAL_MEM_SIZE seems to be OK, but CL_KERNEL_PRIVATE_MEM_SIZE never returns sane values for me. (It returns 0, or sometimes 4, both of which are far from true.)