__attribute__((vec_type_hint(float4))) in NVidia Vectorizing float4 in NVidia?

Yoav · January 26, 2012, 12:20pm

Hello,

I found this attribute in the OpenCL spec:

__attribute__((vec_type_hint(float4)))

Implicit in autovectorization is the assumption that any libraries called from the __kernel must be recompilable at run time to handle cases where the compiler decides to merge or separate workitems. This probably means that such libraries can never be hard coded binaries or that hard coded binaries must be accompanied either by source or some retargetable intermediate representation. This may be a code security question for some.

For example, where the developer specified a width of float4, the compiler should assume that the computation usually uses up 4 lanes of a float vector, and would decide to merge work-items or possibly even separate one work-item into many threads to better match the hardware capabilities. A conforming implementation is not required to autovectorize code, but shall support the hint. A compiler may autovectorize, even if no hint is provided. If an implementation merges N work-items into one thread, it is responsible for correctly handling cases where the number of global or local work-items in any dimension modulo N is not zero.
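To make the last sentence of the quote concrete, here is a minimal sketch in plain C of the bookkeeping an implementation would need when it merges N work-items into one thread. The function names `merged_thread_count` and `active_lanes` are hypothetical, made up for illustration; this is not taken from the spec or from any NVidia compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Number of merged threads needed to cover `global_size` work-items
 * when `n` work-items are packed into each thread (rounds up). */
size_t merged_thread_count(size_t global_size, size_t n)
{
    assert(n > 0);
    return (global_size + n - 1) / n;
}

/* Number of work-items that the thread with index `thread` really handles.
 * Every thread is full except possibly the last one, which only gets the
 * remainder when global_size % n != 0. */
size_t active_lanes(size_t global_size, size_t n, size_t thread)
{
    assert(n > 0);
    size_t first = thread * n;       /* first work-item owned by this thread */
    if (first >= global_size)
        return 0;                    /* thread lies entirely past the end */
    size_t left = global_size - first;
    return left < n ? left : n;
}
```

With a global size of 10 and N = 4, this gives three merged threads, and the last one must mask off two of its four lanes.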

In general, does NVidia's implementation perform the vectorization if this hint is given (with float4)?

I know the hardware is scalar-based, but it could still use four physical work-items per logical thread.

More specifically, how can I know if my specific kernel has been compiled to run on 4 physical work-items or not?

I need this because my work-group size (8 or 16) is smaller than the warp size (32 on NVidia cards). The kernel uses read_imagef, which returns a float4 (most other operations are also float4-based), so “scalarizing” it manually would require either writing the value read from the image to local (shared) memory or reading it multiple times.
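The arithmetic behind this concern can be sketched in plain C. It assumes a warp of 32 lanes and that a warp is never shared between work-groups; the helpers `warps_per_group` and `idle_lanes` and the `threads_per_item` parameter are hypothetical, and whether the compiler ever splits a work-item into 4 threads is exactly the open question.

```c
#include <assert.h>

enum { WARP_SIZE = 32 };  /* warp width on NVidia cards, as stated above */

/* Warps occupied by one work-group when each work-item is mapped to
 * `threads_per_item` hardware threads (1 = no splitting, 4 = one per lane
 * of a float4). */
unsigned warps_per_group(unsigned group_size, unsigned threads_per_item)
{
    assert(group_size > 0 && threads_per_item > 0);
    unsigned threads = group_size * threads_per_item;
    return (threads + WARP_SIZE - 1) / WARP_SIZE;
}

/* Lanes left idle in the warps occupied by one work-group. */
unsigned idle_lanes(unsigned group_size, unsigned threads_per_item)
{
    unsigned threads = group_size * threads_per_item;
    return warps_per_group(group_size, threads_per_item) * WARP_SIZE - threads;
}
```

Under these assumptions a work-group of 8 leaves 24 of 32 lanes idle and a work-group of 16 leaves 16 idle, while splitting each work-item into 4 threads would fill the warps completely in both cases.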

Thanks in advance!
