__attribute__((vec_type_hint(float4))) on NVIDIA: does it vectorize float4?


I found this attribute in the OpenCL spec:


In general, does NVIDIA's implementation perform the vectorization when this hint is given (with float4)?

I know the hardware is scalar-based, but the implementation could still use four physical work-items per logical thread.
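To make the question concrete, this is roughly how I attach the hint. It is only a minimal sketch: the kernel name `scale4` and its arguments are made up for illustration, and as far as I understand the attribute is purely a hint about the computational width of the kernel, so the compiler is free to ignore it.

```opencl
// Hypothetical kernel: the name and arguments are illustrative only.
// The attribute tells the compiler that most of the work is done on float4.
__kernel __attribute__((vec_type_hint(float4)))
void scale4(__global const float4 *in,
            __global float4 *out,
            const float factor)
{
    const size_t i = get_global_id(0);
    out[i] = in[i] * factor;   // the scalar is widened to all four components
}
```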

More specifically, how can I know if my specific kernel has been compiled to run on 4 physical work-items or not?

I need this because my work-group size (8 or 16) is smaller than the warp size (32 on NVIDIA cards), and the kernel uses read_imagef, which returns a float4 (most other operations are float4-based as well). If I wanted to “scalarize” it manually, I would have to either write the value read from the image to local (shared) memory or read it multiple times.
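For clarity, the local-memory variant of the manual scalarization I have in mind would look something like the sketch below. Everything in it is an assumption made for illustration: the kernel name, the arguments, the layout of four consecutive work-items per pixel, and the placeholder multiplication standing in for the real per-component work. It also assumes the local size is a multiple of 4 and that the host passes a local buffer with one float4 per group of four work-items.

```opencl
// Hypothetical sketch: four consecutive work-items share one image read.
__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                           CLK_ADDRESS_CLAMP_TO_EDGE   |
                           CLK_FILTER_NEAREST;

__kernel void scalarized(__read_only image2d_t img,
                         __global float *out,
                         __local float4 *texels)  // local_size / 4 entries
{
    const size_t gid  = get_global_id(0);
    const size_t lid  = get_local_id(0);
    const size_t lane = lid & 3;    // component handled by this work-item
    const size_t slot = lid >> 2;   // pixel index within the work-group

    const int width = get_image_width(img);
    const int pixel = (int)(gid >> 2);

    // Only the first work-item of each group of four touches the image.
    if (lane == 0) {
        texels[slot] = read_imagef(img, smp,
                                   (int2)(pixel % width, pixel / width));
    }
    barrier(CLK_LOCAL_MEM_FENCE);   // reached by every work-item

    // View the shared float4 as four floats and pick this lane's component.
    const __local float *components = (const __local float *)&texels[slot];
    out[gid] = components[lane] * 2.0f;   // placeholder computation
}
```

The other variant would drop the local buffer and the barrier and have every work-item call read_imagef itself, keeping only its own component. Both feel like workarounds, which is why I would prefer the compiler to do the mapping for me.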

Thanks in advance!