Supported vector types in CUDA: are OpenCL vector types different from CUDA's?

I’ve recently developed an application in OpenCL that uses uint4, uint8 or uint16 as the variable type for storing data. I wanted to study how the GPU’s performance peaks as the amount of data the kernel processes grows, depending on the data type. On ATI GPUs this went smoothly, and it also worked on CUDA hardware apart from a persistent error when using the uint16 type. Then I tried the same application with the CUDA Runtime API, i.e. its CUDA counterpart, and found that the uint8 type is not declared in $CUDA_PATH/include/vector_types.h.

If the vector types for CUDA are only T, T2, T3 and T4 (for a base type T), then how are the T8 and T16 vectors implemented in OpenCL when they reach the GPU? I’ve tried to define a uint8 variable, with no success. I tried:

typedef struct
{
    unsigned int s0, s1, s2, s3, s4, s5, s6, s7;
} uint8;

But the compiler outputs:

/opt/cuda/include/cuda.h(4464): error: expected a ")"
/opt/cuda/include/cuda.h(4497): error: expected a ")"
/opt/cuda/include/cuda.h(4530): error: expected a ")"
/opt/cuda/include/cuda.h(4681): error: expected a ")"
/opt/cuda/include/cuda.h(4718): error: expected a ")"
/opt/cuda/include/cuda.h(4754): error: expected a ")"

I was under the impression that one could define a struct and use it directly as a kernel argument. Am I wrong? And again, how are T8 and T16 defined when using OpenCL: as 2xT4 and 4xT4?

uint8 and uint16 probably clash with the names of the 8-bit and 16-bit wide scalar types; just name your struct differently.
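To illustrate the renaming, here is a minimal sketch of an 8-lane struct passed by value to a kernel. The names my_uint8, sum_lanes and run_sum_lanes are made up for this example, and error checking is left out to keep it short; treat it as an illustration rather than a reference implementation.

```cuda
#include <cuda_runtime.h>

// Hypothetical 8-lane type; the name is chosen so it cannot collide
// with any existing uint8 typedef.
struct my_uint8 {
    unsigned int s0, s1, s2, s3, s4, s5, s6, s7;
};

// A plain struct can be passed by value like any other kernel parameter.
__global__ void sum_lanes(my_uint8 v, unsigned int *out)
{
    *out = v.s0 + v.s1 + v.s2 + v.s3 + v.s4 + v.s5 + v.s6 + v.s7;
}

// Host helper (hypothetical): launches the kernel with one thread and
// copies the result back. Return codes are ignored for brevity.
unsigned int run_sum_lanes(my_uint8 v)
{
    unsigned int *d_out = NULL;
    unsigned int h_out = 0;

    cudaMalloc((void **)&d_out, sizeof(unsigned int));
    sum_lanes<<<1, 1>>>(v, d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling run_sum_lanes with a my_uint8 whose lanes hold 1 through 8 should give the sum of those lanes, provided a CUDA device is available.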

Isn’t that uint8_t and uint16_t?

For the types defined in stdint.h - yes. But the names are just too common, so I’d rather not use them.

What makes me wonder, though, is that the errors come from cuda.h. How come it is included after your definition of uint8? Or is the problem not related to the uint8 declaration at all?

I cannot answer your question, as it would be time wasted to chase an error reported in cuda.h; those errors were just noise added to the debugging process. I figured it out: it also had to do with the order in which the files were compiled. I had forward-declared the kernels as external functions, but during compilation either the kernels didn’t see the uint8 struct (now defined as my_uint8) or the main function couldn’t see the kernels.

I rearranged the files a bit and it worked.
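For anyone hitting the same thing, here is a rough single-file sketch of the declaration order that matters: the struct first, then the kernel's forward declaration, then the code that launches it. The names scale_lanes and run_scale and the file names in the comments are invented for illustration. How this splits across separately compiled files depends on the toolchain, so take it only as a picture of the ordering.

```cuda
#include <cuda_runtime.h>

// ---- would go in a shared header, e.g. my_types.h (hypothetical name) ----
struct my_uint8 {
    unsigned int s0, s1, s2, s3, s4, s5, s6, s7;
};

// Forward declaration of the kernel. my_uint8 must already be defined here,
// otherwise the parameter list cannot be parsed.
__global__ void scale_lanes(my_uint8 *data, unsigned int factor);

// ---- host side, e.g. main.cu (hypothetical name) ----
// Copies one element to the device, scales it there, and copies it back.
// Return codes are ignored for brevity.
void run_scale(my_uint8 *h, unsigned int factor)
{
    my_uint8 *d = NULL;

    cudaMalloc((void **)&d, sizeof(my_uint8));
    cudaMemcpy(d, h, sizeof(my_uint8), cudaMemcpyHostToDevice);
    scale_lanes<<<1, 1>>>(d, factor);
    cudaMemcpy(h, d, sizeof(my_uint8), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

// ---- kernel definition, matching the forward declaration above ----
__global__ void scale_lanes(my_uint8 *data, unsigned int factor)
{
    data->s0 *= factor; data->s1 *= factor;
    data->s2 *= factor; data->s3 *= factor;
    data->s4 *= factor; data->s5 *= factor;
    data->s6 *= factor; data->s7 *= factor;
}
```

Keeping the struct and the kernel declarations in one header that every source file includes means both the kernels and main see the same definitions, in the same order.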

Thank you for your time (it was code project mismanagement), although I’m still curious about the T8 and T16 types in OpenCL and how they are defined.

[EDIT] Seen it in $CUDA_SDK_PATH/OpenCL/common/inc/cl_platform.h