I can't find any documentation on memory allocation for built-in vector types such as char4.
I want to do the following in CUDA:
Allocate a char4 array on the host using malloc, such as char4 *N = (char4 *)malloc(length * sizeof(char4));
Allocate a char4 array on the device using cudaMalloc, such as char4 *dN; cudaMalloc((void **)&dN, length * sizeof(char4));
Afterward, I want to use cudaMemcpy to copy the host char4 data to the device char4 array.
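
To make the question concrete, here is a minimal sketch of the three steps as I understand them. It assumes the device variable has to be a pointer (char4 *dN) that is passed to cudaMalloc by address, and that char4 is treated like any other 4-byte struct by malloc, cudaMalloc, and cudaMemcpy. The function name char4RoundTrip, the hCheck buffer, and the copy back to the host are my own additions for checking the result; they are not from any documentation.

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper (name and structure are assumptions for illustration):
// allocates `length` char4 elements on the host and on the device, copies
// host -> device -> host, and returns 0 if the data survived the round trip.
int char4RoundTrip(size_t length)
{
    size_t bytes = length * sizeof(char4);  // sizeof(char4) is 4 bytes

    // Host allocations: the source array and a buffer used only for verification.
    char4 *N = (char4 *)malloc(bytes);
    char4 *hCheck = (char4 *)malloc(bytes);
    if (N == NULL || hCheck == NULL) {
        free(N);
        free(hCheck);
        return 1;
    }

    // Fill the host array with recognizable values.
    for (size_t i = 0; i < length; ++i)
        N[i] = make_char4((signed char)(i & 0x7f), 1, 2, 3);

    // Device allocation: dN is a pointer, and cudaMalloc receives its address.
    char4 *dN = NULL;
    int status = 1;
    if (cudaMalloc((void **)&dN, bytes) == cudaSuccess &&
        cudaMemcpy(dN, N, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(hCheck, dN, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
        // Compare the bytes that came back with the bytes that were sent.
        status = (memcmp(N, hCheck, bytes) == 0) ? 0 : 2;
    }

    cudaFree(dN);  // cudaFree(NULL) is a no-op, so this is safe on failure
    free(hCheck);
    free(N);
    return status;
}
```

Calling char4RoundTrip(1024) from main and compiling with nvcc should return 0 on a machine with a working CUDA device, if the assumptions above hold.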