A couple of compiler intrinsics would solve this problem at the CUDA C level:
template <typename T> T __load(T *address, LOAD_OPTIONS options);
template <typename T> void __store(T *address, T value, STORE_OPTIONS options);
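For comparison, current CUDA toolkits expose per-access cache hints as fixed-name functions: `__ldca`, `__ldcg` and `__ldcs` for loads, and `__stwb`, `__stcg`, `__stcs` and `__stwt` for stores. Each is provided only for a set of built-in types and on sufficiently recent compute capabilities, so check the CUDA C++ Programming Guide before relying on them. The following is a minimal sketch of how the proposed interface could be layered on top of them. The names `load_with`, `store_with`, `LoadOption` and `StoreOption` are hypothetical stand-ins for `__load`, `__store`, `LOAD_OPTIONS` and `STORE_OPTIONS` (identifiers starting with a double underscore are reserved for the implementation), and the option sets are assumptions. On the host, or when compiled as plain C++, the hints are ignored and an ordinary load or store is performed.

```cpp
// Hypothetical option sets standing in for the proposal's LOAD_OPTIONS / STORE_OPTIONS.
enum class LoadOption  { CacheAll, CacheGlobal, Streaming };
enum class StoreOption { WriteBack, CacheGlobal, Streaming, WriteThrough };

// Under nvcc the wrappers are callable from host and device; as plain C++ they are host-only.
#if defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for the proposed __load(address, options).
template <typename T>
HD inline T load_with(const T *address, LoadOption opt) {
#if defined(__CUDA_ARCH__)
    // Device path: map each option onto an existing cache-hint load.
    switch (opt) {
    case LoadOption::CacheGlobal: return __ldcg(address);  // cache in L2, not L1
    case LoadOption::Streaming:   return __ldcs(address);  // streaming, evict-first
    default:                      return __ldca(address);  // cache at all levels
    }
#else
    (void)opt;  // host path: no cache hints, plain load
    return *address;
#endif
}

// Hypothetical stand-in for the proposed __store(address, value, options).
template <typename T>
HD inline void store_with(T *address, T value, StoreOption opt) {
#if defined(__CUDA_ARCH__)
    // Device path: map each option onto an existing cache-hint store.
    switch (opt) {
    case StoreOption::CacheGlobal:  __stcg(address, value); break;  // cache in L2, not L1
    case StoreOption::Streaming:    __stcs(address, value); break;  // streaming, evict-first
    case StoreOption::WriteThrough: __stwt(address, value); break;  // write through to system memory
    default:                        __stwb(address, value); break;  // write back, all coherent levels
    }
#else
    (void)opt;  // host path: no cache hints, plain store
    *address = value;
#endif
}
```

Inside a kernel this would read, for example, `float v = load_with(&in[i], LoadOption::Streaming);`. Passing the option as an ordinary argument mirrors the proposed signatures; a real intrinsic would presumably need it to be a compile-time constant, which making it a template parameter would enforce.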