Hi AastaLLL,
Changing it to aarch64 works.
Could you please advise me on the other issue as well?
I’d like to know how to figure out why the kernel’s execution time varies between runs of the application.
https://forums.developer.nvidia.com/t/variable-run-time-for-cuda-kernel/244298