Does anyone know the current state of ARM support in CUDA? As far as I can tell, support ended with CUDA 6.5, but I can’t find any further information about it. I assume it must still be supported to some degree, since the Tegra products use it and compiling on them would mean building on ARM. A really interesting use case would be this:
The host would reside right on the NIC, so you would no longer need any server CPUs to do RDMA.