ARM Support

Does anyone know the current state of CUDA support for ARM? As far as I can tell, support ended with CUDA 6.5, but I can’t find any further information about it. I assume it must still be supported to some degree, since the Tegra products use CUDA and compiling on them means compiling on ARM. A really interesting use case would be this:

http://www.mellanox.com/related-docs/npu-multicore-processors/PB_Bluefield_SoC.pdf

Here the host resides right on the NIC, so you wouldn’t need any server CPUs to do RDMA.