I got this working in an impromptu experiment.
The environment is as follows:
1. Host (x86): 1) CUDA 8.0; 2) Ubuntu 14.04; 3) aarch64-linux-gnu-g++. (The CUDA 8.0 repo provides cuda-cross-aarch64-8-0.deb.)
2. Target (Xavier): 1) CUDA 10.0; 2) Ubuntu 18.04.
I built a sample on the host and then copied the executable to the target.
1. Sample: /usr/local/cuda/samples/0_Simple/matrixMul
2. Cross-compile command: make TARGET_ARCH=aarch64
I did not even pass SMS=72, since CUDA 8.0 does not support sm_72.
3. Copy the executable to the target.
It runs on the target and produces the correct results. Test passed!
I used readelf to inspect the executable, and it looks like the CUDA kernel code is embedded in it. My guess is that when it runs on the Xavier target, the GPU driver JIT-compiles that kernel code at run time, which is why it works. Is that right? That would explain the kernel code, but how does the cudaMalloc part (the host-side runtime calls) work?
Also, the OS versions are different (Ubuntu 14.04 on the host, 18.04 on the target). Is it strange that this still works? I don't understand why.
There is another question, though. If I compile a CUDA program with CUDA 8 on the host and the program uses new features from CUDA 10.0, will it still work?