Failing to run remote application on Jetson TX2 with Nsight Eclipse Edition, even after a successful build

Hello everyone,

I would like to develop an application in the Nsight Eclipse Edition that comes with the CUDA 10.0 toolkit. I am using a Jetson TX2, and I have set up my system (Jetson + host PC) with JetPack 4.2 and the SDK Manager: https://developer.nvidia.com/embedded/jetpack.

I have successfully flashed the OS image and installed CUDA, TensorRT, etc. on the Jetson TX2; the CUDA toolkit is also installed on my host computer, which runs Ubuntu 18.04.

I have followed the instructions at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#cross-platform. My host has the CUDA 10.0 cross-platform packages; here is the terminal output:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
cuda-cross-aarch64 is already the newest version (10.0.166-1).
The following package was automatically installed and is no longer required:
  nvidia-cuda-doc
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 202 not upgraded.

I have also set up my environment based on the mandatory post-installation actions, changing cuda-10.1 to cuda-10.0 and NsightCompute-2019.1 to NsightCompute-1.0: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#mandatory-post
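For reference, the lines I appended to my ~/.bashrc follow the doc's pattern, with the version numbers swapped as described above (adjust if your install paths differ):

```shell
# CUDA 10.0 post-installation environment variables (host side),
# per the Linux installation guide, with cuda-10.1 -> cuda-10.0
# and NsightCompute-2019.1 -> NsightCompute-1.0
export PATH=/usr/local/cuda-10.0/bin:/usr/local/cuda-10.0/NsightCompute-1.0${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```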

In Nsight, I modified the project build properties by adding /usr/local/cuda-10.0/targets/aarch64-linux/lib to Build->Settings->NVCC Linker->Libraries->Library search path (-L). After I click “Apply”, “Libraries (-l)” is still empty. Is that normal?

Additionally, I set the CPU Architecture of the Jetson TX2 (remote) to AArch64.

I am able to build my project successfully; here is the console output:

11:16:08 **** Build of configuration Debug for project HelloCuda ****
make all -C /home/ktnvidia/CUDA_Projects/HelloCuda/Debug 
make: Entering directory '/home/ktnvidia/CUDA_Projects/HelloCuda/Debug'
Building file: ../helloworld.cu
Invoking: NVCC Compiler
/usr/local/cuda-10.0/bin/nvcc -G -g -O0 -ccbin aarch64-linux-gnu-g++ -gencode arch=compute_35,code=sm_35 -gencode arch=compute_60,code=sm_60 -m64 -odir "." -M -o "helloworld.d" "../helloworld.cu"
/usr/local/cuda-10.0/bin/nvcc -G -g -O0 --compile --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_60,code=sm_60 -m64 -ccbin aarch64-linux-gnu-g++  -x cu -o  "helloworld.o" "../helloworld.cu"
Finished building: ../helloworld.cu
 
Building target: HelloCuda
Invoking: NVCC Linker
/usr/local/cuda-10.0/bin/nvcc --cudart static -L/usr/local/cuda-10.0/targets/aarch64-linux/lib --relocatable-device-code=false -gencode arch=compute_35,code=compute_35 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_60,code=sm_60 -m64 -ccbin aarch64-linux-gnu-g++ -link -o  "HelloCuda"  ./helloworld.o   
Finished building target: HelloCuda
 
make: Leaving directory '/home/ktnvidia/CUDA_Projects/HelloCuda/Debug'
> Shell Completed (exit code = 0)

11:16:12 Build Finished (took 4s.276ms)
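One thing I notice in this log: my -gencode flags target sm_35 and sm_60, while the TX2's integrated GPU is compute capability 6.2. As far as I understand, the embedded compute_60 PTX should still JIT-compile on the board, but compiling directly for the TX2 would look roughly like this (hypothetical command, same file names as my project):

```
/usr/local/cuda-10.0/bin/nvcc -ccbin aarch64-linux-gnu-g++ -m64 \
    -gencode arch=compute_62,code=sm_62 \
    -o HelloCuda ../helloworld.cu
```

I am not sure whether the missing sm_62 target is related to my problem.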

However, when I run my code:

#include "stdio.h"

__global__ void helloFromGPU(void)
{
   printf("Hello from GPU");
}

int main()
{
   printf("Hello from CPU");
   helloFromGPU <<<1, 5>>>();
   return 0;
}
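One thing I am not sure about: main() returns immediately after the asynchronous kernel launch, and as far as I know device-side printf output is only flushed at synchronization points, so the GPU message could be lost even if the kernel actually runs. A variant that waits for the kernel and would surface any launch error (a sketch, not yet tested on my board) would be:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void helloFromGPU(void)
{
    printf("Hello from GPU\n");
}

int main()
{
    printf("Hello from CPU\n");
    helloFromGPU<<<1, 5>>>();
    // Wait for the kernel to finish, which also flushes the
    // device-side printf buffer; report any error instead of
    // exiting silently.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 0;
}
```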

I see this result:

echo $PWD'>'
/bin/sh -c "cd \"/home/ktnvidia/CUDA_Projects/HelloCuda/Debug\";export LD_LIBRARY_PATH=\"/usr/local/cuda-10.0/lib64\":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR=\"/tmp\";\"/home/ktnvidia/CUDA_Projects/HelloCuda/Debug/HelloCuda\"";exit
ktnvidia@ktnvidia-desktop:~$ echo $PWD'>'
/home/ktnvidia>
ktnvidia@ktnvidia-desktop:~$ /bin/sh -c "cd \"/home/ktnvidia/CUDA_Projects/HelloCuda/Debug\";export LD_LIBRARY_PATH=\"/usr/local/cuda-10.0/lib64\":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR=\"/tmp\";\"/home/ktnvidia/CUDA_Projects/HelloCuda/Debug/HelloCuda\"";exit
Hello from CPUlogout

Only the CPU printf appears, so it seems I am not able to run my device code on the Jetson TX2. Do you have any idea how I can get this simple kernel to run on the Jetson TX2?

Thank you very much for your help.