I am using a Jetson Nano for one of my applications. I built a neural network with PyTorch and ran the application both with and without moving the network to CUDA. When the network is moved to CUDA, I don't see any kernels executed in the profile information. Please look at the attached JetsonNano_cpu_and_gpu.png. It contains two profiles: the first is with the network running on the CPU, the second with the network running on the GPU. I have two questions:
Why don't I see any kernels when the network is executed on CUDA?
When the network is executed on the CPU, why do I see some operations running on CUDA? In this case, the GPU should be idle.
I have run the same application on a GeForce GTX 1650. There I don't see any CUDA operations while the network runs on the CPU, and when the network runs on the GPU I can clearly see the kernels executed.
Please let me know what I need to do to have the kernels show up on the Jetson Nano, and why I see GPU activity when the network is actually running on the CPU/host.
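For reference, this is roughly how I collect the profiles. It is a minimal sketch using the legacy torch.autograd.profiler API with a toy stand-in model (the real network and input shapes differ); the `use_cuda` flag is what should make CUDA kernel times and names appear in the table:

```python
import torch
import torch.nn as nn

# Hypothetical toy network standing in for the actual application's model
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(8, 64)

# Run on GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
x = x.to(device)

# use_cuda=True asks the profiler to record CUDA kernel launches;
# on a GPU run the kernel names should show up in the printed table
with torch.autograd.profiler.profile(use_cuda=(device == "cuda")) as prof:
    with torch.no_grad():
        model(x)

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

On the GTX 1650 this table lists the individual kernels for the GPU run; on the Jetson Nano it does not.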
I have installed PyTorch the same way you mentioned, but I still don't see any kernel function names in the profile. If you look at the attached GeForce profile (a single attachment; the first profile is for the CPU run, the second for the GPU run), the kernel names are clearly visible when the network runs on the GPU. On the Jetson Nano, these kernel names are not displayed.
The second issue: on the GeForce, when I run the network only on the CPU, I see no GPU profile information at all; the GPU is idle. But if you look at the Jetson Nano CPU profile (in the first attachment, which contains two profiles: the first for the CPU, the second for the GPU), you can see contributions from both the CPU and the GPU. So why is there GPU activity when the network is running only on the CPU?
Could you confirm that you are using our official package?
In general, the package name should contain an nv tag, e.g. torch-1.11.0a0+17540c5+nv22.01-cp36-cp36m-linux_aarch64.whl.
But we don't see this in the package you shared on Jul 21.
If the issue still occurs with the official package, could you share your CPU/GPU inference scripts so we can check further?
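One quick way to confirm which build is installed is to check the version string from Python. This is a minimal sketch; the exact version string on your device will differ, but the NVIDIA-built Jetson wheels carry the nv tag mentioned above:

```python
import torch

# NVIDIA Jetson wheels embed an "nv" tag in the version string,
# e.g. 1.11.0a0+17540c5+nv22.01 (matching the wheel filename)
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if "nv" not in torch.__version__:
    print("warning: this does not look like the official NVIDIA Jetson wheel")
```

If the version string has no nv tag or CUDA is reported as unavailable, that would explain why no kernels are captured on the Nano.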