How to distribute the CUDA runtime?

I have a C++ application for Windows and Linux that detects whether a CUDA-capable GPU is available using PyTorch's torch::cuda::is_available(). If one is available, it uses CUDA for a dramatic speedup; if not, it falls back to the CPU. I want to distribute my app with all the necessary CUDA libraries so that the user does not need to install CUDA themselves. What are the minimal libraries I need to distribute? I linked to libnvrtc and libcuda.

Using ldd on Linux shows that my executable depends on a number of shared libraries, and each of these seems to have many dependencies of its own, resolving to libraries under /lib/x86_64-linux-gnu/, /usr/lib/x86_64-linux-gnu/, and /lib64/.

Do I need to distribute all of these shared libraries?

These are all either standard Linux libraries or part of the NVIDIA driver (which must be compatible with the CUDA toolkit version you built against). They will already be installed on any system with a working driver.
By default nvcc links cudart statically, so you should be fine just distributing your executable. If you link against other libraries, such as cuFFT, you will need to distribute them as well unless you link them statically.
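The linking behavior can be made explicit at build time. A sketch of the relevant invocations, assuming a typical Linux toolkit install (the file names are placeholders; the -lcufft_static/-lculibos pairing is how NVIDIA ships static cuFFT on Linux):

```sh
# cudart is linked statically by default; spelled out explicitly:
nvcc -o myapp main.cu --cudart=static

# Link cuFFT statically too, so no extra .so needs shipping
# (the static cuFFT archive requires the culibos helper library):
nvcc -o myapp main.cu -lcufft_static -lculibos

# Check which CUDA libraries the binary still expects at runtime:
ldd myapp | grep -i cuda
```

If ldd still lists a libcudart.so or libcufft.so dependency after this, that library was linked dynamically and would need to ship alongside the executable.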