CUDA code transfer from Quadro K4000M to Tegra K1

Hello everyone,

I have just started with CUDA programming and have no prior experience.
We have CUDA C/C++ code for a 64-bit Quadro K4000M.
My first task is to port this code, which currently runs on the Quadro K4000M (960 CUDA cores, 64-bit),
to a Tegra K1 (192 CUDA cores, 32-bit). To match the performance of the Quadro K4000M,
we are also planning to increase the number of Tegra GPUs on the board to 4. Could you please clear up
the following doubts?

  1. Do I need to change the code when switching from the 64-bit GPU to the new 32-bit GPU,
    and if yes, how can I do that?

  2. Do we need to change the code if the number of CUDA cores is changed/increased for a single
    GPU on board?

  3. How will using 4 GPUs on board affect the existing code? How do I make those changes?
    

Thanks a lot in advance.

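  1. Do I need to change the code when switching from the 64-bit GPU to the new 32-bit GPU,
    and if yes, how can I do that?

I would think this has more to do with the host than the device: the width of pointers and of types such as size_t and long is set by the host platform the code is compiled for.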
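
A minimal sketch of what the host side of this could look like, assuming the code passes structures between host and device and computes buffer sizes on the host. The `Sample` struct and the `bufferBytes` helper are hypothetical names for illustration and are not taken from the original code. Fixed-width types keep the layout the same on 32-bit and 64-bit builds, and the size check guards against a byte count that does not fit in a 32-bit size_t.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical record shared by host and device code. Fixed-width types
// keep its layout identical on 32-bit and 64-bit builds.
struct Sample {
    std::int32_t  id;
    std::uint32_t count;
    float         value;
};
static_assert(sizeof(Sample) == 12,
              "Sample layout must not depend on the host word size");

// Hypothetical helper: computes the byte size of n samples and reports
// failure if the result would not fit in size_t (32 bits on a 32-bit host).
bool bufferBytes(std::uint64_t n, size_t *out) {
    if (n > SIZE_MAX / sizeof(Sample)) return false;
    *out = static_cast<size_t>(n) * sizeof(Sample);
    return true;
}

int main() {
    // These sizes are the ones that typically differ between the two builds.
    printf("sizeof(void*)  = %u\n", (unsigned)sizeof(void *));
    printf("sizeof(size_t) = %u\n", (unsigned)sizeof(size_t));
    printf("sizeof(long)   = %u\n", (unsigned)sizeof(long));

    size_t bytes = 0;
    if (!bufferBytes(1000000, &bytes)) {
        printf("buffer too large for this platform\n");
        return 1;
    }

    Sample *d_samples = nullptr;
    cudaError_t err = cudaMalloc(&d_samples, bytes);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(d_samples);
    return 0;
}
```

Code that stores pointers in `int` or `long`, or assumes that `long` is 8 bytes, is the kind of thing that would need attention when moving to the 32-bit host.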

  2. Do we need to change the code if the number of CUDA cores is changed/increased for a single
    GPU on board?

That depends on whether the code can scale.
The applicable kernel dimensions might need to change; if the code is written to be scalable, it should inherently address this. Also, kernels that run on one device might not run on another because of their inherent resource requirements.
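
One common way to write a kernel that scales with the device is a grid-stride loop combined with a launch size derived from the device properties. The sketch below assumes a simple element-wise kernel; `scale`, `gridSizeFor`, and the factor of 4 blocks per multiprocessor are illustrative choices, not values from the original code. It also prints the per-block resource limits, since those are what decide whether an existing kernel configuration can launch at all on the smaller GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: correct for any grid size, so the same kernel runs
// on a GPU with 1 multiprocessor or with many.
__global__ void scale(float *data, int n, float factor) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Hypothetical heuristic: enough blocks to cover n, capped at 4 blocks
// per multiprocessor.
int gridSizeFor(int n, int threads, int smCount) {
    int needed = (n + threads - 1) / threads;
    int cap = smCount * 4;
    return needed < cap ? needed : cap;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device\n");
        return 1;
    }
    printf("%s: %d SMs, %d max threads/block, %u bytes shared/block, %d regs/block\n",
           prop.name, prop.multiProcessorCount, prop.maxThreadsPerBlock,
           (unsigned)prop.sharedMemPerBlock, prop.regsPerBlock);

    const int n = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    int threads = 256;
    if (threads > prop.maxThreadsPerBlock) threads = prop.maxThreadsPerBlock;
    int blocks = gridSizeFor(n, threads, prop.multiProcessorCount);

    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaError_t err = cudaGetLastError();      // reports launch failures
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) printf("kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Kernels with hard-coded grid sizes, or with shared memory and register usage tuned for the larger GPU, are the ones most likely to need changes.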

  3. How will using 4 GPUs on board affect the existing code? How do I make those changes?

The code needs to be cognizant of multiple devices and must now manage them explicitly, instead of assuming a single device.

I think.
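
As a rough sketch of what managing multiple devices means in practice, the code below enumerates the devices, splits the work into one chunk per device, and selects each device with cudaSetDevice before allocating and launching. It assumes that all GPUs are visible to a single process through the CUDA runtime; if each Tegra K1 module runs its own operating system, the work would instead have to be distributed between processes, which this sketch does not cover. The `fill` kernel and the `chunkSize` helper are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void fill(float *data, int n, float value) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] = value;
}

// Hypothetical helper: length of chunk `index` when n elements are split
// as evenly as possible into `parts` chunks.
int chunkSize(int n, int parts, int index) {
    int base = n / parts;
    int extra = n % parts;
    return base + (index < extra ? 1 : 0);
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA devices found\n");
        return 1;
    }

    const int n = 1 << 20;
    std::vector<float *> buffers(count, nullptr);

    // Launches are asynchronous, so each device starts working while the
    // host moves on to the next one.
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);  // later runtime calls target device d
        int len = chunkSize(n, count, d);
        if (cudaMalloc(&buffers[d], len * sizeof(float)) != cudaSuccess) {
            printf("allocation failed on device %d\n", d);
            return 1;
        }
        fill<<<64, 256>>>(buffers[d], len, (float)d);
    }

    // Wait for every device, then release its buffer.
    for (int d = 0; d < count; ++d) {
        cudaSetDevice(d);
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess)
            printf("device %d: %s\n", d, cudaGetErrorString(err));
        cudaFree(buffers[d]);
    }
    return 0;
}
```

Existing single-GPU code usually relies on the default device 0, so every allocation, copy, and launch would need to be tied to a specific device in this way.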

Thanks a lot for your help.