Getting started with CUDA code for many NVIDIA GPUs

I’m planning to use CUDA to make a program written in C++ (for medical research) run much faster than before.

I may do an OpenCL version later, but not before I find a low-cost online course in OpenCL for people with no OpenCL experience at all.

I’ve downloaded some of the CUDA documentation and, a few years ago, took an online course in CUDA.

Where can I find information on how to query a GPU for its capabilities, given that the program must run on many computers with a wide variety of NVIDIA GPUs?

Where can I find information on how to make the program use one specific NVIDIA GPU if it is running on a computer that has more than one NVIDIA GPU?

For querying capabilities, one place to start is the CUDA sample code deviceQuery.
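As a rough idea of what that kind of query looks like, here is a minimal sketch using the CUDA runtime API calls cudaGetDeviceCount and cudaGetDeviceProperties. It is not the deviceQuery sample itself. The function names printDeviceCapabilities and meetsComputeCapability are made up for illustration, and the fields printed are only a small selection of what cudaDeviceProp offers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): true if compute capability
// major.minor is at least reqMajor.reqMinor.
bool meetsComputeCapability(int major, int minor, int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Hypothetical helper: print a few capability fields for every CUDA device
// the runtime can see. Returns the device count, or -1 on a runtime error.
int printDeviceCapabilities() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            return -1;
        }
        std::printf("Device %d: %s\n", dev, prop.name);
        std::printf("  compute capability:      %d.%d\n", prop.major, prop.minor);
        std::printf("  global memory:           %zu MiB\n", prop.totalGlobalMem >> 20);
        std::printf("  multiprocessors:         %d\n", prop.multiProcessorCount);
        std::printf("  max threads per block:   %d\n", prop.maxThreadsPerBlock);
        std::printf("  shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    }
    return count;
}
```

Calling printDeviceCapabilities() from main and building with nvcc should list each visible GPU. A program that needs a minimum compute capability could pass prop.major and prop.minor to a check like meetsComputeCapability and fall back to a CPU path when no device qualifies.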

For selecting a specific GPU, one place to start is the CUDA sample code simpleMultiGPU. You could also search for the environment variable CUDA_VISIBLE_DEVICES.
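To give a feel for device selection, here is a minimal sketch built on the runtime API call cudaSetDevice. It is not taken from simpleMultiGPU. The names selectDevice and isValidDeviceIndex are made up for illustration, and how the program obtains the requested index (command line, config file, and so on) is left open.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): is `requested` a valid index
// among `count` visible devices?
bool isValidDeviceIndex(int requested, int count) {
    return requested >= 0 && requested < count;
}

// Hypothetical helper: make device `requested` the current device for the
// calling host thread. Returns true on success.
bool selectDevice(int requested) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    if (!isValidDeviceIndex(requested, count)) {
        std::fprintf(stderr, "Device %d requested, but only %d device(s) visible\n",
                     requested, count);
        return false;
    }
    err = cudaSetDevice(requested);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                     requested, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

After selectDevice succeeds, later allocations and kernel launches from that host thread go to the chosen GPU. CUDA_VISIBLE_DEVICES works outside the program instead: it limits which GPUs the runtime enumerates and renumbers them from 0, so launching with CUDA_VISIBLE_DEVICES=1 makes the second physical GPU appear to the program as device 0.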

There are also numerous questions about these topics on various forums.